Course on working with molecular data in R, 2020, České Budějovice

Submitted by vojta on Fri, 03/06/2020 - 22:39

R is nowadays probably the most powerful tool for calculations of all kinds. There are plenty of packages available for working with molecular data; they will be introduced during the course. The course will be taught from Monday, October 19 to Friday, October 23 (see below). It will be held exclusively on-line; there will be no in-person meeting.

We'll start every day at 9:00 and finish between 16:00 and 17:00. The course will be a combination of shorter talks (broadcast via MS Teams), each followed by time for your own work, questions, and consultation of any issues you run into. It'll be hard, but we'll make it. :-) Please be ready on time on Monday, as we'll surely need some time to make sure the conference works well and we can talk to each other.

The course covers the theory behind the methods used, tutorials with test data, tasks for participants' individual work, and more.

Previous knowledge of R is useful, but not necessary. At least basic knowledge of molecular biology is required; previous experience with any methods for analysing DNA data is recommended.

List of topics (may be adjusted according to participants' wishes, pace, etc.):

  • Basic work in R - how to enter commands, install packages, read help, types of variables, indexes, etc.
  • Bioconductor
  • Loading and exporting molecular data of various types and formats
  • Downloading molecular data from on-line databases
  • Extraction of SNPs from sequencing data
  • Extraction of polymorphisms from sequences
  • Microsatellites, AFLP, SNP, sequences, ...
  • Data manipulation, conversions among formats
  • Distance matrices, import of custom matrices
  • Export of data
  • Basic statistics
  • PCoA
  • Phylogenetic trees (NJ, UPGMA, ML), their display and testing
  • MSN
  • Basic statistics, genetic indices: heterozygosity, HWE, F-statistics
  • DAPC
  • Whole genome SNP data
  • Spatial analysis - Mantel test, Moran’s I, Monmonier, sPCA, ...
  • Basic map creation
  • Structure
  • Alignments
  • Manipulation of trees, work with large sets of trees
  • Phylogenetic independent contrasts
  • Phylogenetic autocorrelation
  • Phylogenetic PCA
  • Ancestral state reconstruction
  • Additional advanced topics

There will be time on the last day for participants' other specific questions, exams, and (optional) consultation of participants' own data.

Requirements prior to the course

  • Don't be afraid of R. :-)
  • Previous knowledge of R is useful, but not necessary. At least basic knowledge of molecular biology is required; previous experience with any methods for analysing DNA data is recommended. I recommend taking courses like R for life - MB120P147E, Use of molecular markers in plant systematics and population biology - MB120P44 (optionally also with practical lessons I and II), and Plant population genetics - MB120P145, or anything similar, prior to this course.
  • For the course you need:
    • Your own computer to work on.
    • Working Wi-Fi: eduroam (set it up using the faculty or the recommended general instructions), or you can ask for a temporary password in the application form.
    • Installed R. I also recommend installing a graphical user interface such as RStudio, RKWard, R Commander, or a similar one of your choice.
    • If you have experience with R, you can save some work by installing the required R packages in advance. I'll send instructions prior to the course.
  • The course will be taught over 5 days, with plenty of time for all sorts of consultations and solving issues.

Information for the course

The course will start on Monday, October 19, on-line on MS Teams. Please be ready on time, as we will surely need some time to ensure everyone is well connected. We'll end between 4 and 5 PM, and there will be enough breaks for snacks and lunch.

For the course you need only a laptop with working Wi-Fi (participants without eduroam access will get a temporary password) and installed R. Install R 4.0, as some packages will not work in older R versions. For more convenient work, I also recommend installing a graphical interface for R, such as RStudio or RKWard.
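To confirm that the installed R meets the 4.0 requirement, a quick check along these lines can be run in the R console. This is only a minimal sketch: the variable names `required` and `r_ok` are illustrative, and "4.0.0" is assumed as the exact lower bound for "R 4.0".

```r
# Minimum R version named in the announcement (assumed to mean 4.0.0).
required <- "4.0.0"

# getRversion() returns the running R version as a comparable object.
r_ok <- getRversion() >= required

if (r_ok) {
  message("R ", as.character(getRversion()), " is recent enough.")
} else {
  message("R ", as.character(getRversion()),
          " is older than ", required, "; please update.")
}
```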

If you can, please also install the following R packages: BiocManager, PBSmapping, RgoogleMaps, Rmpi, StAMPP, TeachingDemos, ade4, adegenet, adegraphics, adephylo, akima, ape, caper, corrplot, devtools, gee, geiger, ggplot2, gplots, hierfstat, ips, lattice, mapdata, mapplots, mapproj, maps, maptools, nlme, pegas, phangorn, philentropy, phylobase, phytools, picante, plotrix, poppr, raster, rgdal, rworldmap, rworldxtra, seqinr, shapefiles, snow, sos, sp, spdep, splancs, tripack, vcfR, vegan. If you do it before the course, you will save some time and network bandwidth. ;-)
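One possible way to install only those packages from the list that are not yet present is sketched below. The helper names `find_missing()` and `install_missing()` are hypothetical, and the CRAN mirror URL is an assumption; this is not necessarily how the course instructions will do it. Some packages may also need system libraries (for example, Rmpi requires an MPI installation), so individual installations can fail and may need separate attention.

```r
# Packages named in the announcement.
wanted <- c(
  "BiocManager", "PBSmapping", "RgoogleMaps", "Rmpi", "StAMPP",
  "TeachingDemos", "ade4", "adegenet", "adegraphics", "adephylo",
  "akima", "ape", "caper", "corrplot", "devtools", "gee", "geiger",
  "ggplot2", "gplots", "hierfstat", "ips", "lattice", "mapdata",
  "mapplots", "mapproj", "maps", "maptools", "nlme", "pegas",
  "phangorn", "philentropy", "phylobase", "phytools", "picante",
  "plotrix", "poppr", "raster", "rgdal", "rworldmap", "rworldxtra",
  "seqinr", "shapefiles", "snow", "sos", "sp", "spdep", "splancs",
  "tripack", "vcfR", "vegan"
)

# Hypothetical helper: names from 'pkgs' not found in any local library.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing.
# 'repos' is an assumed CRAN mirror; change it to one near you.
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  todo <- find_missing(pkgs)
  if (length(todo) > 0) {
    install.packages(todo, repos = repos)
  }
  invisible(todo)
}

# Uncomment to run the installation:
# install_missing(wanted)
```

Running `find_missing(wanted)` on its own lists what is still missing without installing anything, which is handy for checking the setup before the course.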

Apart from R, you'll also need some software outside R. Please install ClustalW and/or ClustalX (not Omega), MAFFT and MUSCLE. To edit graphical outputs I also recommend graphics software such as GIMP and Inkscape (similar to Adobe and Corel products).

I might add some more R packages and/or non-R software as I update the course.

You can use Linux, macOS or Windows. There are a few tasks that do not work very well on macOS or Windows; you can easily skip them, as they are not crucial. If you would like to try working in Linux (which might be advantageous for analysis of genetic data in general), install VirtualBox and then download the Linux installation image (6.4 GB) prepared for the course. It will require up to 20 GB of disk space. Start VirtualBox, go to the "File" menu, select "Import Appliance..." and load the prepared image. On some computers, starting the appliance successfully might require some changes to the settings according to your CPU, but we can easily solve this on Monday morning. VirtualBox will not perform well if you have 4 GB of memory (RAM) or less, or a CPU without virtualization support (Intel i5 and i7 and modern AMD CPUs work best). It also requires a 64-bit host system (Windows or any other; 32-bit will not work).

The installed Linux distribution is openSUSE Leap 15.1. If you would feel uncomfortable working in Linux, rather stay with Windows or macOS.

For students, requirements for the exam are:

  1. Active participation.
  2. Asking and answering on-topic questions during the course.
  3. Loading any molecular data into R (your own, example data from an R package, or data from an on-line database) and performing several appropriate analyses (according to the data type). It is possible to use the Internet, documentation, etc.
  4. Writing at least one Wikipedia page on any topic related to the course. It can be a translation or an edit of an existing page, it can be split into several articles, etc. Students should use the Wikipedia in their native language (so preferably not English).

At least tasks 1-3 can easily be completed by Friday. ;-) Of course, it is possible to send your analysis and/or a link to the Wikipedia page at any time (by the end of the semester).

If you need a confirmation of attendance, let me know in advance and I'll prepare it. This doesn't apply to students, as they'll have the course listed in the attachment to their diploma.

Before the course, please fill in a short questionnaire so that I get an overview of your experience and expectations.

Shortly before the course I'll send you the presentation and scripts to be used during the course, and possibly other last-minute updates. All information will also be on this page.

Attachment Size
r_mol_data_phylogen_4.pdf 12.47 MB