Course on working with molecular data in R, 2020, České Budějovice

Submitted by vojta on Fri, 03/06/2020 - 22:39

R is nowadays probably the most powerful tool for calculations of all kinds. There are plenty of packages available for working with molecular data; they will be introduced during the course. The course will be taught from Monday, October 19 to Friday, October 23 (see below). Details will be updated by the end of September.

The course covers the theory behind the methods used, tutorials with test data, tasks for participants to work on individually, and more.

Previous knowledge of R is useful, but not necessary. At least basic knowledge of molecular biology is required; previous knowledge of any methods for analysing DNA data is recommended.

List of topics (may be adjusted according to participants' wishes, pace, etc.):

  • Basic work in R - how to enter commands, install packages, read help, types of variables, indexes, etc.
  • Bioconductor
  • Loading and exporting molecular data of various types and formats
  • Downloading molecular data from on-line databases
  • Extraction of SNPs from sequencing data
  • Extraction of polymorphism from sequences
  • Microsatellites, AFLP, SNP, sequences, ...
  • Manipulations with data, conversions among formats
  • Distance matrices, import of custom matrices
  • Export of data
  • Basic statistics
  • PCoA
  • Phylogenetic trees (NJ, UPGMA, ML), their display and testing
  • MSN (minimum spanning network)
  • Basic statistics, genetic indices: heterozygosity, HWE, F-statistics
  • DAPC
  • Whole genome SNP data
  • Spatial analysis - Mantel test, Moran’s I, Monmonier, sPCA, ...
  • Basic map creation
  • Structure
  • Alignments
  • Manipulations with trees, work with big sets of trees
  • Phylogenetic independent contrasts
  • Phylogenetic autocorrelation
  • Phylogenetic PCA
  • Ancestral state reconstruction
  • Additional extending topics

The last day is reserved for participants' additional questions, exams, and consultations about participants' own data (optional).

Requirements prior to the course

  • Don't be afraid of R. :-)
  • Previous knowledge of R is useful, but not necessary. At least basic knowledge of molecular biology is required; previous knowledge of any methods for analysing DNA data is recommended. I recommend taking the courses R for life - MB120P147E, Use of molecular markers in plant systematics and population biology - MB120P44 (optionally also with practical lessons I and II), and Plant population genetics - MB120P145, or anything similar, prior to this course.
  • For the course you need:
    • Your own computer to work on.
    • Working Wi-Fi: either eduroam (set it up using the faculty's or the recommended general instructions), or ask for a temporary password in the application form.
    • R installed. I also recommend installing a graphical user interface such as RStudio, RKWard, R Commander, or a similar one of your choice.
    • If you have experience with R, you can save some work by installing the required R packages in advance. I'll send instructions prior to the course.
  • The course runs for 5 days: 4 days of teaching, and a last day for exams and individual consultations. Participants can stay for the last day (which is recommended), but it is not mandatory.

Information for the course

The course will start on Monday, October 19 at 9:00 AM in the computer study room of the library, Branišovská 1646/31b. Please arrive on time. We'll finish between 4 and 5 PM, and there will be enough breaks for snacks and lunch. The course will continue in the same way for 4 days, until Thursday 22nd. The room will be pretty crowded, so we'll have enough opportunity to share experiences. :-)

Friday 23rd is not an official teaching day. You don't have to attend, but it is open for taking exams (it's simple, don't worry:-) and for any consultation and/or discussion. We could also go deeper into some topics if there is particular interest. Please let me know in advance whether you plan to join on Friday or not.

For the course you need only a laptop with working Wi-Fi (participants without eduroam access will get a temporary password) and R installed. Install R 3.6, as some packages will not work in older R versions. For more convenient work, I also recommend installing a graphical interface for R, such as RStudio or RKWard.

If you can, please also install the following R packages: BiocManager, PBSmapping, RgoogleMaps, Rmpi, StAMPP, TeachingDemos, ade4, adegenet, adegraphics, adephylo, akima, ape, caper, corrplot, devtools, gee, geiger, ggplot2, gplots, hierfstat, ips, lattice, mapdata, mapplots, mapproj, maps, maptools, nlme, pegas, phangorn, philentropy, phylobase, phytools, picante, plotrix, poppr, raster, rgdal, rworldmap, rworldxtra, seqinr, shapefiles, snow, sos, sp, spdep, splancs, tripack, vcfR, vegan. If you do it before the course, you will save some time and network bandwidth. ;-)
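For those who prefer to script this step, here is a minimal sketch in base R. It is only an illustration, not the official course script: the helper names `r_is_recent_enough()`, `missing_packages()` and `install_missing()` are made up for this example, and it assumes that any R version from 3.6.0 upwards is acceptable and that the packages are installed from the default CRAN mirror.

```r
# Packages requested for the course (copied from the list on this page).
course_packages <- c(
  "BiocManager", "PBSmapping", "RgoogleMaps", "Rmpi", "StAMPP",
  "TeachingDemos", "ade4", "adegenet", "adegraphics", "adephylo",
  "akima", "ape", "caper", "corrplot", "devtools", "gee", "geiger",
  "ggplot2", "gplots", "hierfstat", "ips", "lattice", "mapdata",
  "mapplots", "mapproj", "maps", "maptools", "nlme", "pegas",
  "phangorn", "philentropy", "phylobase", "phytools", "picante",
  "plotrix", "poppr", "raster", "rgdal", "rworldmap", "rworldxtra",
  "seqinr", "shapefiles", "snow", "sos", "sp", "spdep", "splancs",
  "tripack", "vcfR", "vegan"
)

# Hypothetical helper: TRUE if the running R is at least `minimum`.
# Assumption: versions newer than 3.6 are fine as well.
r_is_recent_enough <- function(minimum = "3.6.0") {
  getRversion() >= minimum
}

# Hypothetical helper: which of the requested packages are not installed yet?
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only what is missing.
# It is defined here but not called, so nothing is downloaded by accident.
install_missing <- function(pkgs = course_packages) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    utils::install.packages(to_install)
  }
  invisible(to_install)
}

if (!r_is_recent_enough()) {
  warning("R older than 3.6 detected; some course packages may not work.")
}
```

Calling `missing_packages(course_packages)` shows what is still missing, and `install_missing()` installs it. Some packages may need system libraries (Rmpi, for example, needs an MPI implementation), and `install.packages()` will report any package it cannot find or build.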

Apart from R, you'll also need some extra software. Please install ClustalW and/or ClustalX (not Clustal Omega), MAFFT, and MUSCLE. To edit graphical output, I also recommend graphics software such as GIMP and Inkscape (similar to Adobe and Corel products).
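To check from within R whether the alignment programs can be found, something like the following sketch may help. The executable names are assumptions (they differ between versions and operating systems; ClustalW 2.x, for instance, is often installed as `clustalw2`), and `find_programs()` is a hypothetical helper written for this example.

```r
# Assumed executable names; adjust them to match your installation.
aligners <- c("clustalw", "clustalw2", "clustalx", "mafft", "muscle")

# Sys.which() looks each program up on the PATH and typically returns
# its full path, or an empty string if the program is not found.
find_programs <- function(programs) {
  paths <- unname(Sys.which(programs))
  data.frame(
    program = programs,
    found = nzchar(paths),
    path = paths,
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

print(find_programs(aligners))
```

A program that is installed but not on the PATH will show up as not found, so a `FALSE` here does not necessarily mean the installation failed.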

I might add some more R packages and/or non-R software as I update the course.

You can use Linux, macOS or Windows. There are a few tasks that do not work very well on macOS or Windows. You can easily skip them; they are not crucial. If you would like to try working in Linux (which might be advantageous for analysis of genetic data in general), install VirtualBox and then download the Linux installation image (6.4 GB) prepared for the course. It will require up to 20 GB of disk space. Start VirtualBox, go to the "File" menu, select "Import Appliance..." and load the prepared image. On some computers, starting the appliance successfully might require some changes to the settings according to your CPU, but we can easily solve this on Monday morning. VirtualBox will not perform well if you have 4 GB of memory (RAM) or less, or a CPU without virtualization support (the best are Intel i5 and i7 and modern AMD CPUs). It also requires a 64-bit host system (Windows or any other); 32-bit systems are not supported.

The Linux distribution in the image is openSUSE Leap 15.1. But if you feel uncomfortable working in Linux, rather stay with Windows or macOS.

As the course is usually attended by people from various institutions, it tends to be interesting to meet one evening in a pub to talk about anything. On Monday we can briefly discuss who would be interested in joining on Tuesday or Wednesday, shortly after the course. For students, joining or not joining has absolutely no impact on the exam. :-)

For students, requirements for the exam are:

  1. Active participation.
  2. Asking and answering on-topic questions during the course.
  3. Loading any molecular data into R (your own, example data from some R package, or data from an on-line database) and performing several appropriate analyses (according to the data type). Using the Internet, documentation, etc. is allowed.
  4. Writing at least one Wikipedia page about any topic related to the course. It can be a translation or an edit of an existing page, it can be split into several articles, etc. Students should use the Wikipedia in their native language (so preferably not English).

At least tasks 1-3 can easily be completed by Friday. ;-) Of course, it is possible to send your analysis and/or a link to the Wikipedia page at any time (by the end of the semester).

If you need a confirmation of attendance, let me know in advance and I'll prepare it. This doesn't apply to students, as they'll have it recorded in an attachment to their diploma.

Before the course, please fill in a short questionnaire so that I get an overview of your experience and expectations. I also use the form for my Linux course in Prague, so you can ignore any Linux-related questions here. :-)

Shortly before the course I'll send you the presentation and scripts to be used during the course, and possibly any other last-minute updates. All information will also be on this page.