Course of work with molecular data in R 2023

Submitted by vojta on Wed, 10/12/2022 - 10:16

R is nowadays probably one of the most powerful tools for calculations of all kinds. Plenty of packages are available for working with molecular data, and a representative selection of them will be introduced during the course.

The course covers the theory behind the methods used, tutorials with test data, tasks for participants' individual work, and more. The aim is to teach students how to analyze molecular data in the R programming language, to introduce the packages available for such analyses, and to let participants practice on their own or provided data.

Previous knowledge of R is useful but not necessary. At least basic knowledge of molecular biology is required, and previous experience with any method of analysing DNA data is recommended. The course is aimed primarily at Master's and Ph.D. students; Bachelor's students are welcome only if they are highly advanced.

The course runs for 5 days: 4 days of teaching, with the last day reserved for exams and individual consultations. Participants are encouraged to stay for the last day, but it is not required.

If at least one participant does not speak Czech, the course will be taught in English.

Information is continuously updated in SIS, where the schedules are also available. The course will be taught in lecture hall OŽP B12 (1st mezzanine, Benátská 2, Prague 2) from February 6 to 10, 2023, from 9:00 AM to between 4:00 and 5:00 PM (with enough breaks). I'd be glad if participants could fill in a short questionnaire, which will help me prepare the course and communicate with participants.

The course will combine shorter talks with independent work by students, leaving room for questions and consultations.

Depending on the epidemiological situation, the course may be held in hybrid mode (not only in person) or fully online. Details will be updated according to the situation prior to the course.


List of topics (may be adjusted according to the wishes of participants, the pace of the course, etc.):

  • Basic work in R – how to enter commands, install packages, read help, types of variables, indexes, etc.
  • Bioconductor
  • Loading and exporting molecular data of various types and formats
  • Download molecular data from on-line databases
  • Extraction of SNPs from sequencing data
  • Extraction of polymorphism from sequences
  • Microsatellites, AFLP, SNP, sequences, …
  • Manipulations with data, conversions among formats
  • Distance matrices, import of custom matrices
  • Export of data
  • Basic statistics
  • PCoA
  • Phylogenetic trees (NJ, UPGMA, ML), their display and testing
  • MSN
  • Basic statistics, genetic indices: heterozygosity, HWE, F-statistics
  • DAPC
  • Whole genome SNP data
  • Spatial analysis – Mantel test, Moran’s I, Monmonier, sPCA, …
  • Basic map creation
  • Alignments
  • Manipulations with trees, work with big sets of trees
  • Phylogenetic independent contrasts
  • Phylogenetic autocorrelation
  • Phylogenetic PCA
  • Ancestral state reconstruction
  • Additional extending topics...

For the course you need your own computer with R installed. I also recommend installing a graphical user interface such as RStudio, RKWard, R Commander, or a similar one of your choice. If you have experience with R, you can save some work by installing the required R packages in advance; I'll send instructions prior to the course. If you do not wish to install everything, you can use a prepared Linux image for VirtualBox containing everything needed.