STRUCTURE multi PBS Pro scripts

Submitted by vojta on Thu, 02/25/2021 - 17:24

A set of scripts to run STRUCTURE in parallel on computing grids such as MetaCentrum. The scripts are designed for grids and clusters using PBS Pro, but can easily be adapted to other queue systems.

Homepage and reporting issues

See https://github.com/V-Z/structure-multi-pbspro. Ask questions about usage at https://github.com/V-Z/structure-multi-pbspro/discussions and report any issues or feature requests at https://github.com/V-Z/structure-multi-pbspro/issues.

License

GNU General Public License 3.0, see https://www.gnu.org/licenses/gpl-3.0.html.

About STRUCTURE and its parallelization

STRUCTURE itself processes a single input file at a time. It has a simple Java GUI that can create batch jobs and run them on a desktop computer (or possibly also on MetaCentrum). Another option is the ParallelStructure R package (see my example and slides), but it has problems with some input file formats, and it runs on a single computer using multiple cores. The provided scripts instead distribute individual STRUCTURE runs among multiple computers in a computing cluster/grid, which speeds up the whole analysis considerably.
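Each combination of K and replicate is an independent STRUCTURE run, so each one can be sent to a different node. The following Python sketch only illustrates that decomposition and is not the code of structure-multi-pbspro. The input file name data.str, the mainparams/extraparams file names and the K range are hypothetical. The commands use STRUCTURE's standard command-line options (-m, -e, -K, -i, -o).

```python
"""Sketch: enumerate independent STRUCTURE runs (one per K and replicate).

Hypothetical assumptions (not taken from structure-multi-pbspro): the input
file name, the mainparams/extraparams file names and the K range.
"""
from itertools import product


def structure_commands(input_file, k_max, reps, mainparams="mainparams",
                       extraparams="extraparams"):
    """Return one STRUCTURE command line (as an argument list) per (K, rep).

    The runs do not depend on each other, so each command can go to a
    different node. STRUCTURE appends "_f" to the -o name, which yields
    files named res.k.X.rep.Y.out_f.
    """
    commands = []
    # K is the outer loop and the replicate the inner one.
    for k, rep in product(range(1, k_max + 1), range(1, reps + 1)):
        commands.append([
            "structure",
            "-m", mainparams,
            "-e", extraparams,
            "-K", str(k),
            "-i", input_file,
            "-o", f"res.k.{k}.rep.{rep}.out",
        ])
    return commands


if __name__ == "__main__":
    runs = structure_commands("data.str", k_max=3, reps=2)
    print(len(runs), "independent runs")
    for cmd in runs[:2]:
        print(" ".join(cmd))
```

With K from 1 to 3 and two replicates, this gives six commands. On a cluster, each argument list could be executed by its own job, for example with subprocess.run on the compute node.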

Requirements to use the scripts

The scripts are written for Linux servers and may also run on other UNIX systems. Apart from BASH, the only requirement is STRUCTURE. It is already installed on MetaCentrum, so users can simply load its module. If you use your own installation of STRUCTURE, either comment out or update the respective line in the script structure_multi_2_qsub.sh. If you are unsure how to work in the Linux command line on a computing cluster, consult, e.g., my slides or the MetaCentrum wiki.
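This Python sketch illustrates the choice between loading a module and using your own installation by rendering a PBS Pro job script for one run. It is not the content of structure_multi_2_qsub.sh. The module name structure, the resource request (1 CPU, 1 GB RAM) and the walltime are hypothetical placeholders, so check the MetaCentrum documentation for the real values. Because the command passes no -m or -e option, STRUCTURE falls back to its default and reads mainparams and extraparams from the working directory.

```python
"""Sketch: render a PBS Pro job script for a single STRUCTURE run.

Hypothetical values: the module name "structure", the resource request
and the walltime. These are placeholders, not MetaCentrum settings.
"""


def pbs_job_script(k, rep, input_file, structure_bin=None,
                   module_name="structure", walltime="24:00:00"):
    """Return the text of a PBS Pro job script.

    If structure_bin is None, STRUCTURE comes from an environment module
    (as on MetaCentrum). Otherwise the given path to an own installation
    is used and the module line is omitted.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N structure_k{k}_rep{rep}",
        "#PBS -l select=1:ncpus=1:mem=1gb",
        f"#PBS -l walltime={walltime}",
        # Run in the directory from which the job was submitted.
        'cd "$PBS_O_WORKDIR" || exit 1',
    ]
    if structure_bin is None:
        lines.append(f"module add {module_name}")
        structure_bin = "structure"
    # mainparams and extraparams are read from the working directory.
    lines.append(
        f"{structure_bin} -K {k} -i {input_file} -o res.k.{k}.rep.{rep}.out"
    )
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and submitted with qsub, one file per (K, replicate) pair.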

Postprocessing of the results

As the next step, collect all res.k.X.rep.Y.out_f files in the output directory. Select the best K using, e.g., the Structure_sum R script (see my example and slides) or Structure Harvester. Align and reorder the results with CLUMPP and draw the final plots with, e.g., distruct. See also my complete example.
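Structure Harvester reports the ΔK statistic of Evanno et al. (2005). For a rough cross-check, the Python sketch below does the following:

1. Collects the res.k.X.rep.Y.out_f files from an output directory.
2. Reads the "Estimated Ln Prob of Data" line that STRUCTURE writes into each _f file.
3. Computes ΔK from the mean Ln P(D) per K.

This is a minimal re-implementation under these assumptions and is not the code of either tool.

```python
"""Sketch: collect res.k.X.rep.Y.out_f files and compute Evanno's Delta K.

A minimal re-implementation based on mean Ln P(D) per K. It assumes the
"Estimated Ln Prob of Data" line that STRUCTURE writes to its _f files.
"""
import re
import statistics
from collections import defaultdict
from pathlib import Path

NAME_RE = re.compile(r"res\.k\.(\d+)\.rep\.(\d+)\.out_f")
LNP_RE = re.compile(r"Estimated Ln Prob of Data\s*=\s*(-?[\d.]+)")


def collect_lnp(out_dir):
    """Map K -> list of Ln P(D) values, one per replicate file found."""
    lnp = defaultdict(list)
    for path in Path(out_dir).iterdir():
        name = NAME_RE.fullmatch(path.name)
        if not path.is_file() or not name:
            continue  # skip anything that is not a STRUCTURE _f result
        match = LNP_RE.search(path.read_text())
        if match:
            lnp[int(name.group(1))].append(float(match.group(1)))
    return dict(lnp)


def evanno_delta_k(lnp):
    """Return {K: Delta K} for each K whose neighbours K-1 and K+1 exist.

    Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)),
    where sd is the sample standard deviation across replicates.
    """
    mean = {k: statistics.mean(v) for k, v in lnp.items()}
    delta = {}
    for k, values in lnp.items():
        if k - 1 in mean and k + 1 in mean and len(values) > 1:
            sd = statistics.stdev(values)
            if sd > 0:
                second_diff = mean[k + 1] - 2 * mean[k] + mean[k - 1]
                delta[k] = abs(second_diff) / sd
    return dict(sorted(delta.items()))
```

For example, evanno_delta_k(collect_lnp("output")) returns ΔK per K, where "output" is a hypothetical directory name. ΔK is undefined for the smallest and largest K tested and requires at least two replicates per K.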