A set of scripts to run STRUCTURE in parallel on computing grids such as MetaCentrum. The scripts are designed for grids and clusters using PBS Pro, but can be easily adapted to another queue system.
Homepage and reporting issues
See https://github.com/V-Z/structure-multi-pbspro, ask usage questions at https://github.com/V-Z/structure-multi-pbspro/discussions and report issues or feature requests at https://github.com/V-Z/structure-multi-pbspro/issues.
License
GNU General Public License 3.0, see https://www.gnu.org/licenses/gpl-3.0.html.
About STRUCTURE and its parallelization
STRUCTURE itself processes a single file at a time. It has a simple Java GUI that can create batch tasks and run them on a desktop, or possibly also on MetaCentrum. Another option is the ParallelStructure R package (see my older example and slides), but it has problems with some input file formats, and it runs on a single computer, using multiple cores. The provided scripts distribute individual STRUCTURE runs among multiple computers in a computing cluster or grid, which speeds up the analysis considerably.
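The distribution idea can be sketched as a simple enumeration: every combination of K and replicate is an independent STRUCTURE run, so each one can become its own job. The following is a minimal sketch only, not the code of the provided scripts. The function name list_tasks and the example values are hypothetical, and it is assumed that X and Y in the result file names stand for K and the replicate number. It prints the task list and submits nothing.

```shell
#!/bin/sh
# Hypothetical helper, not part of the provided scripts.
# Prints one line per (K, replicate) combination; each line
# corresponds to one independent STRUCTURE run.
# The parentheses run the body in a subshell, so variables stay local.
list_tasks() (
  kmin="$1"; kmax="$2"; reps="$3"
  k="$kmin"
  while [ "$k" -le "$kmax" ]; do
    r=1
    while [ "$r" -le "$reps" ]; do
      # Assumed naming: X = K, Y = replicate number
      echo "K=${k} REP=${r} RESULT=res.k.${k}.rep.${r}.out_f"
      r=$((r + 1))
    done
    k=$((k + 1))
  done
)

# Example values (assumptions): K from 1 to 3, 2 replicates each
list_tasks 1 3 2
```

Because the runs do not depend on each other, the queue system is free to place them on different machines, which is where the speed-up comes from.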
Requirements to use the scripts
The scripts are written for Linux servers. They may also run on other UNIX systems. Apart from BASH, the only requirement is STRUCTURE. It is already installed on MetaCentrum, so users can simply load the module. If you use your own installation of STRUCTURE, either comment out or update the respective line in the script structure_multi_2_qsub.sh. If you are unsure how to work in the Linux command line on a computing cluster, consult e.g. my slides or the MetaCentrum wiki.
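When using your own installation, it may help to verify that the STRUCTURE binary is reachable before submitting anything. A minimal sketch, assuming the binary is called structure (adjust STRUCTURE_BIN if yours differs); the helper have_cmd is hypothetical and not part of the provided scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given command is found in PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Assumed binary name; override with e.g. STRUCTURE_BIN=/path/to/structure
STRUCTURE_BIN="${STRUCTURE_BIN:-structure}"

if have_cmd "$STRUCTURE_BIN"; then
  echo "STRUCTURE found: $(command -v "$STRUCTURE_BIN")"
else
  echo "STRUCTURE not found: load the module or update structure_multi_2_qsub.sh" >&2
fi
```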
Postprocessing of the results
For the next step, collect all res.k.X.rep.Y.out_f files in the output directory. Select the best K using e.g. the Structure_sum R script (see my older example and slides) or Structure Harvester. Align and reorder the results with CLUMPP and draw the final plots with e.g. distruct. See also my older complete example.
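Before postprocessing, it can be useful to check that no run is missing. A minimal sketch, assuming X in the file name is K and Y is the replicate number; the function missing_outputs, the OUTDIR variable and the example ranges are hypothetical and not part of the provided scripts.

```shell
#!/bin/sh
# Hypothetical helper: print every expected result file that is absent.
# Arguments: output directory, minimal K, maximal K, number of replicates.
# The parentheses run the body in a subshell, so variables stay local.
missing_outputs() (
  dir="$1"; kmin="$2"; kmax="$3"; reps="$4"
  k="$kmin"
  while [ "$k" -le "$kmax" ]; do
    r=1
    while [ "$r" -le "$reps" ]; do
      f="${dir}/res.k.${k}.rep.${r}.out_f"
      # Report the file only if it does not exist
      [ -f "$f" ] || echo "$f"
      r=$((r + 1))
    done
    k=$((k + 1))
  done
)

# Example values (assumptions): current directory, K from 1 to 3, 2 replicates
missing_outputs "${OUTDIR:-.}" 1 3 2
```

An empty listing means all expected files are present; any printed path points to a run that should be repeated before selecting the best K.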