Pipeline for analysis of Puccinia striiformis f.sp. tritici genomic reads
Before installing, ensure the licences of miniconda and software dependencies in env.yml
are compatible with your usage.
The installer:
- Installs conda (miniconda)
- Installs dependencies from
eny.yml
into the environmentmarple-pst
- Runs tests with
test/run_tests.sh
./install/install.sh
For WSL, you should keep the marple-pst
directory in the Linux file sytem (e.g. ~/marple-pst
), and not the windows file system (e.g. /mnt/c/Users/me/Documents/marple-pst
) as you may get an error:
OSError: [Errno 40] Too many levels of symbolic links
Let's go through the pipeline using a fictional example.
In this example, 2 samples have already been sequenced (using nanpore sequencing) and basecalled using Guppy.
Our starting point is a directory for each sample containing the fastq files created by Guppy.
sample name | reads directory |
---|---|
Norfolk-1 | example/fastq/barcode01 |
Warrior_10 | example/fastq/barcode02 |
-
Reset the example
./reset_example.sh
-
Navigate to the example directory and make a new directory for each sample.
cd example mkdir -p Norfolk-1 Warrior_10
-
We need a single fastq for each sample, so combine all the fastq files for barcode01 into one. Do the same for barcode02.
cat fastq/barcode01/*.fastq > Norfolk-1/Norfolk-1.fastq cat fastq/barcode02/*.fastq > Warrior_10/Warrior_10.fastq
-
Align the reads for each sample to the reference genes, extract the exons, and create a report
../src/reads_to_exons_concat.sh Norfolk-1/Norfolk-1.fastq Warrior_10/Warrior_10.fastq
-
Inspect the report by opening
example/multiqc_report.html
in your browser -
Use the extracted and concatenated exons of the new samples to create a tree
../src/exons_concat_to_tree_imgs.sh \ --start ../data/8_isolates_388_genes_exons.fasta.gz \ --out_dir tree \ --name 2_new_samples \ */*_exons_concat.fasta
In this example we use a starting tree input containing only 8 isolates to save time.
When you run this with real samples, use
../data/57_isolates_388_genes_exons.fasta.gz
which contains 57 isolates. -
Inspect the tree by opening
example/tree/2_new_samples_country.pdf
in your browser.Notice how country is shown as '?' for the new samples. We'll fix that in the next step.
-
Copy the metadata spreadsheet
data/metadata_264_isolates.xlsx
and rename it so that you haveexample\metadata_264_isolates_2_new_samples.xlsx
.Add the new sample metadata to the spreadsheet so that the top 3 rows look like this:
tree_name tree_new_name country nextstrain_group region author other_names genetic_group Norfolk-1 Norfolk-1 UK Europe Warrior_10 Warrior_10 UK Europe -
Use this metadata to make a new visualisation of the tree
../src/tree_to_imgs.sh \ --meta metadata_264_isolates_2_new_samples.xlsx \ --out_dir tree_with_new_metadata \ tree/RAxML_bestTree.2_new_samples.newick
-
Inspect the new visualisation of the tree by opening
example/tree_with_new_metadata/2_new_samples_country.pdf
in your browser.The country of our 2 new samples is now showing correctly.
If you get stuck and want to see what the output should look like, then run this from the marple-pst directory:
./run_example.sh
See pipeline.md for a flowchart of the pipeline.
Align reads to reference genes then extract and concatenate exons
reads_to_exons_concat.sh
--ref Reference FASTA file to align reads to.
--gff Annotation file giving the position of exons within the
reference genes. Genes should be specificied as if
everything was on the + strand.
--multiqc_config MultiQC config file for creating the report.
--threads Number of threads to use.
--trim Should FASTQ files be trimmed (yes/no).
fastq1 ... fastqn Read files for aligning to the reference genes.
Output for each read will be created in the same
directory as the fastq file.
Create and visualise a tree with new samples
exons_concat_to_tree_imgs.sh
--meta Path to spreadsheet containing isolate metadata and styles.
--start Starting tree input, i.e. concatenated exons file for
previously sequenced samples.
--name Name to use as a prefix for all created tree files.
--out_dir Directory to create tree files in.
--threads Number of threads to use for RAxML.
--img_fmt Format to output tree images as.
exon_concat_paths Concatenated exons file for new samples to add to the tree.
Visualise a tree
tree_to_imgs.sh
--meta Path to spreadsheet containing isolate metadata and styles.
--out_dir Directory to create tree files in.
--img_fmt Format to output tree images as.
newick_path Tree in Newick format.
With all of the above you can use:
-h Show a help message and exit.
--version Show the marple-pst version number and exit.
Run all the tests (roughly 2 minutes):
./test/run_tests.sh
Run the tests and report code coverage:
./test/run_tests.sh coverage
Skip the end to end test and integration tests
./test/run_tests.sh skip_integration skip_end_to_end
./install/uninstall.sh
- Deletes the
marple-pst
environment and its packages - Deletes
marple_pst_miniconda.sh
andmarple_pst_miniconda