pathseq2taxsummary is a Perl script that converts a slightly modified and concatenated PathSeq (http://software.broadinstitute.org/pathseq/) scores.txt file into a MOTHUR-style tax.summary file (https://mothur.org/wiki/summary.tax/), which can then be used to make plots and compute statistics in R or other software packages. The script also enables combined analysis of multiple samples, whereas each PathSeq scores.txt file contains read-mapping results for a single sample only.
Included in this distribution are the input and output files used in Lang et al. and Jiang et al. (see Citations).
To run the pathseq2taxsummary example datasets, cd into the desired directory (e.g., Lang_dir or Jiang_dir) and run the following commands:
- Concatenate the scores.txt files of all samples in the dataset, using awk to append "{sample_prefix}/scores.txt" to the end of each line [in bash shell]. Because >> appends, delete any existing combined_scores.txt before re-running this step.
- for i in $(cat sample_prefixes.lst); do awk -v OFS='\t' '{print $0,FILENAME}' $i/scores.txt >> combined_scores.txt; done;
- Make the .taxsummary file for PathSeq "ambiguous" mapped read counts (see https://gatkforums.broadinstitute.org/gatk/discussion/10913/how-to-run-the-pathseq-pipeline for definitions)
- ../../pathseq2taxsummary.pl -s combined_scores.txt > combined_scores_taxsummary_amb.txt
- Make the .taxsummary file for PathSeq "unambiguous" mapped read counts (see https://gatkforums.broadinstitute.org/gatk/discussion/10913/how-to-run-the-pathseq-pipeline for definitions)
- ../../pathseq2taxsummary.pl -s combined_scores.txt -u > combined_scores_taxsummary_unamb.txt
- Filter for specific taxa if desired, keeping the header line. In this example, only "Viruses" were kept:
- head -1 combined_scores_taxsummary_amb.txt > combined_virus_scores_taxsummary_amb.txt
- grep "Viruses" combined_scores_taxsummary_amb.txt >> combined_virus_scores_taxsummary_amb.txt
- head -1 combined_scores_taxsummary_unamb.txt > combined_virus_scores_taxsummary_unamb.txt
- grep "Viruses" combined_scores_taxsummary_unamb.txt >> combined_virus_scores_taxsummary_unamb.txt
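The concatenation and filtering steps above can be wrapped in two reusable shell functions. This is a minimal sketch, not part of the distribution: combine_scores and filter_taxon are hypothetical helper names. filter_taxon keeps the header line and matches the taxon as a literal substring, as the grep commands above do for "Viruses". Unlike the one-line loop, combine_scores truncates its output file first, so re-running it does not duplicate rows.

```shell
# Hypothetical helpers (not part of the pathseq2taxsummary distribution).

# Concatenate {prefix}/scores.txt for every prefix listed in a file, appending
# the source path (e.g. "sample1/scores.txt") as the last tab-separated column.
combine_scores() {
  local list="$1" out="$2"
  : > "$out"   # truncate first: '>>' alone would duplicate rows on a re-run
  # '|| [ -n ... ]' also handles a final line that lacks a trailing newline
  while IFS= read -r prefix || [ -n "$prefix" ]; do
    [ -n "$prefix" ] || continue   # skip blank lines in the prefix list
    awk -v OFS='\t' '{print $0, FILENAME}' "$prefix/scores.txt" >> "$out"
  done < "$list"
}

# Write the header line plus every line containing a literal pattern.
filter_taxon() {
  local pattern="$1" in="$2" out="$3"
  awk -v pat="$pattern" 'NR == 1 || index($0, pat) > 0' "$in" > "$out"
}
```

For example, from inside Lang_dir, `combine_scores sample_prefixes.lst combined_scores.txt` builds the combined file, and after running pathseq2taxsummary.pl, `filter_taxon Viruses combined_scores_taxsummary_amb.txt combined_virus_scores_taxsummary_amb.txt` extracts the viral rows. To check that every sample made it into the combined file, `awk -F'\t' '{print $NF}' combined_scores.txt | sort | uniq -c` counts rows per source file.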
pathseq2taxsummary.pl -s <combined_scores.txt> [options] > <output_taxsummary.txt>
pathseq2taxsummary.pl -h
Options:
-s : concatenated PathSeq scores.txt file (see above) [REQUIRED]
-u : use PathSeq "unambiguous" mapped read counts instead of the default "ambiguous" counts
-h : display usage information
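Because -u only switches the counting mode, both tax.summary files in the examples can be produced by one wrapper function. This is a sketch, assuming the script writes its result to standard output as the example redirections indicate; make_taxsummaries and the PS2TS variable (path to the script, defaulting to the relative path used in the examples) are hypothetical.

```shell
# Hypothetical wrapper (not part of the distribution): builds both the
# "ambiguous" and "unambiguous" tax.summary files from one combined scores file.
# PS2TS is an assumed variable holding the path to pathseq2taxsummary.pl.
make_taxsummaries() {
  local scores="$1" prefix="$2"
  local script="${PS2TS:-../../pathseq2taxsummary.pl}"
  # Default mode: PathSeq "ambiguous" mapped read counts
  "$script" -s "$scores" > "${prefix}_taxsummary_amb.txt" || return
  # -u: PathSeq "unambiguous" mapped read counts
  "$script" -s "$scores" -u > "${prefix}_taxsummary_unamb.txt"
}
```

Running `make_taxsummaries combined_scores.txt combined_scores` inside an example directory writes combined_scores_taxsummary_amb.txt and combined_scores_taxsummary_unamb.txt, the same file names used in the example commands.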
Two example datasets from Lang et al. and Jiang et al.
A Perl script that generates a MOTHUR-style tax.summary file (https://mothur.org/wiki/summary.tax/) from a slightly modified and concatenated PathSeq (http://software.broadinstitute.org/pathseq/) scores.txt file, which can then be used to make plots and compute statistics in R or other software packages.
A directory containing three .tar.bz archives (Lang_dir.tar.bz, Jiang_dir_a.tar.bz and Jiang_dir_b.tar.bz) that hold the example data from Lang et al. and Jiang et al. for generating tax.summary files.
To unarchive Lang_dir.tar.bz (in the examples_dir):
- cd examples_dir
- tar -xvjf Lang_dir.tar.bz
To unarchive the two Jiang et al. archives (split because of file-size restrictions) and combine them into a single Jiang_dir folder:
- cd examples_dir
- mkdir Jiang_dir
- tar -xvjf Jiang_dir_a.tar.bz -C Jiang_dir --strip-components=1
- tar -xvjf Jiang_dir_b.tar.bz -C Jiang_dir --strip-components=1
The following Perl modules are required by pathseq2taxsummary.pl:
Getopt::Std : Included in Perl 5 distribution
Cwd : Included in Perl 5 distribution
The following publications used this program to convert slightly modified and concatenated PathSeq (http://software.broadinstitute.org/pathseq/) scores.txt files to MOTHUR-style tax.summary files (https://mothur.org/wiki/summary.tax/) and should be used to cite this program:
Lang S, Demir M, Martin A, Jiang L, Zhang X, Duan Y, Gao B, Wisplinghoff H, Kasper P, Roderburg C, Tacke F, Steffen HM, Goeser T, Abraldes JG, Tu XM, Loomba R, Starkel P, Pride D, Fouts DE, Schnabl B. Intestinal Virome Signature Associated With Severity of Nonalcoholic Fatty Liver Disease. Gastroenterology. 2020;159(5):1839-52. Epub 2020/07/12. PubMed PMID: 32652145.
Jiang L, Lang S, Duan Y, Zhang X, Gao B, Chopyk J, Schwanemann LK, Ventura-Cots M, Bataller R, Bosques-Padilla F, Verna EC, Abraldes JG, Brown RS, Jr., Vargas V, Altamirano J, Caballeria J, Shawcross DL, Ho SB, Louvet A, Lucey MR, Mathurin P, Garcia-Tsao G, Kisseleva T, Brenner DA, Tu XM, Starkel P, Pride D, Fouts DE, Schnabl B. Intestinal virome in patients with alcoholic hepatitis. Hepatology. 2020. Epub 2020/07/13. PubMed PMID: 32654263.
Derick E. Fouts dfouts@jcvi.org