SNP_Wrap_Python

Facilitates post-processing of SNP pipeline outputs

The best-developed script in this package is intersectTwo_colTabSNPs.py, which reads five-column or six-column summaries of VCF files and reports the SNP positions common to both inputs. The option '--outputType T' produces a five-column .tsv as output, while the option '--union' computes the union of SNP positions instead of the intersection.

For example:

python intersectTwo_colTabSNPs.py sample-1.six-col.tsv sample-2.six-col.tsv --outputType T

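Here, sample-1.six-col.tsv would have been generated by the following two commands, which write the header row and then append one line per variant; sample-2.six-col.tsv is built the same way from its own VCF file: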

perl -e 'print("CHROM\tPOS\tREF\tALT\tINFO-DP\tINFO-TYPE\n")' > sample-1.six-col.tsv

and

bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/DP\t%INFO/TYPE\n' sample-1.vcf.gz >> sample-1.six-col.tsv
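The difference between the default intersection and '--union' can be pictured with a short sketch. This is not the code of intersectTwo_colTabSNPs.py; it assumes that a SNP position is identified by the (CHROM, POS) pair, and the function names and sample rows are hypothetical.

```python
import csv
import io


def read_positions(handle):
    """Return the set of (CHROM, POS) pairs from a tab-separated file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["CHROM"], int(row["POS"])) for row in reader}


def combine(handle_a, handle_b, union=False):
    """Intersect (default) or unite the SNP positions of two files; result is sorted."""
    a = read_positions(handle_a)
    b = read_positions(handle_b)
    return sorted(a | b if union else a & b)


# Hypothetical in-memory data standing in for two six-column files.
HEADER = "CHROM\tPOS\tREF\tALT\tINFO-DP\tINFO-TYPE\n"
sample_1 = HEADER + "chr1\t100\tA\tG\t30\tsnp\nchr1\t250\tC\tT\t12\tsnp\n"
sample_2 = HEADER + "chr1\t250\tC\tT\t18\tsnp\nchr2\t40\tG\tA\t25\tsnp\n"

shared = combine(io.StringIO(sample_1), io.StringIO(sample_2))
either = combine(io.StringIO(sample_1), io.StringIO(sample_2), union=True)
print(shared)  # positions present in both files
print(either)  # positions present in at least one file
```

With these made-up rows, only chr1:250 appears in both inputs, so it is the sole member of the intersection, while the union holds all three positions.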

The script intersectAndComplement_colTabSNPs.py is similar to the first, but takes seven-column .tsv files and offers the additional option of displaying the SNP positions in each input file that are not found in the intersection.

python intersectAndComplement_colTabSNPs.py sample-1.seven-col.tsv sample-2.seven-col.tsv --outputType D

Here, 'D' indicates the differences between each input file, sample-1 and sample-2, and their intersection.
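As a rough illustration of what "differences versus the intersection" means, the sketch below uses plain set arithmetic. It is an assumption about the idea only, not the script's implementation; the positions are invented and are again keyed on (CHROM, POS).

```python
def differences(positions_a, positions_b):
    """Return the positions unique to each input, i.e. those outside the intersection."""
    shared = positions_a & positions_b
    only_a = sorted(positions_a - shared)
    only_b = sorted(positions_b - shared)
    return only_a, only_b


# Hypothetical positions standing in for the contents of two seven-column files.
sample_1_positions = {("chr1", 100), ("chr1", 250)}
sample_2_positions = {("chr1", 250), ("chr2", 40)}

only_1, only_2 = differences(sample_1_positions, sample_2_positions)
print(only_1)  # positions found only in sample-1
print(only_2)  # positions found only in sample-2
```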

Seven-column .tsv files are constructed by the following commands:

perl -e 'print("CHROM\tPOS\tREF\tALT\tQUAL\tINFO-DP\tCONSENS\n")' > sample-1.seven-col.tsv

and

bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%QUAL\t%INFO/AO\t%INFO/DP\n' sample-1.vcf.gz | perl -ne '@F=split(/\s+/, $_); printf "%s\t%d\t%s\t%s\t%d\t%d\t%0.4f\n", $F[0], $F[1], $F[2], $F[3], $F[4], $F[6], $F[5]/$F[6]' >> sample-1.seven-col.tsv
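The perl one-liner above reorders the bcftools fields and fills the CONSENS column with INFO/AO divided by INFO/DP. For readers more comfortable with Python, the same per-line transformation might look like the sketch below. The function name and the sample line are hypothetical, and the sketch assumes a single ALT allele (so AO is one integer) and a non-zero DP.

```python
def seven_col_row(line):
    """Convert one 'CHROM POS REF ALT QUAL AO DP' line into a seven-column row."""
    chrom, pos, ref, alt, qual, ao, dp = line.split()
    ao, dp = int(ao), int(dp)
    consens = ao / dp  # fraction of reads supporting the ALT allele
    # QUAL is truncated to an integer, as the perl format "%d" does.
    fields = [chrom, pos, ref, alt, str(int(float(qual))), str(dp), f"{consens:.4f}"]
    return "\t".join(fields)


# Hypothetical bcftools query output line: QUAL=812.4, AO=27, DP=30.
row = seven_col_row("chr1\t250\tC\tT\t812.4\t27\t30\n")
print(row)
```

Sites with several ALT alleles carry comma-separated AO values and would need to be split or filtered before this calculation.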