Handling of chrX data with dumpSTR
mikmaksi opened this issue · 7 comments
Hello,
I encountered the following error when running dumpSTR on a VCF produced by GangSTR that had only chrX calls for 3 samples. Two of the samples are female and have two comma-separated values in their REPCN field, while one sample is male and has a single value in its REPCN field:
Traceback (most recent call last):
File "/usr/local/bin/dumpSTR", line 11, in <module>
load_entry_point('trtools==2.0.4', 'console_scripts', 'dumpSTR')()
File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 941, in run
retcode = main(args)
File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 895, in main
record = ApplyCallFilters(record, invcf, call_filters, sample_info)
File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 597, in ApplyCallFilters
filter_reasons = FilterCall(sample, call_filters)
File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 544, in FilterCall
if cfilt(sample) is not None: reasons.append(cfilt.GetReason())
File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/filters.py", line 605, in __call__
ml = [int(item) for item in sample["REPCN"]]
TypeError: 'int' object is not iterable
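For reference, here is a minimal sketch of what seems to be happening at the failing line. It assumes the VCF parser hands back a bare int when REPCN holds one value and a list when it holds two; I have not confirmed that against the parser. `repcn_as_list` is a hypothetical helper for illustration, not part of TRTools.

```python
def repcn_as_list(repcn):
    """Return REPCN as a list of ints, whether it was parsed as a scalar or a sequence.

    Hypothetical helper for illustration; not part of TRTools.
    """
    if repcn is None:
        return []
    if isinstance(repcn, (list, tuple)):
        return [int(item) for item in repcn]
    return [int(repcn)]  # single value, e.g. a male sample on chrX


female_repcn = [12, 15]  # assumed parser output for REPCN=12,15
male_repcn = 12          # assumed parser output for REPCN=12

try:
    # Same pattern as the list comprehension in the traceback above.
    [int(item) for item in male_repcn]
except TypeError as err:
    print(err)  # 'int' object is not iterable

print(repcn_as_list(female_repcn))  # [12, 15]
print(repcn_as_list(male_repcn))    # [12]
```

Normalizing the field before iterating would let the same filter handle both the two-value and the one-value case.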
Thanks so much!
I need some more information to help you:
- Please provide the full command you used to call dumpSTR, including all arguments.
- Did you run it from the source code, or install it from somewhere (conda? pip?)
- Where on Snorlax are the sample files, so that I can try this myself?
Of course
- I moved the sample data and a launch script to /storage/mikhail/062620_dumpSTR_chrX_tshoot
  1_filter_with_dumpSTR.sh: launcher script
  data/raw/chrX.vcf: input data
- The command used to run dumpSTR is in 1_filter_with_dumpSTR.sh
- I used the default TRTools installation available on snorlax
I routinely get the same dumpSTR error message, using gangSTR v2.4.6 and dumpSTR v3.0.2 (as well as with earlier versions) to test 37 known pathogenic STR loci, 5 of which are on chrX. If I remove the chrX variants from the gangSTR output before providing it to dumpSTR, I do not get the error. I would also vote for handling of chrX by dumpSTR.
Now trying on gangSTR v2.5.0 and dumpSTR v4.0.0, running on 3 samples with 2 males and 1 female, and where gangSTR output includes calls on chrX, I get the following error from dumpSTR:
Traceback (most recent call last):
File "~/bin/dumpSTR", line 33, in <module>
sys.exit(load_entry_point('trtools==4.0.0', 'console_scripts', 'dumpSTR')())
File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/dumpSTR.py", line 1245, in run
retcode = main(args)
File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/dumpSTR.py", line 1183, in main
record = ApplyCallFilters(record, call_filters, sample_info, invcf.samples)
File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/dumpSTR.py", line 569, in ApplyCallFilters
filt_output = filt(record)
File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/filters.py", line 706, in __call__
ci = np.stack(ci)
File "<__array_function__ internals>", line 6, in stack
File "~/lib/python3.6/site-packages/numpy-1.18.1-py3.6-linux-x86_64.egg/numpy/core/shape_base.py", line 426, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
If I delete the chrX variants from the GangSTR output, then DumpSTR runs fine.
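A small reproduction of this ValueError, as far as I can tell from the traceback. It assumes each sample contributes one (low, high) confidence interval per allele, so a haploid male call on chrX yields an array of a different shape than a diploid female call. The array layout and the `pad_to_diploid` helper are my own assumptions for illustration, not TRTools code.

```python
import numpy as np

# Assumed per-sample confidence intervals, one (low, high) pair per allele.
ci_female = np.array([[10, 14], [13, 17]])  # diploid: shape (2, 2)
ci_male = np.array([[10, 14]])              # haploid on chrX: shape (1, 2)

try:
    np.stack([ci_female, ci_male])  # mixed shapes cannot be stacked
except ValueError as err:
    print(err)


def pad_to_diploid(ci, fill=np.nan):
    """Pad an (n_alleles, 2) array with `fill` rows up to shape (2, 2).

    Hypothetical helper for illustration; not part of TRTools.
    """
    ci = np.asarray(ci, dtype=float)
    padded = np.full((2, 2), fill)
    padded[: ci.shape[0]] = ci
    return padded


stacked = np.stack([pad_to_diploid(ci_female), pad_to_diploid(ci_male)])
print(stacked.shape)  # (2, 2, 2)
```

Padding with NaN is only one option; whatever consumes the stacked array would then need to ignore the NaN rows for haploid samples.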
Incidentally, Fragile X Syndrome and Kennedy disease are 2 common X-linked repeat expansion disorders, so it would be nice to be able to find these with GangSTR and dumpSTR.
I am getting a similar error with chrX. If I remove chrX from the VCF file, dumpSTR runs smoothly. If I include it, I get this error:
Traceback (most recent call last):
File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/bin/dumpSTR", line 8, in <module>
sys.exit(run())
File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/lib/python3.6/site-packages/trtools/dumpSTR/dumpSTR.py", line 1245, in run
retcode = main(args)
File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/lib/python3.6/site-packages/trtools/dumpSTR/dumpSTR.py", line 1204, in main
record.vcfrecord.INFO['HWEP'] = utils.GetHardyWeinbergBinomialTest(allele_freqs, genotype_counts)
File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/lib/python3.6/site-packages/trtools/utils/utils.py", line 312, in GetHardyWeinbergBinomialTest
if gt[1] not in allele_freqs.keys():
IndexError: tuple index out of range
I use these commands:
GangSTR --bam input.bam --ref hg38.fa --regions hg38_ver13.bed --out input --bam-samps input --samp-sex M
dumpSTR --vcf input.vcf --out out.5 --gangstr-min-call-DP 5 --gangstr-filter-spanbound-only --gangstr-filter-badCI --gangstr-max-call-DP 1000 --gangstr-min-call-Q 0.6
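My guess at the cause, as a sketch: with `--samp-sex M`, the chrX genotype has a single allele, so the genotype tuple has length 1 and `gt[1]` does not exist. The dictionary layout below (genotype tuples mapped to counts) is an assumption based on the argument name `genotype_counts`, and `split_by_ploidy` is a hypothetical helper, not a TRTools function.

```python
def split_by_ploidy(genotype_counts):
    """Split {genotype_tuple: count} into diploid and non-diploid parts.

    Hypothetical helper for illustration; not part of TRTools.
    """
    diploid, non_diploid = {}, {}
    for gt, count in genotype_counts.items():
        target = diploid if len(gt) == 2 else non_diploid
        target[gt] = count
    return diploid, non_diploid


# Assumed data shape: tuples of allele lengths mapped to sample counts.
genotype_counts = {(10,): 1}  # one male sample with a single allele on chrX

for gt in genotype_counts:
    try:
        gt[1]  # same index access as in the traceback above
    except IndexError as err:
        print(err)  # tuple index out of range

diploid, non_diploid = split_by_ploidy(genotype_counts)
print(diploid, non_diploid)  # {} {(10,): 1}
```

A Hardy-Weinberg test is defined over diploid genotypes, so one option would be to run it only on the diploid part and skip it when that part is empty.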
I'm getting the same error as @rckeerthivasan when there are chrX variants in male samples.
My settings:
ganstr_command = "GangSTR"\
+ " --bam " + bamfile + " " + "--ref " + ref \
+ " --regions regions/my_panel.tsv"\
+ " --samp-sex " + patient_sex\
+ " --bam-samps " + patient\
+ " --out " + output_dir + "/" + file_name \
+ " --output-readinfo"\
+ " --nonuniform"\
+ " --include-ggl"
dumpstr_command = "dumpSTR" \
+ " --vcf " + gzvcf_name\
+ " --out " + filtered_dir + "/" + file_name\
+ " --gangstr-min-call-DP 20"\
+ " --gangstr-max-call-DP 1000"\
+ " --gangstr-filter-spanbound-only"\
+ " --gangstr-filter-badCI"\
+ " --zip"\
+ " --drop-filtered"
The error:
Traceback (most recent call last):
File "/home/oxana/.local/bin/dumpSTR", line 8, in <module>
sys.exit(run())
File "/home/oxana/.local/lib/python3.8/site-packages/trtools/dumpSTR/dumpSTR.py", line 1245, in run
retcode = main(args)
File "/home/oxana/.local/lib/python3.8/site-packages/trtools/dumpSTR/dumpSTR.py", line 1204, in main
record.vcfrecord.INFO['HWEP'] = utils.GetHardyWeinbergBinomialTest(allele_freqs, genotype_counts)
File "/home/oxana/.local/lib/python3.8/site-packages/trtools/utils/utils.py", line 312, in GetHardyWeinbergBinomialTest
if gt[1] not in allele_freqs.keys():
IndexError: tuple index out of range
Do you have any ideas for a workaround before it's fixed?
I'm running it on a tumor sample, so high variation is likely, including in copy number and ploidy.
The only workaround I have is to delete chrX variants from the GangSTR output before passing it to dumpSTR. It's not ideal, considering there are several X-linked repeat expansion disorders, but at least it allows dumpSTR to run on the rest of the genome, and it also confirms that chrX is the source of the problem. Interesting that we're all getting different errors:
TypeError: 'int' object is not iterable
ValueError: all input arrays must have the same shape
IndexError: tuple index out of range
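In case it helps anyone, the workaround can be scripted roughly like this. It is a plain-text filter that assumes the chromosome is named `chrX` or `X` in the first VCF column; `drop_chrx` and the file names in the usage note are my own placeholders.

```python
import gzip


def drop_chrx(in_vcf, out_vcf, chroms=("chrX", "X")):
    """Copy a VCF, skipping records whose CHROM column is in `chroms`.

    Header lines (starting with '#') are always kept.
    Returns the number of records dropped.
    """
    # Read gzip-compressed input transparently; output is written uncompressed.
    opener = gzip.open if str(in_vcf).endswith(".gz") else open
    dropped = 0
    with opener(in_vcf, "rt") as fin, open(out_vcf, "w") as fout:
        for line in fin:
            if not line.startswith("#") and line.split("\t", 1)[0] in chroms:
                dropped += 1
                continue
            fout.write(line)
    return dropped
```

For example, `drop_chrx("input.vcf", "input.noX.vcf")` writes a copy without the chrX records, which can then be given to dumpSTR via `--vcf`. The chrX calls are of course lost from the filtered output, so they have to be inspected separately in the unfiltered GangSTR VCF.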