gymrek-lab/TRTools

Handling of chrX data with dumpSTR

mikmaksi opened this issue · 7 comments

Hello,

I encountered the following error when running dumpSTR on a VCF produced by GangSTR that contained only chrX calls for 3 samples. Two of the samples are female and have two comma-separated values in their REPCN field, while one sample is male and has a single value in its REPCN field:

Traceback (most recent call last):
  File "/usr/local/bin/dumpSTR", line 11, in <module>
    load_entry_point('trtools==2.0.4', 'console_scripts', 'dumpSTR')()
  File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 941, in run
    retcode = main(args)
  File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 895, in main
    record = ApplyCallFilters(record, invcf, call_filters, sample_info)
  File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 597, in ApplyCallFilters
    filter_reasons = FilterCall(sample, call_filters)
  File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/dumpSTR.py", line 544, in FilterCall
    if cfilt(sample) is not None: reasons.append(cfilt.GetReason())
  File "/usr/local/lib/python3.6/site-packages/trtools-2.0.4-py3.6.egg/dumpSTR/filters.py", line 605, in __call__
    ml = [int(item) for item in sample["REPCN"]]
TypeError: 'int' object is not iterable

Thanks so much!
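
The traceback above ends in a list comprehension that iterates over `sample["REPCN"]`. The `TypeError` suggests that for the male sample this field arrives as a bare `int` rather than a one-element sequence, so it cannot be iterated. A minimal sketch of the failure and of one possible normalization follows; `as_copy_number_list` and the example values are hypothetical and are not part of TRTools.

```python
def as_copy_number_list(repcn):
    """Return REPCN as a list of ints whether it is a scalar or a sequence.

    Hypothetical helper for illustration, not part of TRTools.
    """
    if isinstance(repcn, (list, tuple)):
        return [int(item) for item in repcn]
    return [int(repcn)]


# Made-up calls: a diploid (female) sample and a haploid (male) sample.
female_sample = {"REPCN": [12, 15]}
male_sample = {"REPCN": 12}

# The unpatched pattern fails on the haploid call.
try:
    [int(item) for item in male_sample["REPCN"]]
except TypeError as err:
    print("unpatched:", err)

# The normalized form handles both shapes.
print(as_copy_number_list(female_sample["REPCN"]))
print(as_copy_number_list(male_sample["REPCN"]))
```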

I need some more information to help you:

  • Please provide the command you used to call dumpSTR (e.g. all the arguments)
  • Did you do this using the source code, or installing it from somewhere (conda? pip?)
  • Where on Snorlax are the sample files, so that I can try this myself?

Of course

  1. I moved sample data and a launch script to /storage/mikhail/062620_dumpSTR_chrX_tshoot
  • 1_filter_with_dumpSTR.sh: launcher script
  • data/raw/chrX.vcf: input data
  2. The command used to run dumpSTR is in 1_filter_with_dumpSTR.sh
  3. I used the default TRTools installation available on Snorlax

I routinely get the same dumpSTR error with GangSTR v2.4.6 and dumpSTR v3.0.2 (and with earlier versions) when testing 37 known pathogenic STR loci, 5 of which are on chrX. If I remove the chrX variants from the GangSTR output before passing it to dumpSTR, the error does not occur. I would also vote for dumpSTR handling chrX.

Now trying GangSTR v2.5.0 and dumpSTR v4.0.0 on 3 samples (2 males and 1 female), where the GangSTR output includes calls on chrX, I get the following error from dumpSTR:

Traceback (most recent call last):
  File "~/bin/dumpSTR", line 33, in <module>
    sys.exit(load_entry_point('trtools==4.0.0', 'console_scripts', 'dumpSTR')())
  File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/dumpSTR.py", line 1245, in run
    retcode = main(args)
  File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/dumpSTR.py", line 1183, in main
    record = ApplyCallFilters(record, call_filters, sample_info, invcf.samples)
  File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/dumpSTR.py", line 569, in ApplyCallFilters
    filt_output = filt(record)
  File "~/lib/python3.6/site-packages/trtools-4.0.0-py3.6.egg/trtools/dumpSTR/filters.py", line 706, in __call__
    ci = np.stack(ci)
  File "<__array_function__ internals>", line 6, in stack
  File "~/lib/python3.6/site-packages/numpy-1.18.1-py3.6-linux-x86_64.egg/numpy/core/shape_base.py", line 426, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

If I delete the chrX variants from the GangSTR output, then dumpSTR runs fine.

Incidentally, Fragile X syndrome and Kennedy disease are two common X-linked repeat expansion disorders, so it would be nice to be able to find these with GangSTR and dumpSTR.
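
The `np.stack` failure above is consistent with per-sample arrays of different lengths, for example one confidence interval for a haploid call and two for a diploid call. The sketch below reproduces that shape mismatch with made-up intervals and shows one way to pad haploid entries with NaN before stacking. `pad_to_ploidy` is a hypothetical helper, and whether padding is the right fix inside dumpSTR is an assumption.

```python
import numpy as np


def pad_to_ploidy(intervals, ploidy=2):
    """Pad an (n_alleles, 2) array of CI bounds with NaN rows up to `ploidy` rows.

    Hypothetical helper for illustration, not part of TRTools.
    """
    arr = np.asarray(intervals, dtype=float).reshape(-1, 2)
    padded = np.full((ploidy, 2), np.nan)
    padded[: arr.shape[0]] = arr
    return padded


# Made-up confidence intervals: two alleles for a female, one for a male.
female_ci = np.array([[10, 14], [15, 20]])
male_ci = np.array([[11, 13]])

# Stacking arrays of shape (2, 2) and (1, 2) raises the error from the traceback.
try:
    np.stack([female_ci, male_ci])
except ValueError as err:
    print("unpatched:", err)

# After padding, every sample has shape (2, 2) and stacking succeeds.
stacked = np.stack([pad_to_ploidy(ci) for ci in (female_ci, male_ci)])
print(stacked.shape)
```

Any downstream comparison would then need NaN-aware handling (for instance `np.nanmin`), since the padded row carries no real interval.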

I am getting a similar error with chrX. If I remove chrX from the VCF file, dumpSTR runs smoothly. If I include it, I get this error:

Traceback (most recent call last):
  File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/bin/dumpSTR", line 8, in <module>
    sys.exit(run())
  File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/lib/python3.6/site-packages/trtools/dumpSTR/dumpSTR.py", line 1245, in run
    retcode = main(args)
  File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/lib/python3.6/site-packages/trtools/dumpSTR/dumpSTR.py", line 1204, in main
    record.vcfrecord.INFO['HWEP'] = utils.GetHardyWeinbergBinomialTest(allele_freqs, genotype_counts)
  File "/project/jcreminslab/kenrc_projects/softwares.2/TRTools/venv.1/lib/python3.6/site-packages/trtools/utils/utils.py", line 312, in GetHardyWeinbergBinomialTest
    if gt[1] not in allele_freqs.keys():
IndexError: tuple index out of range

I use these commands:

GangSTR --bam input.bam --ref hg38.fa --regions hg38_ver13.bed --out input --bam-samps input --samp-sex M

dumpSTR --vcf input.vcf --out out.5 --gangstr-min-call-DP 5 --gangstr-filter-spanbound-only --gangstr-filter-badCI --gangstr-max-call-DP 1000 --gangstr-min-call-Q 0.6
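
The `IndexError` in the traceback above is raised at `gt[1]`, which appears to assume that every genotype tuple holds two alleles; a haploid chrX call in a male holds only one. A minimal sketch of the failure, and of a guard that sets non-diploid genotypes aside before any diploid-only test, is below. `split_by_ploidy` and the example counts are hypothetical, and this is not how TRTools computes its Hardy-Weinberg statistic.

```python
def split_by_ploidy(genotype_counts):
    """Separate diploid genotype counts from all others.

    Hypothetical helper for illustration, not part of TRTools.
    """
    diploid, other = {}, {}
    for gt, count in genotype_counts.items():
        target = diploid if len(gt) == 2 else other
        target[gt] = count
    return diploid, other


# Made-up counts keyed by allele tuples; (12,) stands for a haploid male call.
genotype_counts = {(12, 12): 3, (12, 15): 1, (12,): 2}
allele_freqs = {12: 0.8, 15: 0.2}

# Indexing the second allele fails as soon as a one-element tuple is reached.
try:
    for gt in genotype_counts:
        if gt[1] not in allele_freqs:
            pass
except IndexError as err:
    print("unpatched:", err)

# Only the diploid counts would be passed on to a diploid-only test.
diploid, other = split_by_ploidy(genotype_counts)
print(diploid)
print(other)
```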

I'm getting the same error as @rckeerthivasan when the VCF contains chrX variants in males.
My settings:

ganstr_command = "GangSTR"\
    + " --bam " + bamfile + " " + "--ref " + ref \
    + " --regions regions/my_panel.tsv"\
    + " --samp-sex " + patient_sex\
    + " --bam-samps " + patient\
    + " --out " + output_dir + "/" + file_name \
    + " --output-readinfo"\
    + " --nonuniform"\
    + " --include-ggl"

dumpstr_command = "dumpSTR" \
    + " --vcf " + gzvcf_name\
    + " --out " + filtered_dir + "/" + file_name\
    + " --gangstr-min-call-DP  20"\
    + " --gangstr-max-call-DP  1000"\
    + " --gangstr-filter-spanbound-only"\
    + " --gangstr-filter-badCI"\
    + " --zip"\
    + " --drop-filtered"

The error:

Traceback (most recent call last):
  File "/home/oxana/.local/bin/dumpSTR", line 8, in <module>
    sys.exit(run())
  File "/home/oxana/.local/lib/python3.8/site-packages/trtools/dumpSTR/dumpSTR.py", line 1245, in run
    retcode = main(args)
  File "/home/oxana/.local/lib/python3.8/site-packages/trtools/dumpSTR/dumpSTR.py", line 1204, in main
    record.vcfrecord.INFO['HWEP'] = utils.GetHardyWeinbergBinomialTest(allele_freqs, genotype_counts)
  File "/home/oxana/.local/lib/python3.8/site-packages/trtools/utils/utils.py", line 312, in GetHardyWeinbergBinomialTest
    if gt[1] not in allele_freqs.keys():
IndexError: tuple index out of range

Do you have any ideas for a workaround before it's fixed?

I'm running it on a tumor sample, so high variation is likely, including in copy number and ploidy.

The only workaround I have is to delete the chrX variants from the GangSTR output before passing it to dumpSTR. It's not ideal, considering there are several X-linked repeat expansion disorders, but it at least lets dumpSTR run on the rest of the genome, and it confirms that chrX is the source of the problem. Interesting that we're all getting different errors:

TypeError: 'int' object is not iterable
ValueError: all input arrays must have the same shape
IndexError: tuple index out of range
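
The workaround described above, dropping chrX records before running dumpSTR, can be sketched in plain Python. This assumes an uncompressed, tab-delimited VCF whose X chromosome is named `chrX` or `X`; the function name `drop_chrom_records` and the example records are hypothetical.

```python
def drop_chrom_records(lines, chroms=("chrX", "X")):
    """Yield VCF lines, skipping data records whose CHROM is in `chroms`.

    Header lines (starting with '#') always pass through unchanged.
    """
    for line in lines:
        if line.startswith("#") or line.split("\t", 1)[0] not in chroms:
            yield line


# Tiny in-memory example; real use would stream from one file to another.
vcf_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\t.\t.\t.\t.\n",
    "chrX\t200\t.\tC\t.\t.\t.\t.\n",
]
kept = list(drop_chrom_records(vcf_lines))
print(len(kept))
```

With files, the same generator can be fed an open input handle and written out with `writelines`. As noted in the thread, this only lets dumpSTR run on the rest of the genome; the X-linked loci remain unfiltered.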