GermanDemidov/segmentation_before_CNV_calling

IndexError: list index out of range

Opened this issue · 13 comments

I was trying to run the merge_segmented_coverage.py script on my data. I have generated coverage files using samtools and a segmented bed file was generated using probes_from_bed.py.

I am getting the below error while running.

Traceback (most recent call last):
  File "merge_segmented_coverage.py", line 136, in <module>
    main()
  File "merge_segmented_coverage.py", line 133, in main
    merge_coverage_file(bed_file, coverage_file, output_file)
  File "merge_segmented_coverage.py", line 99, in merge_coverage_file
    divided_coverage = divide_coverage(cluster, coveragesFromCluster)
  File "merge_segmented_coverage.py", line 31, in divide_coverage
    second_cov = coverages_from_segmented[2 * i + 2][2] * (coverages_from_segmented[2 * i + 2][1] - coverages_from_segmented[2 * i + 2][0])
IndexError: list index out of range

[0200730251_S38_L006_n20000.cov.txt](https://github.com/GermanDemidov/segmentation_before_CNV_calling/files/9799475/0200730251_S38_L006_n20000.cov.txt)

supersegmented_20000.bed.txt

I am attaching the input files.
Could you please look into this?

What command do you use to run it? There should be several other input files so that I can debug it: two BED files and one COV file, I think.

python3 merge_segmented_coverage.py --bed supersegmented_20000.bed --output final_assembled_coverage.cov --coverage 0200730251_S38_L006_n20000.cov

Am I right that you used only the first 20000 lines for some reason?

Try putting your kit BED file after --bed. I'd also recommend using the full files.

I cut the files down to 20000 lines only for uploading to GitHub.

Exome-Agilent_V6_hg38_20000.bed.txt
I have used the above BED file, and it still gives the same error.

20000 lines in one file do not correspond to 20000 lines in another file. One is segmented, another is not.

Supersegmented.bed:
chr1 12080 12199
chr1 12131 12250

cov.txt:
chr1 12080 12131 0
chr1 12131 12199 0

Try to use your kit bed file
python3 merge_segmented_coverage.py --bed kit.bed --output final_assembled_coverage.cov --coverage 200730251_S38_L006.cov.txt
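To illustrate why the line counts of the two files cannot correspond, here is a minimal sketch of how overlapping probes can be cut into non-overlapping segments at every probe boundary. This is not the code of probes_from_bed.py; the function name `split_at_breakpoints` is hypothetical, and the splitting rule is only an assumption that happens to agree with the two cov.txt lines quoted above.

```python
def split_at_breakpoints(regions):
    """Split overlapping (start, end) regions of one chromosome into
    non-overlapping segments, cutting at every region boundary.

    Hypothetical helper for illustration only; the real segmentation
    script may use a different rule.
    """
    # Every start and end coordinate is a potential cut point.
    points = sorted({p for start, end in regions for p in (start, end)})
    segments = []
    for left, right in zip(points, points[1:]):
        # Keep the piece only if at least one region fully covers it.
        if any(start <= left and right <= end for start, end in regions):
            segments.append((left, right))
    return segments


# The two overlapping probes from the Supersegmented.bed excerpt.
probes = [(12080, 12199), (12131, 12250)]
segments = split_at_breakpoints(probes)
for start, end in segments:
    print("chr1", start, end, sep="\t")
```

Under this assumption two probe lines turn into three segment lines, so truncating both files to the same number of lines leaves them covering different regions.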

I made the BED file and the COV file match (1342 lines from the BED file, 6135 from the coverage file); here is the merge result.

Command:

python3 merge_segmented_coverage.py --bed ../Exome-Agilent_V6_hg38_20000.bed.txt --output fin.cov --coverage ../0200730251_S38_L006_n20000.cov.txt

fin.txt

I have followed the below steps so far.

  1. python3 probes_from_bed.py --bed Exome-Agilent_V6_hg38.bed --output supersegmented.bed --probLen 120
    This command generated two files, supersegmented.bed and supersegmented.for_coverage.bed. I then used supersegmented.for_coverage.bed in the next step.
  2. samtools bedcov supersegmented.for_coverage.bed 0200730251_S38_L006.bwa.cram -Q 3 > 0200730251_S38_L006.cov
  3. python3 merge_segmented_coverage.py --bed supersegmented.bed --output final_assembled_coverage.cov --coverage 0200730251_S38_L006.cov

Is this the correct processing order for this data?

Regards,

Try using the original BED file for the last step.

I used the command below.

python3 merge_segmented_coverage.py --bed Exome-Agilent_V6_hg38.bed --output final_assembled_coverage.cov --coverage 0200730251_S38_L006.cov
but I am still getting an error.

Traceback (most recent call last):
  File "merge_segmented_coverage.py", line 136, in <module>
    main()
  File "merge_segmented_coverage.py", line 133, in main
    merge_coverage_file(bed_file, coverage_file, output_file)
  File "merge_segmented_coverage.py", line 99, in merge_coverage_file
    divided_coverage = divide_coverage(cluster, coveragesFromCluster)
  File "merge_segmented_coverage.py", line 31, in divide_coverage
    second_cov = coverages_from_segmented[2 * i + 2][2] * (coverages_from_segmented[2 * i + 2][1] - coverages_from_segmented[2 * i + 2][0])
IndexError: list index out of range

Hmm, I don't really understand what's wrong. Could you send me these files in full? It may be a border case with unknown contigs (I think you've used a lifted-over file). My email is german dot m dot demidov at gmail dot com. I think I only need the initial BED file; I can fill the COV file in with 0s. The problem is not caused by wrong values: it seems that the COV file and the supersegmented BED file don't match each other in terms of regions.

I found the bug! In your input BED file there are overlapping regions. E.g.

chr1	143882283	143882366
chr1	143882283	143882408

Merge the overlapping regions in your input BED file before running the script!
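For anyone hitting the same error, here is a minimal sketch of such a merge in plain Python. It is not part of this repository; the function name `merge_bed_regions` is hypothetical, and it assumes the BED file has been parsed into `(chrom, start, end)` tuples with integer coordinates. Regions that merely touch are merged as well, which is an assumption; change `<=` to `<` to keep them separate.

```python
from collections import defaultdict


def merge_bed_regions(regions):
    """Merge overlapping (or touching) regions per chromosome.

    regions: iterable of (chrom, start, end) tuples.
    Hypothetical helper, not part of merge_segmented_coverage.py.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom, intervals in by_chrom.items():  # first-seen chromosome order
        current_start, current_end = None, None
        for start, end in sorted(intervals):
            if current_end is not None and start <= current_end:
                # Overlaps (or touches) the open region: extend it.
                current_end = max(current_end, end)
            else:
                # Gap found: close the open region and start a new one.
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# The overlapping pair reported above.
example = [
    ("chr1", 143882283, 143882366),
    ("chr1", 143882283, 143882408),
]
result = merge_bed_regions(example)
for chrom, start, end in result:
    print(chrom, start, end, sep="\t")
```

If bedtools is installed, sorting the file and then running `bedtools merge` on it should achieve the same thing without custom code.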