OverflowError reads_to_bins

Question

OverflowError reads_to_bins

friederhadlich opened this issue 4 years ago · 24 comments

Hi Endre,

I sequentially used this call
epic2 -bin 200 -g 3 -fdr 0.05 -t $TREATMENT -c $CONTROL -cs ../4_epic2/Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizesand epic2 stops with an error message for some of my treatments.

Do you have an idea of how to continue?

This is the typical error (console output):

Found a median readlength of 50.0

Using chromosome sizes found in
../4_epic2/Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes.

Using an effective genome length of ~2172 * 1e6

Parsing ChIP file(s):
H3K4me1_MACT_P60_E.sorted.bam
Traceback (most recent call last):
File "/home/hadlich/miniconda/bin/epic2", line 248, in
_main(args)
File
"/home/hadlich/miniconda/lib/python3.7/site-packages/epic2/main.py",
line 40, in _main
args["treatment"], args, "ChIP")
File "epic2/src/reads_to_bins.pyx", line 183, in
epic2.src.reads_to_bins.files_to_bin_counts
File "epic2/src/reads_to_bins.pyx", line 309, in
epic2.src.reads_to_bins.files_to_bin_counts
File "epic2/src/reads_to_bins.pyx", line 61, in
epic2.src.reads_to_bins.remove_out_of_bounds_bins
OverflowError: Python int too large to convert to C long

I checked the chromosome sizes and they are all fine! Maybe the BWA mappings ahead cause the error ... but why this error occurs only for a few treatments?

Hope you can help ... or give an idea to continue.

Greetings from Germany,

Frieder

Answer 1 · 2020-07-13T12:42:31.000Z

Thanks for creating an issue. Do you have the possibility to share the files with me?

Answer 2 · 2020-07-13T12:47:34.000Z

The error is here:

https://github.com/biocore-ntnu/epic2/blob/master/epic2/src/reads_to_bins.pyx#L61

The error message likely means that either one of your chromsizes or one of the positions in the alignment files are above the uint32 max... Or that they were a negative number and got misinterpreted as a large int.

If you cannot share the files I suggest you go through the starts and ends in your alignment files and see what the max and min value of all the read positions are. Sometimes nonaligned reads are given a start and end of -1 for example.

Answer 3 · 2020-07-13T12:51:41.000Z

I used Bos_taurus.ARS-UCD1.2.dna.toplevel.fa as genomic reference for the BWA mapping. Real data cannot be uploaded ... I will check the mapping results for erroneous positions and let you know. Thanks a lot!

Answer 4 · 2020-07-13T12:53:51.000Z

Try converting to bed files with bedtools and see if it works then? Good luck and feel free to ping me for anything. Without the files it is impossible for me to debug as I am sure you understand :)

…

On Mon, Jul 13, 2020 at 2:51 PM friederhadlich ***@***.***> wrote: I used Bos_taurus.ARS-UCD1.2.dna.toplevel.fa as genomic reference for the BWA mapping. Real data cannot be uploaded ... I will check the mapping results for erroneous positions and let you know. Thanks a lot! — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#36 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEHURUXFBZS5Z56PIDPDQXLR3L7OZANCNFSM4OYOFEEQ> .

Answer 5 · 2020-07-15T10:08:45.000Z

Unfortunately, bed files aren't the solution for me! I tried control and treatment inputs with bam and bed format without any effect.

The problem must be in parameter "fs" ... please see these calls:
epic2 -t ../3_bwa/H3K27Ac_MACT_p59_E.sorted.bam -bin 200 -g 3 -fdr 0.05 -fs 300 -egf 0.8 -c ../3_bwa/INPUT_pool_MACT_E.sorted.bam.bed -cs Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes > H3K27Ac_MACT_p59_E.txt

Found a median readlength of 50.0

Using chromosome sizes found in Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes.

Using an effective genome length of ~2172 * 1e6

Parsing ChIP file(s):
../3_bwa/H3K27Ac_MACT_p59_E.sorted.bam
Traceback (most recent call last):
File "/home/hadlich/miniconda/bin/epic2", line 248, in
_main(args)
File "/home/hadlich/miniconda/lib/python3.7/site-packages/epic2/main.py", line 40, in _main
args["treatment"], args, "ChIP")
File "epic2/src/reads_to_bins.pyx", line 183, in epic2.src.reads_to_bins.files_to_bin_counts
File "epic2/src/reads_to_bins.pyx", line 309, in epic2.src.reads_to_bins.files_to_bin_counts
File "epic2/src/reads_to_bins.pyx", line 61, in epic2.src.reads_to_bins.remove_out_of_bounds_bins
OverflowError: Python int too large to convert to C long

epic2 -t ../3_bwa/H3K27Ac_MACT_p59_E.sorted.bam -bin 200 -g 3 -fdr 0.05 -egf 0.8 -c ../3_bwa/INPUT_pool_MACT_E.sorted.bam.bed -cs Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes > H3K27Ac_MACT_p59_E.txt

Found a median readlength of 50.0

Using chromosome sizes found in Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes.

Using an effective genome length of ~2172 * 1e6

Parsing ChIP file(s):
../3_bwa/H3K27Ac_MACT_p59_E.sorted.bam

Valid ChIP reads: 11245276 (11245276 before out of bounds removal)

Score threshold: 19.652

Number of tags in a window: 3

Number of islands found: 39544

Parsing Input file(s):
../3_bwa/INPUT_pool_MACT_E.sorted.bam.bed

Valid Background reads: 23502103 (23502128 before out of bounds removal)

Answer 6 · 2020-07-15T10:14:11.000Z

Chromosome positions of the mapping file are fine for treatment and control, as you can see here for control:

cat ../3_bwa/INPUT_pool_MACT_E.sorted.bam.bed | cut -f 3 | awk 'BEGIN{a=10000;b=0};{if ($0<a) a=$0; if ($0>b) b=$0}; END{print a,b;}'

37 158533984

cat ../3_bwa/INPUT_pool_MACT_E.sorted.bam.bed | cut -f 2 | awk 'BEGIN{a=10000;b=0};{if ($0<a) a=$0; if ($0>b) b=$0}; END{print a,b;}'

0 158533934

Answer 7 · 2020-07-15T11:41:45.000Z

One more running example:
epic2 -t ../3_bwa/H3K27Ac_MACT_p59_E.sorted.bam -bin 500 -g 3 -fdr 0.05 -fs 300 -egf 0.8 -c ../3_bwa/INPUT_pool_MACT_E.sorted.bam.bed -cs Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes > H3K27Ac_MACT_p59_E.txt

Answer 8 · 2020-07-15T15:51:27.000Z

I will look at it in a few weeks. Taking a vacation now :) Thanks for providing more details :)

…

On Wednesday, July 15, 2020, friederhadlich ***@***.***> wrote: One more running example: epic2 -t ../3_bwa/H3K27Ac_MACT_p59_E.sorted.bam -bin 500 -g 3 -fdr 0.05 -fs 300 -egf 0.8 -c ../3_bwa/INPUT_pool_MACT_E.sorted.bam.bed -cs Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes > H3K27Ac_MACT_p59_E.txt — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#36 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEHURURJZXZ7WV7FYTLFOQTR3WIYPANCNFSM4OYOFEEQ> .

Answer 9 · 2020-07-28T21:16:04.000Z

I am also seeing this error -- but I figured out a temporary workaround. I am using hg38 chrom.sizes from UCSC link and a paired-end fastq file which I aligned with bwa and converted to a bed file.

Here is the epic2 command:

epic2 -t RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed -e 100 -fdr .01 --effective-genome-fraction 0.6753009149366569 --mapq 30 --chromsizes RSeq_out/tmp/hg38.chrom.sizes --output RSeq_out/peaks_epic_unstranded/SRX2481503_S96_DRIP_Seq_1_hg38.bed

And the output:

Found a median fragment size of 183.5

Using chromosome sizes found in RSeq_out/tmp/hg38.chrom.sizes.

Using an effective genome length of ~2167 * 1e6

Parsing ChIP file(s):
  RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed
Traceback (most recent call last):
  File "/home/UTHSCSA/millerh1/miniconda3/envs/RSeqEnv/bin/epic2", line 248, in <module>
    _main(args)
  File "/home/UTHSCSA/millerh1/miniconda3/envs/RSeqEnv/lib/python3.7/site-packages/epic2/main.py", line 40, in _main
    args["treatment"], args, "ChIP")
  File "epic2/src/reads_to_bins.pyx", line 183, in epic2.src.reads_to_bins.files_to_bin_counts
  File "epic2/src/reads_to_bins.pyx", line 309, in epic2.src.reads_to_bins.files_to_bin_counts
  File "epic2/src/reads_to_bins.pyx", line 61, in epic2.src.reads_to_bins.remove_out_of_bounds_bins
OverflowError: Python int too large to convert to C long

The error goes away when I remove all the non-standard chromosomes from the chrom.sizes file, though I don't understand why. Regardless most people will have these non-standard chromosomes so it would be great if using them was supported.

Here's chrom.sizes modified by me:

chr1	248956422
chr2	242193529
chr3	198295559
chr4	190214555
chr5	181538259
chr6	170805979
chr7	159345973
chrX	156040895
chr8	145138636
chr9	138394717
chr11	135086622
chr10	133797422
chr12	133275309
chr13	114364328
chr14	107043718
chr15	101991189
chr16	90338345
chr17	83257441
chr18	80373285
chr20	64444167
chr19	58617616
chrY	57227415
chr22	50818468
chr21	46709983

The output of the same epic2 command as before, but with the updated chrom.sizes:

effective-genome-fraction 0.6753009149366569 --mapq 30 --chromsizes RSeq_out/tmp/hg38.chrom.sizes --output RSeq_out/peaks_epic_unstranded/SRX2481503_S96_DRIP_Seq_1_hg38.bed
Found a median fragment size of 183.5

Using chromosome sizes found in RSeq_out/tmp/hg38.chrom.sizes.

Using an effective genome length of ~2085 * 1e6

Parsing ChIP file(s):
  RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed
Chromosome chr10_GL383545v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY
Chromosome chr10_GL383545v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY
...
Chromosome chrUn_KI270757v1 not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY
Chromosome chrX_KI270880v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY
Chromosome chrX_KI270913v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY

Valid ChIP reads: 24012551 (24017420 before out of bounds removal)

Score threshold: 27.632

Number of tags in a window: 5

Number of islands found: 6736

Empirical estimate of FDR is: 0.014845605700712588 (100/6736)

Hope this is helpful!

Best,
Henry

Answer 10 · 2020-07-29T09:18:37.000Z

I will definitely try to fix this soon. Will test with non-standard chromosomes and see what happens.

…

On Tue, Jul 28, 2020 at 11:16 PM Henry Miller ***@***.***> wrote: I am also seeing this error -- but I figured out a temporary workaround. I am using hg38 chrom.sizes from UCSC link <https://hgdownload-test.gi.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes> and a paired-end fastq file which I aligned with bwa and converted to a bed file. Here is the epic2 command: epic2 -t RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed -e 100 -fdr .01 --effective-genome-fraction 0.6753009149366569 --mapq 30 --chromsizes RSeq_out/tmp/hg38.chrom.sizes --output RSeq_out/peaks_epic_unstranded/SRX2481503_S96_DRIP_Seq_1_hg38.bed And the output: Found a median fragment size of 183.5 Using chromosome sizes found in RSeq_out/tmp/hg38.chrom.sizes. Using an effective genome length of ~2167 * 1e6 Parsing ChIP file(s): RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed Traceback (most recent call last): File "/home/UTHSCSA/millerh1/miniconda3/envs/RSeqEnv/bin/epic2", line 248, in <module> _main(args) File "/home/UTHSCSA/millerh1/miniconda3/envs/RSeqEnv/lib/python3.7/site-packages/epic2/main.py", line 40, in _main args["treatment"], args, "ChIP") File "epic2/src/reads_to_bins.pyx", line 183, in epic2.src.reads_to_bins.files_to_bin_counts File "epic2/src/reads_to_bins.pyx", line 309, in epic2.src.reads_to_bins.files_to_bin_counts File "epic2/src/reads_to_bins.pyx", line 61, in epic2.src.reads_to_bins.remove_out_of_bounds_bins OverflowError: Python int too large to convert to C long The error goes away when I remove all the non-standard chromosomes from the chrom.sizes file, though I don't understand why. Regardless most people will have these non-standard chromosomes so it would be great if using them was supported. Here's chrom.sizes modified by me: chr1 248956422 chr2 242193529 chr3 198295559 chr4 190214555 chr5 181538259 chr6 170805979 chr7 159345973 chrX 156040895 chr8 145138636 chr9 138394717 chr11 135086622 chr10 133797422 chr12 133275309 chr13 114364328 chr14 107043718 chr15 101991189 chr16 90338345 chr17 83257441 chr18 80373285 chr20 64444167 chr19 58617616 chrY 57227415 chr22 50818468 chr21 46709983 The output of the same epic2 command as before, but with the updated chrom.sizes: effective-genome-fraction 0.6753009149366569 --mapq 30 --chromsizes RSeq_out/tmp/hg38.chrom.sizes --output RSeq_out/peaks_epic_unstranded/SRX2481503_S96_DRIP_Seq_1_hg38.bed Found a median fragment size of 183.5 Using chromosome sizes found in RSeq_out/tmp/hg38.chrom.sizes. Using an effective genome length of ~2085 * 1e6 Parsing ChIP file(s): RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed Chromosome chr10_GL383545v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY Chromosome chr10_GL383545v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY ... Chromosome chrUn_KI270757v1 not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY Chromosome chrX_KI270880v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY Chromosome chrX_KI270913v1_alt not in the chromosome sizes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY Valid ChIP reads: 24012551 (24017420 before out of bounds removal) Score threshold: 27.632 Number of tags in a window: 5 Number of islands found: 6736 Empirical estimate of FDR is: 0.014845605700712588 (100/6736) Hope this is helpful! Best, Henry — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#36 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEHURUULS44QJPLXKM365ZLR5452HANCNFSM4OYOFEEQ> .

Answer 11 · 2020-08-10T06:28:02.000Z

I have now started my sleuthing to fix this problem.

There is one thing which I did not think of communicating: if you include chromosomes in the chromsizes file that are not in the reads, the algorithm becomes more conservative. It computes a Poisson parameter based on the number of reads in the genome. With more chromosomes, the genome becomes longer, and hence less sparsely populated.

I have tried:

adding nonexisting chromosomes to the chromsizes file
added nonexisting chromosomes to the treatment file

but have been unable to break epic2. If anyone can upload a breaking example I will do my best to fix this problem. But I have not been able to reproduce. Version: 0.0.41. If the data is semi-sensitive you can 7zip with a password and send the file by e-mail.

Answer 12 · 2020-08-13T07:19:49.000Z

epic2 -bin 200 -g 3 -fdr 0.05 -t /mnt/cargo/endrebak/control.sorted.filtered.bam -cs Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes > /dev/null
Found a median readlength of 50.0

Using chromosome sizes found in Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes.

Using an effective genome length of ~2344 * 1e6

Parsing ChIP file(s):
  /mnt/cargo/endrebak/control.sorted.filtered.bam

Valid ChIP reads: 1949548 (1949548 before out of bounds removal)

Score threshold: 22.615000000000002

Number of tags in a window: 1

Number of islands found: 16028

Empirical estimate of FDR is: 0.062390816071874224 (1000/16028)

I.e. was not able to reproduce. Tomorrow I will wrap the affected code in a try/catch and print the error with detailed information. Then you can report that info back to me and I will try my best to understand and fix it :)

Answer 13 · 2020-08-13T09:01:09.000Z

Hi Endre,

please use treatment and control argument to reproduce the error ... i.e.

epic2 -t treatment.sorted.bam -bin 200 -g 3 -fdr 0.05 -fs 300 -egf 0.8 -c control.sorted.filtered.bam -cs Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes

Good luck, Frieder

Answer 14 · 2020-08-13T14:28:20.000Z

epic2 -fs 300 -bin 200 -g 3 -fdr 0.05 -t /mnt/cargo/endrebak/treatment.sorted.filtered.bam -c /mnt/cargo/endrebak/control.sorted.filtered.bam -cs Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes > /dev/null
Found a median readlength of 50.0

Using chromosome sizes found in Bos_taurus.ARS-UCD1.2.dna.toplevel.chromsizes.

Using an effective genome length of ~2344 * 1e6

Parsing ChIP file(s):
  /mnt/cargo/endrebak/treatment.sorted.filtered.bam

Valid ChIP reads: 1185813 (1185813 before out of bounds removal)

Score threshold: 17.337

Number of tags in a window: 1

Number of islands found: 9343

Parsing Input file(s):
  /mnt/cargo/endrebak/control.sorted.filtered.bam

Valid Background reads: 1949548 (1949548 before out of bounds removal)

I still am unable to reproduce :) So I will stick to my plan of implementing good reporting in the case of errors.

I am using version 0.0.41 and linux. What about you people?

Answer 15 · 2020-08-14T08:22:33.000Z

Now a 0.0.42 is out which should give a better error message when it happens :).

Answer 16 · 2020-08-14T12:08:34.000Z

Strange! All my samples pass epic2 with version 0.0.42! Thx a lot for your support!

Answer 17 · 2020-08-14T15:34:25.000Z

Indexplicable weirdness is common 🙈 Thanks for reporting.

It might have been that I forgot to raise an exception if I caught one. Were any error messages?

Answer 18 · 2020-08-14T15:37:25.000Z

Look for an error message like this:

    except OverflowError as e:
        print(e)
        print("\nAdditional info:\n")
        print("Chromosome:", chromosome)
        print("Chromsize:", chromsize)

Answer 19 · 2020-08-14T16:32:58.000Z

0.0.43 (hotfix) should raise an error if OverflowErrors happen again. These were noisily ignored in 0.0.42

Answer 20 · 2020-08-14T20:47:16.000Z

I am still getting this error with epic2 0.0.42... though it finishes and gives some additional info:

(base) millerh1@cbbi16:~/Bishop.lab/Projects/RSeq$ epic2 -t RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed -e 100 -fdr .01 --effective-genome-fraction 0.6753009149366569 --mapq 30 --chromsizes RSeq_out/tmp/hg38.chrom.sizes --output RSeq_out/peaks_epic_unstranded/SRX2481503_S96_DRIP_Seq_1_hg38.bed
Found a median fragment size of 183.5
 
Using chromosome sizes found in RSeq_out/tmp/hg38.chrom.sizes.
 
Using an effective genome length of ~2167 * 1e6
 
Parsing ChIP file(s): 
  RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed 
Python int too large to convert to C long

Additional info:

Chromosome: chr2_GL582966v2_alt
Chromsize: 96131

Valid ChIP reads: 24120376 (24125321 before out of bounds removal)
 
Score threshold: 45.687
 
Number of tags in a window: 4
 
Number of islands found: 5165
 
Empirical estimate of FDR is: 0.01936108422071636 (100/5165)

The chrom.sizes are here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes

I'm hosting the bed file here: https://hem-misc.s3-us-west-2.amazonaws.com/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed

Thank you!

Henry

Answer 21 · 2020-08-17T16:18:18.000Z

I will look at it. Thanks!

…

On Fri, Aug 14, 2020 at 10:47 PM Henry Miller ***@***.***> wrote: I am still getting this error with epic2 0.0.42... though it finishes and gives some additional info: (base) ***@***.***:~/Bishop.lab/Projects/RSeq$ epic2 -t RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed -e 100 -fdr .01 --effective-genome-fraction 0.6753009149366569 --mapq 30 --chromsizes RSeq_out/tmp/hg38.chrom.sizes --output RSeq_out/peaks_epic_unstranded/SRX2481503_S96_DRIP_Seq_1_hg38.bed Found a median fragment size of 183.5 Using chromosome sizes found in RSeq_out/tmp/hg38.chrom.sizes. Using an effective genome length of ~2167 * 1e6 Parsing ChIP file(s): RSeq_out/tmp/pe_beds/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed Python int too large to convert to C long Additional info: Chromosome: chr2_GL582966v2_alt Chromsize: 96131 Valid ChIP reads: 24120376 (24125321 before out of bounds removal) Score threshold: 45.687 Number of tags in a window: 4 Number of islands found: 5165 Empirical estimate of FDR is: 0.01936108422071636 (100/5165) The chrom.sizes are here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes I'm hosting the bed file here: https://hem-misc.s3-us-west-2.amazonaws.com/SRX2481503_S96_DRIP_Seq_1.hg38.experiment.bed Thank you! Henry — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#36 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEHURUWDMIUJYM2YXMY5GBLSAWPGDANCNFSM4OYOFEEQ> .

Answer 22 · 2020-08-18T09:02:31.000Z

What about now? 0.0.44 is out :)

Answer 23 · 2020-08-18T14:01:10.000Z

Well the number of reads and peaks does not change, but the error disappeared! I'm assuming that this is alright now. Thanks!

Answer 24 · 2020-08-18T14:19:36.000Z

No problem. There must have been some weirdness with the Cython and C++ interop is my guess :)