weng-lab/TEMP2

It successfully ran the test data, but failed with real data.


Hi Dr. Yu,

TEMP2 ran the test data successfully, but failed on my real data: the program was killed while running the step "Calculate frequency of each transposon insertion". Is this caused by one of the earlier errors, or does bedtools really need that much RAM? bedtools used 40 GB of RAM; if it is the latter, I will switch to a machine with more memory.
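(I assume the "Killed" message means the kernel's out-of-memory killer stopped the process. A check along the lines below should confirm whether that is the case, though the exact log wording varies between systems and may need root:)

```bash
# Look for OOM-killer entries in the kernel log around the time the
# TEMP2 step died (message format differs between distributions).
dmesg -T | grep -iE "out of memory|oom-kill|killed process" | tail
```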

Best wishes,

Dan


TEMP2 insertion -l /home/danding/practice/workspace/data/GALG1_1.fq.gz -r /home/danding/practice/workspace/data/GALG1_2.fq.gz -i /home/danding/practice/workspace/data/GALG1.sort.bam -g /home/danding/practice/workspace/newbegin/7b/7b_genome.fa -R /home/danding/practice/workspace/newbegin/cdhit/7b_final.fa -t ./7b.bed -o ./GALG1 -c 5
Testing required softwares:
bwa: /home/danding/miniconda3/bin/bwa
samtools: /home/danding/miniconda3/bin/samtools
bedtools: /home/danding/miniconda3/bin/bedtools
------ Start pipeline ------
get concordant-uniq-split reads Fri Mar 15 10:01:03 CST 2024
[bam_sort_core] merging from 0 files and 5 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 161088 reads
check fragment length Fri Mar 15 10:15:41 CST 2024
insert size set to 95 quantile: 477
get mate seq of the uniq-unpaired Fri Mar 15 10:15:42 CST 2024
[bam_sort_core] merging from 0 files and 5 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 629970 reads
map paired split uniqMappers and unpaired uniqMappers to transposons Fri Mar 15 10:16:35 CST 2024
merge fragments in genome and transposon Fri Mar 15 10:16:50 CST 2024
pass1 - making usageList (58 chroms): 4 millis
pass2 - checking and writing primary data (20159 records, 6 fields): 35 millis
merge support reads in the same direction within 477 - 150 Fri Mar 15 10:17:02 CST 2024
merge support reads in different direction within 2 X 477 - 150 Fri Mar 15 10:17:12 CST 2024

filter candidate insertions which overlap with the same transposon insertion or in high depth region Fri Mar 15 10:17:14 CST 2024

Differing number of BED fields encountered at line: 3. Exiting...
Differing number of BED fields encountered at line: 3. Exiting...
Differing number of BED fields encountered at line: 3. Exiting...

filter candidate insertions in high depth region Fri Mar 15 10:17:14 CST 2024
average read number for 200bp bins is 81.287, set read number cutoff to 406.435
Filtered insertion number: 11191 - 11191 (overlap rmsk) 0 (short insertion) - 0 (high depth) = 0
generate the overall distribution of transposon mapping reads, first map all reads to transposon Fri Mar 15 10:27:57 CST 2024
sam to bed and bedGraph, multiple mappers are divided by their map times Fri Mar 15 11:16:57 CST 2024
[bam_sort_core] merging from 1 files and 5 in-memory blocks...
estimate de novo insertion number for each transposon using singleton reads Fri Mar 15 11:21:10 CST 2024

generate distribution figures for singleton supporting reads Fri Mar 15 11:21:12 CST 2024

Error in read.table(Args[8], header = F, row.names = NULL) :
no lines available in input
Execution halted

filter unreliable singleton insertions, also filter 2p insertions overlapped with similar reference transposon copies Fri Mar 15 11:21:13 CST 2024
Calculate frequency of each transposon insertion Fri Mar 15 11:21:13 CST 2024

**[bam_sort_core] merging from 23 files and 5 in-memory blocks...**

**Killed**

get TSD, remove redundant insertions and recalculate de novo insertion rate Fri Mar 15 11:54:33 CST 2024


***** ERROR: Requested column 2, but database file - only has fields 1 - 0.
GALG1.t is empty
calculate de novo insertion rate per genome Fri Mar 15 11:54:33 CST 2024
clean tmp files Fri Mar 15 11:54:33 CST 2024
Done, Congras!!!🍺🍺🍺

Hi Yu,

Your intuition was correct. I modified the BED file and it worked.
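In case anyone else hits the same "Differing number of BED fields encountered" error: bedtools raises it when a BED file does not have the same number of tab-separated columns on every line, and in my case the culprit was the transposon annotation passed with -t (7b.bed in the command above). A rough check like the one below (not the exact commands I ran) points out the offending lines:

```bash
# Count how many tab-separated fields each line of the transposon
# annotation has; every line should report the same number.
awk -F'\t' '{print NF}' 7b.bed | sort | uniq -c

# Print the lines whose field count differs from the first line,
# so they can be fixed or trimmed to a uniform column count.
awk -F'\t' 'NR==1{n=NF} NF!=n' 7b.bed
```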
Thank you for your patience! Wish you a happy life! 🍺🍺🍺

Dan