WGLab/DeepRepeat

Error on running on example data and data available to us

Closed this issue · 12 comments

Hi,

I was trying to install and run DeepRepeat on our data and it errored out. I thought I might have been doing something wrong, so I tried it on the example data as outlined in this README, but that fails with the same error.

I have attached the output from running it on the example data. It seems to fail with a segmentation fault in one of the dependent libraries that was built as part of the installation steps.

Would you be able to help us out?

```
The following options are used (included default):
UniqueID (TGC_chr16_73546662-73546736);
bam (na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam);
basecalled_path (workspace/pass/);
f5config (DeepRepeat/bin/data/config/fast5_path.config);
f5folder (na12878_loci/TGC_chr16_73546662-73546736);
f5i (na12878_loci/TGC_chr16_73546662-73546736/na.f5index);
f5i_basefile (na);
feature_num (50);
label_size (4);
merge_gap (4.5);
mod_path (None);
mod_version (2);
multif5 (0);
nb_size (3);
nbsize (-1.5);
outlog (0);
outputfolder (test_op/TGC_chr16_73546662-73546736);
pcr (True);
repeat (chr16:73546662-73546736:TGC:3);
repeat_name (TGC_chr16_73546662-73546736);
repeat_pat (TGC);
rpg (DeepRepeat/bin/data/trf.v0.bed);
summary_file (na12878_loci/TGC_chr16_73546662-73546736/sequencing_summary.txt);

Generating features: DeepRepeat/bin/scripts/genomic1FE na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam test_op/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs na12878_loci/TGC_chr16_73546662-73546736/ na12878_loci/TGC_chr16_73546662-73546736/na.f5index chr16:73546662-73546736:TGC:3 DeepRepeat/bin/data/trf.v0.bed DeepRepeat/bin/data/config/fast5_path.config -1.5 500
p_f5_conf_file DeepRepeat/bin/data/config/fast5_path.config
Input = na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam test_op/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs
free(): invalid size
Segmentation fault
Error! Cannot generate fs file: test_op/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs
```
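For anyone triaging a similar log, the failure signatures above can be picked out mechanically. This is a minimal sketch and not part of DeepRepeat: the function name `summarize_log` and the labels are hypothetical, and the patterns are simply copied from messages that appear in this thread. As background, `free(): invalid size` is printed by glibc's allocator when `free()` is handed a chunk whose size field fails its consistency check, which generally points to memory corruption or an invalid pointer rather than to the machine running out of memory.

```python
import re

# Signatures copied from the logs in this thread; the labels are illustrative only.
CRASH_PATTERNS = {
    "glibc free() check": re.compile(r"free\(\): invalid size"),
    "segfault": re.compile(r"Segmentation fault"),
    "abort": re.compile(r"Aborted \(core dumped\)"),
    "missing .fs output": re.compile(r"Error! Cannot generate fs file: \S+"),
}


def summarize_log(log_text):
    """Return (label, line) pairs for every line matching a known signature."""
    hits = []
    for line in log_text.splitlines():
        for label, pattern in CRASH_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "Input = in.bam out.fs",
        "free(): invalid size",
        "Segmentation fault",
        "Error! Cannot generate fs file: out.fs",
    ])
    for label, line in summarize_log(sample):
        print(f"{label}: {line}")
```

Running it over a saved log gives a short list of the lines worth quoting in a bug report.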

@sabiqali Thanks for being interested in DeepRepeat.
Could you please share the relative path to your fast5 files, and show the output of `h5ls -r YOUR-FAST5-FILE | head -n 50`?

It shows this:

    / Group
    /Analyses Group
    /Analyses/Basecall_1D_000 Group
    /Analyses/Basecall_1D_000/BaseCalled_template Group
    /Analyses/Basecall_1D_000/BaseCalled_template/Events Dataset {507468}
    /Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
    /Analyses/Basecall_1D_000/Configuration Group
    /Analyses/Basecall_1D_000/Configuration/basecall_1d Group
    /Analyses/Basecall_1D_000/Summary Group
    /Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
    /Analyses/Calibration_Strand_Detection_000 Group
    /Analyses/Calibration_Strand_Detection_000/Configuration Group
    /Analyses/Calibration_Strand_Detection_000/Configuration/calib_detector Group
    /Analyses/Calibration_Strand_Detection_000/Summary Group
    /Analyses/Calibration_Strand_Detection_000/Summary/calibration_strand_template Group
    /Analyses/Segmentation_000 Group
    /Analyses/Segmentation_000/Configuration Group
    /Analyses/Segmentation_000/Configuration/stall_removal Group
    /Analyses/Segmentation_000/Summary Group
    /Analyses/Segmentation_000/Summary/segmentation Group
    /Raw Group
    /Raw/Reads Group
    /Raw/Reads/Read_14276 Group
    /Raw/Reads/Read_14276/Signal Dataset {2537490/Inf}
    /UniqueGlobalKey Group
    /UniqueGlobalKey/channel_id Group
    /UniqueGlobalKey/context_tags Group
    /UniqueGlobalKey/tracking_id Group
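A listing like this can be used to tell which FAST5 layout a file has. The sketch below parses pasted `h5ls -r` output and classifies it: single-read files keep the signal under `/Raw/Reads/Read_<N>`, while multi-read files use top-level `read_<id>` groups. The helper names `parse_h5ls` and `fast5_layout` are hypothetical. The option log earlier in the thread lists `multif5 (0)`, which presumably means single-read input is expected, but that reading of the option is an assumption.

```python
import re

# One entry looks like "<path> Group" or "<path> Dataset {shape}".
# The lookbehind keeps fragments such as "/Inf}" inside a shape from matching.
ENTRY = re.compile(r"(?<!\S)(/\S*)\s+(Group|Dataset)\b")


def parse_h5ls(listing):
    """Split `h5ls -r` output (one entry per line or flattened) into (path, kind) pairs."""
    return [(m.group(1), m.group(2)) for m in ENTRY.finditer(listing)]


def fast5_layout(listing):
    """Classify the listing as 'single-read', 'multi-read', or 'unknown'."""
    paths = [path for path, _ in parse_h5ls(listing)]
    if any(re.match(r"/read_[^/]+/Raw", p) for p in paths):
        return "multi-read"
    if any(re.match(r"/Raw/Reads/Read_\d+", p) for p in paths):
        return "single-read"
    return "unknown"


if __name__ == "__main__":
    pasted = (
        "/ Group /Raw Group /Raw/Reads Group /Raw/Reads/Read_14276 Group "
        "/Raw/Reads/Read_14276/Signal Dataset {2537490/Inf} /UniqueGlobalKey Group"
    )
    print(fast5_layout(pasted))
```

The parser accepts both the one-entry-per-line form and the flattened form that results from pasting into a comment box.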

jts commented

Just to chime in here, we're having this problem with the example data that we downloaded from the tutorial link in this repo.

@jts @sabiqali: it is a C++ issue. Until it is fixed, could you please try using Docker? For example: `docker run -v /data/data1/test/deeprepeat:/tmp --rm genomicslab/deeprepeat:0.1.4 python DeepRepeat.py Detect --gn hx1 --TempRem 0 --epchon 200 --repeat_relax_bp 20 --UniqueID TGC_chr16_73546662-73546736 --is_pcr 0 --repeatName TGC_chr16_73546662-73546736 --repeat chr16:73546662-73546736:TGC:3 --f5i /tmp/na12878_loci/TGC_chr16_73546662-73546736/na.f5index --o /tmp/test_op/TGC_chr16_73546662-73546736 --bam /tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam --f5folder /tmp/na12878_loci/TGC_chr16_73546662-73546736 --algLen 500`
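One detail of the Docker command that is easy to get wrong is that every path passed to DeepRepeat must be the path as seen inside the container, under the mount point given to `-v` (here `/tmp`), not the host path. The sketch below shows that translation. The helper names `to_container_path` and `docker_prefix` are hypothetical; the host directory, mount point, and image tag are the ones from the command above and would need to be adapted to your own setup.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_root, container_root):
    """Map a host path under host_root to its location under the container mount.

    Raises ValueError if host_path is not inside host_root, since such a file
    would not be visible inside the container at all.
    """
    rel = PurePosixPath(host_path).relative_to(PurePosixPath(host_root))
    return str(PurePosixPath(container_root) / rel)


def docker_prefix(host_root, container_root, image):
    """Build the `docker run` part of the command with a single bind mount."""
    return ["docker", "run", "-v", f"{host_root}:{container_root}", "--rm", image]


if __name__ == "__main__":
    host_root = "/data/data1/test/deeprepeat"  # host directory from the example
    locus = "TGC_chr16_73546662-73546736"
    bam = to_container_path(
        f"{host_root}/na12878_loci/{locus}/{locus}.bam", host_root, "/tmp"
    )
    cmd = docker_prefix(host_root, "/tmp", "genomicslab/deeprepeat:0.1.4")
    cmd += ["python", "DeepRepeat.py", "Detect", "--bam", bam]
    print(" ".join(cmd))
```

Only `--bam` is filled in here for brevity; the remaining options would be appended the same way.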

Did anybody run Docker without encountering this problem? It may be a lack-of-memory issue, as there is a statement "free(): invalid size" in the log message. Or, as Chris mentioned, it may be a GCC version issue, where the code is not compiled correctly with a different version of GCC.

@kaichop, we are in the process of running the Docker image on our compute cluster. Docker takes a bit more work to use on our cluster. I will report back once I have run it as suggested by @liuqianhn.

May I ask which version of GCC you are expecting while compiling the dependencies?

This is a GCC version issue, which I had not realized before. It is not a memory issue, because I used a node with plenty of free memory. In my closer testing, the error shows that a system C++ file does not exist, which causes the "free()..." error.

@sabiqali I previously tested on an older GCC (v4.* or v5.*; I do not remember the details).
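To compare a local toolchain against the versions mentioned here, the GCC major version can be read from `gcc -dumpversion`. The following is a rough sketch: the helper names and the `TESTED_MAJORS` set are hypothetical, and the set only encodes the maintainer's recollection above (v4 or v5), not a documented requirement.

```python
import re
import shutil
import subprocess

# From the maintainer's recollection in this thread; not an official requirement.
TESTED_MAJORS = {4, 5}


def parse_gcc_major(dumpversion_output):
    """Extract the major version from `gcc -dumpversion` output such as '4.8.5' or '11'."""
    match = re.match(r"\s*(\d+)", dumpversion_output)
    return int(match.group(1)) if match else None


def installed_gcc_major():
    """Return the major version of the gcc on PATH, or None if it cannot be determined."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "-dumpversion"], capture_output=True, text=True, check=False
    )
    return parse_gcc_major(result.stdout)


if __name__ == "__main__":
    major = installed_gcc_major()
    if major is None:
        print("gcc not found on PATH")
    elif major in TESTED_MAJORS:
        print(f"gcc {major}: within the range mentioned in this thread")
    else:
        print(f"gcc {major}: newer or older than the versions mentioned in this thread")
```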

@sabiqali @jts You can also try Singularity, which might be easier. The help document can be found here: https://carpentries-incubator.github.io/singularity-introduction/05-singularity-docker/index.html

@liuqianhn @kaichop we tried the Docker image. It took a bit of time and a few attempts to get it working, but it seems to error out as well. We tried the command suggested above, and it gives pretty much the same error:

    free(): invalid size
    Aborted (core dumped)

This was the other error log:
The following options are used (included default):
UniqueID (TGC_chr16_73546662-73546736);
bam (/tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam);
basecalled_path (workspace/pass/);
f5config (/app/data/config/fast5_path.config);
f5folder (/tmp/na12878_loci/TGC_chr16_73546662-73546736);
f5i (/tmp/na12878_loci/TGC_chr16_73546662-73546736/na.f5index);
f5i_basefile (na);
feature_num (50);
label_size (4);
merge_gap (4.5);
mod_path (None);
mod_version (2);
multif5 (0);
nb_size (3);
nbsize (-1.5);
outlog (0);
outputfolder (/tmp/TGC_chr16_73546662-73546736);
pcr (True);
repeat (chr16:73546662-73546736:TGC:3);
repeat_name (TGC_chr16_73546662-73546736);
repeat_pat (TGC);
rpg (/app/data/trf.v0.bed);
summary_file (/tmp/na12878_loci/TGC_chr16_73546662-73546736/sequencing_summary.txt);
Generating features: /app/scripts/genomic1FE /tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam /tmp/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs /tmp/na12878_loci/TGC_chr16_73546662-73546736/ /tmp/na12878_loci/TGC_chr16_73546662-73546736/na.f5index chr16:73546662-73546736:TGC:3 /app/data/trf.v0.bed /app/data/config/fast5_path.config -1.5 500
p_f5_conf_file /app/data/config/fast5_path.config
Input = /tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam /tmp/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs
Error! Cannot generate fs file: /tmp/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs

As you can see, it is pretty much the same as the previous one.

Is this problem solved? I'm having the same problem.

Please try installing glibc-source. (Please note: do NOT install glibc itself, especially with root permission, since installing glibc will crash your OS.)

Hi @liuqianhn,

I can confirm that the solution suggested in Issue #6 and the subsequent changes to the environment.yml fixed the issue I was having. I can now run the software on the test data provided. I will test on the data that we have in our lab.

Thank you for your help!