WGLab/LinkedSV

[fai_load] build FASTA index.


Hi, I finished running the pipeline on one sample, and it called far fewer variants than Long Ranger. That sample has low library quality, so the result might be correct. I am now running a second sample, but the pipeline has been stuck on the following step for two days:
[fai_load] build FASTA index.
I found that fermikit is still running, but the region it is working on has not been updated in two days. Is this expected? Is there a way to stop the pipeline and restart from a checkpoint?

$ date
Thu Jan  9 16:37:01 CST 2020
$ ls -lt|head
total 4616
-rw-rw-r-- 1 hao hao       0 Jan  7 10:06 region_015738.all_hap.flt.fmd
-rw-rw-r-- 1 hao hao       0 Jan  7 10:06 region_015738.all_hap.flt.fmd.log
-rw-rw-r-- 1 hao hao  264096 Jan  7 10:06 region_015738.all_hap.flt.fq.gz
-rw-rw-r-- 1 hao hao     667 Jan  7 10:06 region_015738.all_hap.flt.fq.gz.log
-rw-rw-r-- 1 hao hao  396412 Jan  7 10:06 region_015738.all_hap.ec.fq.gz
-rw-rw-r-- 1 hao hao     591 Jan  7 10:06 region_015738.all_hap.ec.fq.gz.log
-rw-rw-r-- 1 hao hao    1173 Jan  7 10:06 region_015738.all_hap.mak
-rw-rw-r-- 1 hao hao  100056 Jan  7 10:06 region_015738.fasta.sa
-rw-rw-r-- 1 hao hao      11 Jan  7 10:06 region_015738.fasta.amb
$ cat /proc/cpuinfo |head
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
stepping	: 4
microcode	: 0x2000064
cpu MHz		: 1309.928
cache size	: 8448 KB
physical id	: 0
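
To check whether anything in the output directory is still being written, something like the following could be used. This is only a sketch: `recent_files` is a hypothetical helper, not part of LinkedSV or fermikit, and it assumes a `find` that supports `-mmin` (GNU and BSD `find` do; it is not in POSIX).

```shell
# Hypothetical helper (not part of LinkedSV): list regular files under a
# directory whose modification time falls within the last N minutes.
# Assumes a find implementation that supports -mmin (GNU or BSD find).
recent_files() {
    dir="$1"       # directory to scan, e.g. the LinkedSV output directory
    minutes="$2"   # look-back window in minutes
    find "$dir" -type f -mmin "-$minutes"
}
```

For example, `recent_files /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv 60` would print any file touched in the last hour; empty output suggests that no region has progressed during that window.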

Sorry for the late reply. Could you please show me your command? How many variants did LinkedSV and Long Ranger detect? For the fai loading error, please make sure your reference FASTA file has an index file next to it (e.g. the .fai file). If not, please use samtools faidx to create the index file.
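
The index check could be sketched roughly as follows. `ensure_fai` is a hypothetical helper name used only for illustration (it is not part of LinkedSV); the only external command it calls is `samtools faidx <ref.fa>`, which writes `<ref.fa>.fai` next to the FASTA file.

```shell
# Hypothetical helper (not part of LinkedSV): make sure a reference FASTA
# has a .fai index next to it, creating one with samtools faidx if missing.
ensure_fai() {
    ref="$1"   # path to the reference FASTA

    # The reference itself must exist and be non-empty.
    if [ ! -s "$ref" ]; then
        echo "reference FASTA not found or empty: $ref" >&2
        return 1
    fi

    # Nothing to do if a non-empty index is already present.
    if [ -s "$ref.fai" ]; then
        echo "index present: $ref.fai"
        return 0
    fi

    # samtools must be reachable through PATH to build the index.
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found in PATH" >&2
        return 2
    fi

    # Writes "$ref.fai" alongside the FASTA file.
    samtools faidx "$ref"
}
```

It would be called with the same path that is passed to `-r`, for example `ensure_fai ../longranger_rn6_db/rn6_condensed.fa`.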

Could you please upload the last 100 lines of the stderr and stdout so that I know more about the error?

Thank you!
Li

Li, thanks for the reply. Here is the command I used:


export PATH=$PATH:/home/hao/apps/LinkedSV/bin/

bam=../acf_rat_bams/0502_bn_eve_phased_possorted_bam.bam
ref=../longranger_rn6_db/rn6_condensed.fa

threads=24

out=`basename $bam phased_possorted_bam.bam`linkedsv
echo "$out"
python /home/hao/apps/LinkedSV/linkedsv.py -i $bam -d $out -r $ref --ref_version rn6 --germline_mode -t $threads

It is stuck on this step:

[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.03 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.02 sec
[main] Version: 0.7.15-r1142-dirty
[main] CMD: /home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/bwa index /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.fasta
[main] Real time: 0.340 sec; CPU: 0.065 sec
[01/07/2020 15:30:00 (389.095 MB)] cd /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897 && perl /home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/fermi2.pl unitig -s 200000 -l 151 -t 1 -p /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all.fastq > /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.mak


[01/07/2020 15:30:00 (389.095 MB)] make -f /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.mak


bash -e -o pipefail -c '/home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/bfc -s 200000 -t 1 <(cat /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all.fastq) <(cat /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all.fastq) 2> /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.ec.fq.gz.log | gzip -1 > /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.ec.fq.gz'
/home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/bfc -1s 200000 -k 61 -t 1 /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.ec.fq.gz 2> /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.flt.fq.gz.log | gzip -1 > /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.flt.fq.gz
/home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/ropebwt2 -dNCr /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.flt.fq.gz > /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.flt.fmd 2> /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.flt.fmd.log
/home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/fermi2 assemble -l 71 -m 114 -t 1 /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.flt.fmd 2> /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.pre.gz.log | gzip -1 > /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.pre.gz
/home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/fermi2 simplify -CSo 76 -m 114 -T 71 /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.pre.gz 2> /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.mag.gz.log | gzip -1 > /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.mag.gz
[01/07/2020 15:30:02 (389.095 MB)] cd /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897 && perl /home/hao/apps/LinkedSV/scripts/../fermikit/fermi.kit/run-calling -t 1 /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.fasta /home/hao/data_10TB/linkedsv/0502_bn_eve_linkedsv/region_015897/region_015897.all_hap.mag.gz | sh
[fai_load] build FASTA index.

In my previous run that finished, it detected very few SVs:

      3 0335_wli_phased_possorted_bam.bam.filtered_large_svcalls.bedpe
     25 0335_wli_phased_possorted_bam.bam.large_cnv.bedpe
  15827 0335_wli_phased_possorted_bam.bam.small_deletions.bedpe

In comparison, Long Ranger detected more:

   2780 large_sv_calls.bedpe
   6589 large_sv_candidates.bedpe

I want to add that the second run is stuck on the build FASTA index step, and samtools is in my PATH.

Thanks!

I have a similar issue. For me, phased_possorted_bam.bam.large_cnv.bedpe and phased_possorted_bam.bam.filtered_large_svcalls.bedpe are always empty, while I always get results for phased_possorted_bam.bam.small_deletions.bedpe. I used linked reads simulated by LRSIM. Is there something I am not considering?