epi2me-labs/wf-pore-c

BED file did not store the same information as BAM

Operating System

CentOS 7

Other Linux

No response

Workflow Version

v1.1.0

Workflow Execution

EPI2ME Desktop application

EPI2ME Version

No response

CLI command run

nextflow run /home/106public/software/wf-pore-c --fastq A4Mpass.fq.gz --ref xxx.fasta --cutter NlaIII -c increase_memory.config --threads 60 --paired_end_minimum_distance 100 --paired_end_maximum_distance 200 --chunk_size 10000 --paired_end --bed

Workflow Execution - CLI Execution Profile

standard (default)

What happened?

I got a 700 GB paired-end BAM and 13 GB of BED files. I used YaHS for anchoring, but I get different results from the BAM and the BED. I think the BED files do not store the same information as the BAM files.

In addition, the .hic file produced from the BED file was too small (about 900 MB when anchoring with YaHS).

Relevant log output

no error log files

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

none

Hi, thanks for the feedback; we will look into this. Are you using the BAM from within the paired_end folder of the output?

I am not sure what you mean by the .hic file produced with the BED file. We generate a .hic file from the workflow using the juicer tool.

[Picture 1]

  1. Yes. I used the BED files produced by adding the --bed parameter, but got very poor anchoring results (see picture 1).

  2. Then I used the paired-end BAM for YaHS anchoring and got different results (see picture 2).

[Picture 2]

What concerns me is that different output files produce different results. For now, I think using the BED file with YaHS is not recommended.

We used Juicebox to correct the scaffolding results, and it needs the .assembly and .hic files produced by juicer pre. The .hic file was not produced by your pipeline (it is produced after YaHS anchoring).

I do not think it is a problem with YaHS; I think it is a fault in wf-pore-c. Could the BED output file be incorrect?

I checked the code and found that "monomers.mm2.ns.bam" is used to create the BED file.

Is this a single-end BAM or a paired-end BAM?

Ah, OK, thanks for the feedback; we will check it and update it. monomers.mm2.ns.bam is a single-end BAM.
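Whether a BAM is single-end or paired-end can be read from the FLAG field every SAM/BAM record carries: per the SAM specification, bit 0x1 marks a multi-segment (paired) template, and bits 0x40 and 0x80 mark the first and second mate. The sketch below is illustrative only; the function names are hypothetical, and it works on plain integer flags (for example, column 2 of `samtools view` output) rather than opening a BAM directly.

```python
# Minimal sketch: classify SAM/BAM records as single- or paired-end from the
# integer FLAG field (column 2 of `samtools view` output).
# Bit meanings follow the SAM specification; function names are hypothetical.
from typing import Iterable

FLAG_PAIRED = 0x1   # template has multiple segments (paired-end)
FLAG_READ1 = 0x40   # first segment in the template (mate 1)
FLAG_READ2 = 0x80   # last segment in the template (mate 2)


def mate_label(flag: int) -> str:
    """Return "1" or "2" for a paired record, or "" if unpaired or mate unknown."""
    if not flag & FLAG_PAIRED:
        return ""
    if flag & FLAG_READ1:
        return "1"
    if flag & FLAG_READ2:
        return "2"
    return ""


def is_paired_end(flags: Iterable[int]) -> bool:
    """True if there is at least one record and every record has the paired bit set."""
    flags = list(flags)
    return bool(flags) and all(flag & FLAG_PAIRED for flag in flags)
```

With pysam, `AlignedSegment.flag` would supply the same integers, and `samtools flagstat` reports a "paired in sequencing" count that answers the question for a whole file.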

As far as I know, YaHS cannot accept a single-end BAM. This may lead to wrong results. Please check it.
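One quick sanity check is whether the BED records form mate pairs at all. A minimal sketch follows, assuming (these are assumptions, not documented wf-pore-c or YaHS behaviour) that column 4 holds the read name, that mates sit on consecutive lines, and that they carry the /1 and /2 name suffixes raised later in this thread. A BED built from a single-end BAM would be expected to fail this check.

```python
# Minimal sketch: check whether BED records form consecutive "name/1", "name/2"
# mate pairs. Assumptions (not documented wf-pore-c or YaHS behaviour):
#   - column 4 holds the read name,
#   - mates occupy consecutive lines,
#   - mates are marked with "/1" and "/2" suffixes.
from typing import Iterable, List

SKIP_PREFIXES = ("#", "track", "browser")  # BED header/comment lines


def read_names(bed_lines: Iterable[str]) -> List[str]:
    """Collect the name column from data lines, skipping blanks and headers."""
    names = []
    for line in bed_lines:
        if not line.strip() or line.startswith(SKIP_PREFIXES):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            raise ValueError(f"BED line has no name column: {line!r}")
        names.append(fields[3])
    return names


def looks_paired(bed_lines: Iterable[str]) -> bool:
    """True if the records come as consecutive name/1, name/2 pairs."""
    names = read_names(bed_lines)
    if not names or len(names) % 2:
        return False
    for first, second in zip(names[0::2], names[1::2]):
        if not (first.endswith("/1") and second.endswith("/2")):
            return False
        if first[:-2] != second[:-2]:  # both mates must share the base name
            return False
    return True
```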

Dear Sarah,

I think I know what happened.

Most scaffolding software identifies paired-end reads only by the /1 and /2 suffixes. Your BED file did not contain them, which could explain the terrible results.
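The kind of fix being described could look like the minimal sketch below. It assumes the BED records already come as consecutive mate pairs sharing a read name in column 4; the helper name `add_mate_suffixes` is hypothetical, and this is not the change actually made in wf-pore-c. It only relabels existing pairs with /1 and /2; it cannot recover pairs from single-end monomer alignments.

```python
# Minimal sketch (hypothetical helper, not the actual wf-pore-c fix): append
# "/1" and "/2" to the BED name column, assuming each mate pair already sits
# on two consecutive lines sharing the same read name in column 4.
# It relabels existing pairs only; it cannot build pairs from single-end data.
from typing import Iterable, Iterator, List, Optional


def add_mate_suffixes(bed_lines: Iterable[str]) -> Iterator[str]:
    """Yield BED lines with "/1" and "/2" added to the names of consecutive mates."""
    pending: Optional[List[str]] = None  # first mate waiting for its partner
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            raise ValueError(f"BED line has no name column: {line!r}")
        if pending is None:
            pending = fields
            continue
        if pending[3] != fields[3]:
            raise ValueError(f"unpaired record {pending[3]!r} followed by {fields[3]!r}")
        pending[3] += "/1"
        fields[3] += "/2"
        yield "\t".join(pending) + "\n"
        yield "\t".join(fields) + "\n"
        pending = None
    if pending is not None:
        raise ValueError(f"last record {pending[3]!r} has no mate")
```

For example, `sys.stdout.writelines(add_mate_suffixes(sys.stdin))` would turn it into a stdin-to-stdout filter. Raising on an unmatched record, rather than silently skipping it, keeps malformed input from producing a BED that only looks paired.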

Thanks, we will get this amended soon.

Hi, this is now updated in the pre-release.

Closing due to lack of response; this change is now in the tagged version.