a-ludi/dentist

Can gaps be filled with just extension reads?

lautimothy opened this issue · 5 comments

Hello,

Can DENTIST be configured to fill gaps with just extension reads (the purple and orange ones in the figure), e.g. by setting "min-spanning-reads" to 0?
(Image: dentist_suppfig1, from the DENTIST paper, supplementary figure 1)

Thanks,
Tim

Hi Tim,

I am a little uncertain about your intentions. What are you trying to achieve?

Generally, you can enable partial gap closing, using the extending reads, by setting only: both. However, I strongly discourage this because the generated sequences are very prone to all sorts of errors. For a pure extension, min-reads-per-pile-up determines the minimum number of extending reads required; it should not be set lower than the default of 3. To reduce useless modifications, you may increase min-extension-length (default: 100 bp). See the list of CLI options for help on individual options.
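
As a rough sketch, the options mentioned above might look like this in a DENTIST YAML config file. The __default__ section name and the value 500 are assumptions made for illustration and are not taken from this thread; check the list of CLI options for where each option actually applies.

```yaml
# Hypothetical config fragment; the section name is an assumption.
__default__:
  # Allow partial gap closing with extending reads (discouraged above).
  only: both
  # Minimum number of reads per pile-up; do not go below the default of 3.
  min-reads-per-pile-up: 3
  # Skip very short extensions; 500 is illustrative (default is 100 bp).
  min-extension-length: 500
```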

In the default setting, only: spanning, only gaps that are supported by spanning reads will be closed, and all incident spanning and extending reads will be used to build the consensus. Is this what you were concerned with?

I just realized that the "only" option is rather poorly documented. You can find the valid values in the API docs.

Hi Arne,

Thanks for providing the relevant parameters. I am trying to fill gaps that might be longer than a typical PacBio read, and the assembly in question is assembled from long reads. To help minimise inaccuracies, I am running DENTIST with very specific inputs: only one gap and reads enriched for that gap. Information about that specific gap is provided by the long-read assembly graph, from which the adjacent contigs, the approximate gap size (based on coverage), and the repeating unit are known.

However, when I tried a test run of DENTIST for just one artificial gap that is 50 kbp long under only: both or only: extending, an error occurred (error files attached).

Thanks,
Tim

purged-output.log
dentist_run.e77671310.txt

Hi @lautimothy,

sorry for the long wait. Please adjust the snakemake.yml config by setting/uncommenting:

no_purge_output: true

This will disable the part of the workflow that failed. The failing step is part of the automatic validation and purging, which is not designed to work with extensions anyway.

Thanks for looking into it and responding, @a-ludi. Disabling purging did the trick.

Glad I could help. I consider this issue done for now.