Aligner: Create a merged file when using folder as input
Closed this issue · 2 comments
Issue Report
Please describe the issue:
Is there a way to output a merged file when using a folder as input? Right now I'm using
dorado aligner --recursive (fasta reference) (read folder) > output.bam
which errors out:
[2024-11-15 12:09:23.137] [error] An output-dir must be specified if reading from an input folder.
[2024-11-15 12:09:23.137] [error] Could not initialise for input ../../globus_download/08302024_XKLT001_D18_RERUN_SV/
When I use -o for an input folder, it creates a ton of small bams, one for each file of the input folder (e.g. what MinKNOW used to output before recent updates). Presumably this is so each file can be sorted, but is there a way to just create a single aligned output bam? I can sort it later. It's just that merging a ton of bam files is really problematic with samtools when you have thousands of them (e.g. I have to break it into subgroups).
Run environment:
- Dorado version: 0.8.3
- Operating system: Linux
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): unaligned bam from MinKNOW
@billytcl
At this time there's no way to generate a single output from multiple inputs directly in Dorado, as you've found.
Kind regards,
Rich
@billytcl - as a workaround, just put a conversion step in between: pod5 convert fast5 <..> in case of fast5 input, or pod5 merge <..> in case of pod5 input. It does not take much time compared to the actual basecalling/alignment. That's what I do, even for large datasets.
In the "Run Environment" you mention that the source data is "unaligned bam from MinKNOW", so this is already basecalled data. In this case you could use minimap2 directly; there is no need to go the dorado route.
For example (samtools >= 1.16):
samtools fastq \
    --threads "$SAM_THREADS" \
    -T "*" \
    "$BAM_BCL" \
  | minimap2 \
    --secondary=no \
    -2 \
    -a \
    -y \
    -t "$MAP_THREADS" \
    -K "$CHUNK" \
    -x "$PRESET" \
    "$REF_MMI" \
    - \
  | samtools sort \
    -m 2G \
    --threads "$SAM_THREADS" \
    -O BAM \
    -o "$BAM_ALN" \
    --write-index \
    --reference "$REF_FSA" -
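The command above takes a single unaligned BAM. For a folder holding thousands of them, one possible adaptation is to convert the files to FASTQ one at a time and feed the combined stream into the same pipeline, so that only one sorted BAM is written and no merge step is needed. The following is a rough sketch rather than a tested recipe: the function names (list_bams, bams_to_fastq, align_folder) are hypothetical, the map-ont preset and the thread counts are assumed defaults, and it relies on GNU find, sort and xargs. Because the input goes through FASTQ, header lines from the original BAMs (such as @RG) are not carried over; only the per-read tags are.

```shell
# Sketch only: function names and default values are hypothetical.

# Print every *.bam under directory $1, NUL-delimited and sorted
# (needs GNU find and sort).
list_bams() {
    find "$1" -type f -name '*.bam' -print0 | sort -z
}

# Convert the BAMs to FASTQ one file at a time, writing one combined
# stream to stdout. -T '*' keeps all BAM tags in the FASTQ comment,
# as in the command above.
bams_to_fastq() {
    list_bams "$1" | xargs -0 -r -n 1 samtools fastq -T '*'
}

# Align the combined stream and write a single sorted, indexed BAM.
# Usage: align_folder READ_DIR REF_MMI OUT_BAM
# PRESET, MAP_THREADS and SAM_THREADS are optional environment
# variables; the defaults below are assumptions.
align_folder() {
    bams_to_fastq "$1" \
        | minimap2 -a -y --secondary=no \
            -x "${PRESET:-map-ont}" \
            -t "${MAP_THREADS:-8}" \
            "$2" - \
        | samtools sort -m 2G \
            --threads "${SAM_THREADS:-4}" \
            -O BAM --write-index \
            -o "$3" -
}
```

After sourcing these functions, a call such as align_folder reads_dir ref.mmi aligned.bam (placeholder paths) would run the whole folder through the pipeline. Since samtools fastq is invoked once per file, this should avoid having thousands of BAMs open at the same time.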