allow to start from sam files?
housw opened this issue · 12 comments
Hi,
this is a feature request rather than a bug report. Would it be easy to add an option that allows starting from aligned SAM or BAM files? It would make sense, since we usually have these files generated for other purposes as well.
Please advise, thanks in advance!
Best,
Shengwei
Hi Shengwei (@housw), yes, that is doable, especially for the "single" mode.
However, for the "multiplex" mode, the SAM/BAM files supplied would need to be derived from reads mapped to the GRiD database or a custom-generated GRiD database, since downstream steps rely on the database format.
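
To illustrate the constraint, here is a minimal sketch of how one could check whether a SAM file was produced against a given database by comparing the reference names in its `@SQ` header lines with the names in that database. This is not how GRiD does it; the function names and the genome names in the usage note are hypothetical, and matching names alone would not guarantee that the file meets the database format the downstream steps expect.

```python
def sam_reference_names(sam_lines):
    """Collect the SN (reference name) values from the @SQ header lines of a SAM file."""
    names = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def mapped_to_database(sam_lines, database_names):
    """Return True if the header lists at least one reference and all of them are in the database."""
    names = sam_reference_names(sam_lines)
    return bool(names) and set(names) <= set(database_names)
```

For example, a header listing `genomeA` and `genomeB` would pass against a database containing both names and fail against one containing only `genomeA`.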
Regards,
- Tunde
Hi Tunde,
yes, I agree. It would be highly appreciated if you could implement this feature in the 'single' mode.
Thanks a lot.
Cheers,
Shengwei
OK, great. I'll do this in the coming days and release a new version.
- Tunde
That's cool, looking forward to the next release!
Hi there,
I agree this would be very useful for scaling GRiD to large analyses (lots of samples), since fastq files can be huge and a pain to archive or to regenerate from archived files.
If starting from SAM/BAM is too much of a hassle to implement right now, something simpler that would definitely help would be to allow zipped files. I've looked at the code and it seems to me this would be possible without much change, since the extension *.fastq is only used for bowtie2 here:
https://github.com/ohlab/GRiD/blob/master/grid#L349-L356
https://github.com/ohlab/GRiD/blob/master/grid#L501-L513
and bowtie2 accepts fastq.gz files directly.
A solution would thus be to simply add an option:
-r_ext Extension of the files containing reads (fastq, fq, fastq.gz, etc)
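
As a rough sketch of the idea (in Python for illustration only; the helper names below are hypothetical and this is not GRiD's code), the extension could be used both to pick up the read files and to derive the sample names:

```python
from pathlib import Path


def find_read_files(directory, ext="fastq"):
    """Return the files in `directory` whose names end with `.<ext>`, sorted by name."""
    suffix = "." + ext.lstrip(".")
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )


def sample_name(path, ext="fastq"):
    """Strip the full extension (e.g. `.fastq.gz`) to get the sample name."""
    suffix = "." + ext.lstrip(".")
    return Path(path).name[: -len(suffix)]
```

With `ext="fastq.gz"`, a file named `sampleA.fastq.gz` would be selected and reported as sample `sampleA`, while a plain `sampleB.fastq` in the same folder would be skipped.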
Cheers,
Nils
Hi Nils, totally agree. It will be easy to implement input choices of zipped fastq or SAM/BAM. I haven't been able to implement this due to other projects but hope to get to it during the weekend.
Thanks!
Tunde
Hi Tunde,
that's awesome, thanks a lot for your hard work over the weekend. I'm going to test it with my data set and will keep you updated.
Cheers,
Shengwei
Thank you indeed! That's going to save quite some time for analyses with lots of samples.
However, SAM inputs are only valid in the 'single' module.
Any technical limitations that impede the use of SAM inputs for the multiplex module? Is it also a problem for GRiD if the SAM inputs have been generated using paired-end reads?
Cheers,
Nils
In fact I've been trying to use GRiD 1.2 these last weeks on a subset of my data, and I happen to have a couple of technical questions and suggestions. I'll probably ask them elsewhere since this thread is focused on SAM inputs. Would you prefer a single issue containing all my points or an issue for each point?
@nigiord It's fine if the SAM inputs are generated from paired-end reads. I avoided SAM inputs for the multiplex module since they have to be generated from reads mapped to the GRiD database.
Sure, you can open a single thread regarding your other suggestions.
Hi Tunde,
GRiD works great with my SAM files, thanks a lot!
Cheers,
Shengwei
That is good to know.
Cheers
- Tunde