ohlab/GRiD

allow to start from sam files?

housw opened this issue · 12 comments

housw commented

Hi,

This is a feature request rather than a bug report. Would it be easy for you to add an option to start from aligned SAM or BAM files? It would make sense, since we usually have these files generated for other purposes as well.

Please advise. Thanks in advance!

Best,
Shengwei

Hi Shengwei (@housw), yes, that is doable, especially for the "single" mode.

However, for the "multiplex" mode, any SAM/BAM files supplied would need to be derived from reads mapped to the GRiD database or to a custom-generated GRiD database, since the downstream steps rely on the database format.
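To make the constraint concrete, here is a minimal sketch of how one might check which references a SAM file was mapped against. It relies only on the SAM header convention that `@SQ` lines carry the reference name in an `SN:` field. The function names and the idea of comparing against a list of database sequence names are assumptions for illustration and are not taken from the GRiD code.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from the @SQ header lines of a SAM file."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines always precede the alignment records
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def references_not_in_database(sam_lines, database_names):
    """Return SAM reference names absent from the (hypothetical) database list."""
    return sam_reference_names(sam_lines) - set(database_names)
```

An empty result would suggest the SAM file was produced against the same set of references; a non-empty one would suggest it was mapped to something else.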

Regards,

  • Tunde
housw commented

Hi Tunde,

Yes, I agree. It would be highly appreciated if you could implement this feature in the 'single' mode.

Thanks a lot.

Cheers,
Shengwei

Ok great. I'll do this in the coming days and release a new version

  • Tunde
housw commented

That's cool, looking forward to the next release! 👍

Hi there,

I agree this would be very useful for scaling GRiD to large analyses (a lot of samples), since fastq files can be huge and are a pain to archive or to regenerate from archived files.

If starting from SAM/BAM is too much of a hassle to implement right now, something simpler that would definitely help would be to allow zipped files. I've looked at the code, and it seems to me this would be possible without much change, since the *.fastq extension is only used in the bowtie2 calls here:
https://github.com/ohlab/GRiD/blob/master/grid#L349-L356
https://github.com/ohlab/GRiD/blob/master/grid#L501-L513
and bowtie2 allows the use of fastq.gz files directly.

A solution would thus be to simply add an option:
-r_ext Extension of the files containing reads (fastq, fq, fastq.gz, etc)
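As a rough illustration of what such an option implies, the sketch below collects read files by a configurable extension and derives a sample name by stripping it. This is Python written for illustration only; the function name, the directory layout, and the default value are assumptions and do not reflect how the `grid` script is actually implemented.

```python
from pathlib import Path


def find_read_files(reads_dir, ext="fastq"):
    """Map sample name -> path for files in reads_dir ending with '.<ext>'."""
    suffix = "." + ext.lstrip(".")
    samples = {}
    for path in sorted(Path(reads_dir).iterdir()):
        if path.is_file() and path.name.endswith(suffix):
            # Strip the whole extension, so 'a.fastq.gz' becomes sample 'a'.
            samples[path.name[: -len(suffix)]] = path
    return samples
```

Calling `find_read_files("reads", "fastq.gz")` would pick up only the gzipped files, whose paths could then be handed to the aligner unchanged.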

Cheers,
Nils

Hi Nils, totally agree. It will be easy to implement input choices of zipped fastq or SAM/BAM. I haven't been able to implement this due to other projects but hope to get to it during the weekend.

Thanks!
Tunde

@housw I just released a new version that accepts SAM files as input. Thanks to @nigiord who also expanded support for different input file extensions.

Cheers,
Tunde

housw commented

Hi Tunde,

that's awesome, thanks a lot for your hard work over the weekend. I'm going to test it with my data set and will keep you updated.

Cheers,
Shengwei

Thank you indeed! That's going to save quite some time for analyses with a lot of samples.

However, SAM inputs are only valid in the 'single' module.

Are there any technical limitations that prevent the use of SAM inputs in the multiplex module? Is it also a problem for GRiD if the SAM inputs were generated from paired-end reads?

Cheers,
Nils

In fact, I've been trying GRiD 1.2 on a subset of my data over the last few weeks, and I have a couple of technical questions and suggestions. I'll probably ask them elsewhere, since this thread is focused on SAM inputs. Would you prefer a single issue containing all my points, or one issue per point?

@nigiord It's fine if the SAM inputs are generated from paired-end reads. I avoided SAM inputs for the multiplex module since they have to be generated from reads mapped to the GRiD database.
Sure, you can open a single thread regarding your other suggestions.

housw commented

Hi Tunde,

GRiD works great with my sam files, thanks a lot!

Cheers,
Shengwei

That is good to know.
Cheers

  • Tunde