cancerit/PCAP-core

Support for Illumina's FASTQ file format

ckandoth opened this issue · 3 comments

Hello! Thanks so much for this tool. Before we open a pull request, I wanted to ask whether you are open to merging code that adds support for Illumina's FASTQ file format described here. If you would prefer to implement this yourselves, that's fine too. But we will be happy to do a PR if you confirm that you can merge it. We'd prefer not to fork.

Hi,

Can you clarify: do you specifically mean the file naming convention, or the CASAVA-style record headers?

@SIM:1:FCX:1:15:6329:1045:GATTACT+GTCTTAAC 1:N:0:ATCCGA

We're certainly willing to accept PRs. If you could add a very small example file (10 records) to t/data, along with tests confirming that it behaves as expected, that would help speed acceptance. IIRC bwa can interpret the header elements and add them to the resulting SAM reads (but it may need a flag).

Thanks! Yes, the file naming convention. Currently, there is a requirement here that the paired FASTQs end with _1 and _2. However, all our FASTQs follow the Illumina naming scheme that looks like this:

SampleName_S1_L001_R1_001.fastq.gz
SampleName_S1_L001_R2_001.fastq.gz

# For example:
T2-N_IGO_07008_BU_3_S19_R1_001.fastq.gz
T2-N_IGO_07008_BU_3_S19_R2_001.fastq.gz
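
To make the difference between the two conventions concrete, here is a rough Python sketch of how both the `_1`/`_2` suffix and the Illumina `_R1_001`/`_R2_001` suffix could be recognized and paired. This is not PCAP-core's code (which is Perl); the regular expressions and the function names `split_read_name` and `pair_fastqs` are assumptions made up for illustration, and the accepted extensions (`.fq`, `.fastq`, optional `.gz`) are likewise a guess.

```python
import re

# Hypothetical pattern: Illumina-style names end in _R1_NNN / _R2_NNN.
# Any _S<n> or _L<lane> fields are simply kept as part of the prefix.
ILLUMINA_RE = re.compile(
    r"^(?P<prefix>.+)_R(?P<read>[12])_(?P<chunk>\d{3})\.f(?:ast)?q(?:\.gz)?$"
)
# The convention mentioned in this thread: names end in _1 / _2.
SIMPLE_RE = re.compile(r"^(?P<prefix>.+)_(?P<read>[12])\.f(?:ast)?q(?:\.gz)?$")


def split_read_name(filename):
    """Return (pair_key, read_number), or None if neither scheme matches."""
    m = ILLUMINA_RE.match(filename)
    if m:
        return (m.group("prefix"), m.group("chunk")), int(m.group("read"))
    m = SIMPLE_RE.match(filename)
    if m:
        return (m.group("prefix"), None), int(m.group("read"))
    return None


def pair_fastqs(filenames):
    """Group file names into (read1, read2) tuples; fail on a missing mate."""
    groups = {}
    for name in filenames:
        parsed = split_read_name(name)
        if parsed is None:
            raise ValueError(f"unrecognized FASTQ name: {name}")
        key, read = parsed
        groups.setdefault(key, {})[read] = name
    pairs = []
    for key in sorted(groups, key=str):
        mates = groups[key]
        if set(mates) != {1, 2}:
            raise ValueError(f"incomplete pair for {key}")
        pairs.append((mates[1], mates[2]))
    return pairs
```

Trying the Illumina pattern first means a name such as `SampleName_S1_L001_R1_001.fastq.gz` is keyed on `SampleName_S1_L001` plus the chunk number `001`, so files from different lanes or chunks are not paired with each other by accident.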

A PR is now at #56. I'll close this issue.