clintval/sample-sheet

Same sample ID on multiple lanes causes error


Hi, I have a sample sheet that has the same sample ID but on multiple lanes, which in our case can happen quite frequently. This case is currently not supported. Could this be added?

Example [Data] section:

[Data],,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description
1,WES013BL,,,,A010,TAGCTT,,
1,WES013FR,,,,A027,ATTCCT,,
1,MDx150891,,,,A012,CTTGTA,,
1,MDx150892,,,,A016,CCGTCC,,
2,WES013BL,,,,A010,TAGCTT,,
2,WES013FR,,,,A027,ATTCCT,,
2,MDx150891,,,,A012,CTTGTA,,
2,MDx150892,,,,A016,CCGTCC,,

Yes definitely. Thanks for submitting this issue. I will add this in for the next release.

Thanks! Great work!
Your lib saved me quite some coding!

Awesome to hear! It's saved me quite a bit of effort too.

Please keep the suggestions coming, I only use sample sheets for a very specific task and will need the community's help in making this useful across applications.

Sure, I will report if we have any other issues/suggestions.

@clintval, beware that according to Illumina's reference, the Sample_ID must be a unique identifier.

At a minimum, the one column that is universally required is Sample_ID,
which provides a unique string identifier for each sample.

So I assume you should only allow duplicates if the Lane column is provided, and if its values are different for a given Sample_ID.
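To make the suggested rule concrete, here is a minimal sketch that uses only the Python standard library. It is not the library's implementation; the function name `find_duplicate_samples` is hypothetical, and it assumes the [Data] section has already been isolated as CSV text starting at the header row. A row is treated as a duplicate only when both its Sample_ID and its Lane match an earlier row, so a sheet without a Lane column rejects any repeated Sample_ID.

```python
import csv
import io
from collections import Counter


def find_duplicate_samples(data_csv):
    """Return the (Sample_ID, Lane) pairs that occur more than once.

    Hypothetical helper for illustration only. `data_csv` is the text of
    the [Data] section, beginning with its header row.
    """
    reader = csv.DictReader(io.StringIO(data_csv))
    counts = Counter()
    for row in reader:
        # A missing or empty Lane is normalized to "", so without a Lane
        # column every repeated Sample_ID counts as a duplicate.
        lane = row.get("Lane") or ""
        counts[(row["Sample_ID"], lane)] += 1
    return [key for key, count in counts.items() if count > 1]


# Same Sample_ID on different lanes: accepted under the proposed rule.
per_lane = "Lane,Sample_ID,index\n1,WES013BL,TAGCTT\n2,WES013BL,TAGCTT\n"
print(find_duplicate_samples(per_lane))
```

An empty result means the sheet satisfies the rule; any returned pair identifies a Sample_ID repeated within the same lane.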

@reisingerf, in your case I am not sure the Lane column is necessary, as the documentation for a recent version of bcl2fastq mentions:

When the Lane column of the sample sheet Data section is populated, only those lanes are converted. When the Lane column is not used, all lanes are converted.

Except if you use it to extract data only for specific lanes from a larger flowcell, I guess.

Thanks @PertuyF. I recognize that some Illumina sequencers allow per-lane loading, which supports the notion that you could technically have an identical Sample_ID on the same flowcell, albeit on different lanes, with the same sample indexes.

I am willing to be permissive rather than restrictive about the specification, since sample sheets are used by more platforms than just Illumina (e.g., 10x).

I am open to discussion on how permissive or restrictive this library should be when importing sample sheets.

Let me dwell on this a bit and I will respond back.

@reisingerf, feel free to comment on your specific application and need for this feature. I am interested in the applications you are pursuing.

We are sequencing cancer samples using Illumina's NovaSeq. We have a few reasons:
When you specify a lane in the sample sheet for the same sample ID, it generates SAMPLE_S([0-9]+)_L00[1-8]_R[1-2]_001.fastq.gz. This helps to trace the FASTQ back to its source lane (this info is also in the read header, but it's much easier to just look at the FASTQ file name).
In some cases it may be inevitable to specify the lane number due to logistical restrictions when we need to load by lane, either because we don’t have enough indexes or because we need to add up to the desired coverage.
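As an illustration of the first point, the lane can be recovered from a file name with a regular expression built from the pattern above. This is only a sketch: the helper name `lane_from_fastq_name` and the example file name are hypothetical, and the pattern assumes exactly the naming scheme quoted here.

```python
import re

# Regex built from the naming scheme quoted above:
# SAMPLE_S<number>_L00<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>[0-9]+)"
    r"_L00(?P<lane>[1-8])_R(?P<read>[12])_001\.fastq\.gz$"
)


def lane_from_fastq_name(filename):
    """Return the lane number encoded in a FASTQ file name, or None.

    Hypothetical helper for illustration only.
    """
    match = FASTQ_NAME.match(filename)
    return int(match.group("lane")) if match else None


# Hypothetical file name following the scheme above.
print(lane_from_fastq_name("WES013BL_S1_L002_R1_001.fastq.gz"))
```

File names without a lane component do not match, and the helper returns None for them.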

Hi @reisingerf, I implemented the feature and made a new release as v0.4.0. Let me know how it works for you. I did demo your sample sheet snippet with success.

Feel free to update/install from PyPI:

$ pip install sample_sheet

Great thanks!
Works fine now for my use cases!
Great job!