Error parsing samplesheet
golharam commented
I have a sample sheet that looks like this:
[Header]
IEMFileVersion,4
Date,11/16/2015
Workflow,GenerateFASTQ
Application,RNA-Seq
Assay,TruSeq LT
Description
Chemistry,Default
[Reads]
75
75
[Settings]
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,GenomeFolder,Sample_Project,Description
MiSeq_L20151106_A1,MiSeq_L20151106_A1,,,AR001,ATCACG,Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFasta,QC,GeneexperessionQC
MiSeq_L20151106_B1,MiSeq_L20151106_B1,,,AR003,TTAGGC,Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFasta,QC,GeneexperessionQC
Reading this file with
ss = IlluminaSampleSheet('SampleSheet.csv')
results in:
$ python test.py
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    ss = IlluminaSampleSheet(sample_sheet_path)
  File "/Users/golharr/workspace/.venv/lib/python3.6/site-packages/sample_sheet/__init__.py", line 419, in __init__
    self._parse(self.path)
  File "/Users/golharr/workspace/.venv/lib/python3.6/site-packages/sample_sheet/__init__.py", line 537, in _parse
    key, value, *_ = line
ValueError: not enough values to unpack (expected at least 2, got 1)
It looks like the trailing *_ on line 537 is causing the problem. If you remove it, key and value are read correctly. The *_ is not used, so there is no need to include it here.
After making the change to line 537, the same problem arises for the Description line: since there is no comma, only a key is provided, with no corresponding value. I think a better check would be to execute these lines only if len(line) >= 2. I'll submit a PR that works for me.
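For illustration, here is a minimal sketch of the kind of length check described above. It is not the library's actual code: parse_key_value_lines is a hypothetical helper, and storing an empty string for a key-only line such as Description is an assumption about the desired behavior.

```python
import csv
import io


def parse_key_value_lines(text):
    """Hypothetical helper: parse 'key,value' lines, tolerating key-only lines."""
    parsed = {}
    for line in csv.reader(io.StringIO(text)):
        # csv.reader yields an empty list for a blank line; skip those.
        if not line:
            continue
        if len(line) >= 2:
            # Normal case: take the first two fields, ignore any extras.
            key, value, *_ = line
        else:
            # Key-only line (e.g. "Description"); assumed to mean an empty value.
            key, value = line[0], ''
        parsed[key] = value
    return parsed


# A few lines taken from the [Header] section of the sample sheet above.
header = "IEMFileVersion,4\nDescription\nChemistry,Default\n"
parsed = parse_key_value_lines(header)
```

With this guard the Description line no longer raises ValueError. Whether a key-only line should be stored with an empty value or skipped entirely is a design choice for the maintainers.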