DictReader seems to not be handling utf-encoding properly
thomasyu888 commented
Description of the bug
This test samplesheet seems to fail check_samplesheet.py. I removed the data and just provided the headers - see attached files below, but...
cat test.csv
sample,fastq_1,fastq_2,seq_type
Command used and terminal output
with open("test.csv", "r") as in_handle:
    reader = csv.DictReader(in_handle, dialect=sniff_format(in_handle))
    # Validate the existence of the expected header columns.
    if not required_columns.issubset(reader.fieldnames):
        req_cols = ", ".join(required_columns)
        sys.exit(1)
reader.fieldnames
['\ufeffsample', 'fastq_1', 'fastq_2', 'seq_type']
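The `\ufeff` prefix on the first field name is a UTF-8 byte order mark (BOM), which some spreadsheet tools write at the start of a CSV. Below is a minimal sketch, assuming the file is UTF-8 encoded, of how Python's `utf-8-sig` codec drops the BOM on read. The `required_columns` set, the `read_fieldnames` helper, and the temporary file are illustrative assumptions, not code from check_samplesheet.py, and this is not necessarily the fix the pipeline adopted.

```python
import csv
import os
import tempfile

# Hypothetical required columns, mirroring the header shown above.
required_columns = {"sample", "fastq_1", "fastq_2", "seq_type"}

# Write a header-only sample sheet that starts with a UTF-8 BOM.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfsample,fastq_1,fastq_2,seq_type\n")
    path = tmp.name


def read_fieldnames(path, encoding):
    """Return the header row as parsed by csv.DictReader (illustrative helper)."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        return csv.DictReader(handle).fieldnames


# Plain "utf-8" keeps the BOM as part of the first field name.
with_bom = read_fieldnames(path, "utf-8")
# "utf-8-sig" consumes a leading BOM if one is present.
without_bom = read_fieldnames(path, "utf-8-sig")
os.remove(path)

print(with_bom)
print(without_bom)
print(required_columns.issubset(without_bom))
```

`utf-8-sig` also reads files that have no BOM, so using it for input should not affect sample sheets that already parse correctly.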
Relevant files
System information
No response
A quick fix is to read the file in and write it back out with pandas, but I thought I would report this.
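The rewrite workaround does not strictly need pandas. Below is a minimal sketch, assuming the sample sheet is UTF-8 encoded, that copies the file and drops a leading BOM if one is present. `strip_bom` is a hypothetical helper name, not part of the pipeline.

```python
import codecs


def strip_bom(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present."""
    with open(src_path, "rb") as src:
        data = src.read()
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

For example, `strip_bom("test.csv", "test_clean.csv")` would write a copy whose first header field is plain `sample`; the output file name here is only an example.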
christopher-mohr commented
Thanks for reporting this and sorry for the late reply. We will fix this before the next release.
martinabetti-97 commented
Thank you Thomas for suggesting the quick fix. However, this approach is not working for me. Did you rewrite the CSV with the standard pd.to_csv and the default parameters? @thomasyu888
thomasyu888 commented
Hi @martinabetti-97, I did use the standard pd.to_csv with the default parameters. I forget which version of pandas I was using.