DictReader seems to not be handling utf-encoding properly

Question

Opened this issue 2 years ago · 3 comments

Description of the bug

This test samplesheet fails check_samplesheet.py. I removed the data and kept only the header row (see the attached file below):

cat test.csv 
sample,fastq_1,fastq_2,seq_type

Command used and terminal output

import csv
import sys

# sniff_format and required_columns are defined in check_samplesheet.py.
with open("test.csv", "r") as in_handle:
    reader = csv.DictReader(in_handle, dialect=sniff_format(in_handle))
    # Validate the existence of the expected header columns.
    if not required_columns.issubset(reader.fieldnames):
        req_cols = ", ".join(required_columns)
        sys.exit(1)

reader.fieldnames
['\ufeffsample', 'fastq_1', 'fastq_2', 'seq_type']
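
For context, the `\ufeff` prefix on the first column is the Unicode byte order mark (BOM), which some tools write at the start of UTF-8 files. Below is a minimal sketch of the behaviour, assuming the samplesheet was saved as UTF-8 with a BOM. The temporary file and the `read_fieldnames` helper are illustrative only and are not part of check_samplesheet.py. Python's built-in `utf-8-sig` codec drops a leading BOM when one is present and otherwise behaves like `utf-8`.

```python
import csv
import os
import tempfile

# Write a header-only CSV that starts with a UTF-8 BOM, mimicking test.csv.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "wb") as handle:
    handle.write(b"\xef\xbb\xbfsample,fastq_1,fastq_2,seq_type\n")


def read_fieldnames(path, encoding):
    """Return the header columns DictReader sees for the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as in_handle:
        return csv.DictReader(in_handle).fieldnames


# Plain utf-8 keeps the BOM as part of the first column name.
with_bom = read_fieldnames(path, "utf-8")
# utf-8-sig strips the BOM, so the first column is just "sample".
without_bom = read_fieldnames(path, "utf-8-sig")

print(with_bom)
print(without_bom)
```

With plain `utf-8` the first header is `'\ufeffsample'`, so a subset check against a required column named `sample` fails. With `utf-8-sig` the header matches.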

Relevant files

test.csv

System information

No response

A quick fix is to read the file in and write it back out with pandas, but I thought I would report this.
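
A similar rewrite can probably be done with the standard library alone. This is only a sketch, assuming the file is UTF-8 and the only problem is a leading BOM. `strip_bom` is a hypothetical helper name, and the temporary files stand in for the real samplesheet.

```python
import os
import tempfile


def strip_bom(src, dst):
    """Copy src to dst, dropping a leading UTF-8 BOM if one is present."""
    # utf-8-sig removes the BOM on read; newline="" keeps line endings as-is.
    with open(src, "r", encoding="utf-8-sig", newline="") as in_handle:
        text = in_handle.read()
    # Plain utf-8 never writes a BOM.
    with open(dst, "w", encoding="utf-8", newline="") as out_handle:
        out_handle.write(text)


# Demonstration on a throwaway file that starts with a BOM.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.csv")
dst = os.path.join(workdir, "test_clean.csv")
with open(src, "wb") as handle:
    handle.write(b"\xef\xbb\xbfsample,fastq_1,fastq_2,seq_type\n")

strip_bom(src, dst)

with open(dst, "rb") as handle:
    cleaned = handle.read()
print(cleaned)
```

The cleaned copy starts directly with `sample`, so the header check should see the expected column names.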

Answer 1 · 2023-06-21T12:36:50.000Z

Thanks for reporting this and sorry for the late reply. We will fix this before the next release.

Answer 2 · 2023-06-26T12:13:36.000Z

Thank you, Thomas, for suggesting the quick fix. However, this approach is not working for me. Did you rewrite the CSV with the standard pd.to_csv using the default parameters? @thomasyu888

Answer 3 · 2023-07-10T21:06:48.000Z

Hi @martinabetti-97, I did use the standard pd.to_csv with the default parameters. I don't remember which version of pandas I was using.