nf-core/hlatyping

DictReader seems to not be handling utf-encoding properly


Description of the bug

This test samplesheet seems to fail check_samplesheet.py. I removed the data and kept only the header row (see the attached file below), but...

cat test.csv 
sample,fastq_1,fastq_2,seq_type

Command used and terminal output

with open("test.csv", "r") as in_handle:
    reader = csv.DictReader(in_handle, dialect=sniff_format(in_handle))
    # Validate the existence of the expected header columns.
    if not required_columns.issubset(reader.fieldnames):
        req_cols = ", ".join(required_columns)
        sys.exit(1)

reader.fieldnames
['\ufeffsample', 'fastq_1', 'fastq_2', 'seq_type']
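
The `\ufeff` prefix on the first field name is a UTF-8 byte order mark (BOM), which some spreadsheet and editor tools write at the start of a CSV export. Here is a minimal sketch, assuming the file was saved as UTF-8 with a BOM: opening it with Python's `utf-8-sig` codec strips the mark, whereas plain `utf-8` leaves it attached to the first header. The file name `bom_test.csv` and the `read_header` helper are hypothetical and not part of the pipeline.

```python
import csv
import os
import tempfile

# Hypothetical reproduction: write a header-only CSV that starts with a UTF-8 BOM.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "bom_test.csv")
with open(path, "wb") as handle:
    handle.write(b"\xef\xbb\xbfsample,fastq_1,fastq_2,seq_type\n")


def read_header(csv_path, encoding):
    """Return the field names DictReader sees for the given encoding."""
    with open(csv_path, "r", encoding=encoding, newline="") as in_handle:
        return csv.DictReader(in_handle).fieldnames


# With "utf-8" the BOM stays attached to the first column name.
with_bom = read_header(path, "utf-8")
# With "utf-8-sig" the codec strips a leading BOM if one is present.
without_bom = read_header(path, "utf-8-sig")

print(with_bom)
print(without_bom)
```

Because `utf-8-sig` also reads files that have no BOM, it may be a reasonable default for reading user-supplied samplesheets, though whether the pipeline adopts it is not stated in this thread.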

Relevant files

test.csv

System information

No response

A quick fix is to read in the file and re-write it out with pandas, but I thought I would report this anyway.
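
As an alternative to the pandas round trip, the BOM can probably also be removed with the standard library alone. This is only a sketch, assuming the samplesheet is UTF-8 encoded; `strip_bom` and the file names are hypothetical and chosen for illustration.

```python
import os
import tempfile


def strip_bom(src_path, dst_path):
    """Copy a UTF-8 text file, dropping a leading BOM if one is present."""
    # "utf-8-sig" decodes input with or without a BOM; "utf-8" never writes one.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Hypothetical usage on a temporary file that starts with a BOM.
work_dir = tempfile.mkdtemp()
src_file = os.path.join(work_dir, "test_with_bom.csv")
dst_file = os.path.join(work_dir, "test_clean.csv")
with open(src_file, "wb") as handle:
    handle.write(b"\xef\xbb\xbfsample,fastq_1,fastq_2,seq_type\n")

strip_bom(src_file, dst_file)

with open(dst_file, "rb") as handle:
    cleaned_bytes = handle.read()
print(cleaned_bytes)
```

Passing `newline=""` on both handles keeps the original line endings untouched, so only the BOM bytes differ between the two files.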

Thanks for reporting this and sorry for the late reply. We will fix this before the next release.

Thank you, Thomas, for suggesting the quick fix. However, this approach is not working for me. Did you re-write the CSV with the standard pd.to_csv and default parameters? @thomasyu888

Hi @martinabetti-97, I did use the standard pd.to_csv with the default parameters. I don't remember which version of pandas I was using.