nf-core/hlatyping

The example sample sheet for running with BAM file input seems incorrect

jowkar opened this issue · 5 comments

Description of the bug

I am having trouble writing a sample sheet that works for input with BAM files (v. 2.0.0). Looking at the example in the documentation, the following is supposed to be the format:

sample,fastq_1,fastq_2,bam,seq_type
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,,dna
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz,,rna
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.bam,,dna

Formatting my own data according to this format leads to a sample sheet validation error:

Error executing process > 'NFCORE_HLATYPING:HLATYPING:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)'

Caused by:
  Process `NFCORE_HLATYPING:HLATYPING:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)` terminated with an error exit status (1)

Command executed:

  check_samplesheet.py \
      samplesheet.csv \
      samplesheet.valid.csv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_HLATYPING:HLATYPING:INPUT_CHECK:SAMPLESHEET_CHECK":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
    File "/home/joakim/.nextflow/assets/nf-core/hlatyping/bin/check_samplesheet.py", line 36
      **kwargs,
              ^
  SyntaxError: invalid syntax

To test further, I tracked down the sample sheet validation script (check_samplesheet.py), pasted the example sample sheet data into a file (test.csv), and ran it as shown in the "Command used and terminal output" section below. Even this gave an error, so either the example sample sheet is incorrect or there is a bug in the validation script. I noticed that the last row of the sheet has one fewer comma than expected (four fields against a five-column header), so I tried the obvious fix of adding one before ",,dna". This did not work. I also tried some other rearrangements, but the only thing that passed validation was removing the BAM file row and the bam column entirely, and changing ",,dna" to ",dna" in the FASTQ rows. Unfortunately, I only have BAM files available for my own data, so being able to run with those would be ideal.
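
The comma-count observation above can be checked mechanically. Below is a minimal sketch that uses only Python's standard `csv` module; the helper name `find_mismatched_rows` is hypothetical and is not part of check_samplesheet.py. It reports every row whose field count differs from the header's.

```python
import csv
import io


def find_mismatched_rows(text):
    """Return (line_number, field_count) for rows that do not match the header width."""
    rows = list(csv.reader(io.StringIO(text)))
    expected = len(rows[0])  # number of columns declared in the header
    return [
        (line_number, len(row))
        for line_number, row in enumerate(rows[1:], start=2)
        if len(row) != expected
    ]


# The example sample sheet from the documentation, as quoted in this issue.
doc_example = """sample,fastq_1,fastq_2,bam,seq_type
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,,dna
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz,,rna
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.bam,,dna
"""

print(find_mismatched_rows(doc_example))  # [(4, 4)]
```

Line 4 (the BAM row) has four fields where the header declares five, which matches the missing comma described above.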

Command used and terminal output

python ~/.nextflow/assets/nf-core/hlatyping/bin/check_samplesheet.py test.csv test.valid.csv

Traceback (most recent call last):
  File "/home/joakim/.nextflow/assets/nf-core/hlatyping/bin/check_samplesheet.py", line 278, in <module>
    sys.exit(main())
  File "/home/joakim/.nextflow/assets/nf-core/hlatyping/bin/check_samplesheet.py", line 274, in main
    check_samplesheet(args.file_in, args.file_out)
  File "/home/joakim/.nextflow/assets/nf-core/hlatyping/bin/check_samplesheet.py", line 213, in check_samplesheet
    reader = csv.DictReader(in_handle, dialect=sniff_format(in_handle))
  File "/home/joakim/.nextflow/assets/nf-core/hlatyping/bin/check_samplesheet.py", line 172, in sniff_format
    if not sniffer.has_header(peek):
  File "/usr/lib64/python3.7/csv.py", line 394, in has_header
    rdr = reader(StringIO(sample), self.sniff(sample))
  File "/usr/lib64/python3.7/csv.py", line 188, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter

Hi @jowkar, thanks for your report. This seems to be an error in the documentation. You can see a working example in the full test sample sheet, which looks like this:

sample,fastq_1,fastq_2,bam,seq_type
SAMPLE_FASTQ,https://raw.githubusercontent.com/nf-core/test-datasets/hlatyping/fastq/NA11995_SRR766010_1_fished.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/hlatyping/fastq/NA11995_SRR766010_2_fished.fastq.gz,,dna
SAMPLE_FASTQ_RNA,https://raw.githubusercontent.com/nf-core/test-datasets/hlatyping/rna/CRC_81_N_1_fished.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/hlatyping/rna/CRC_81_N_2_fished.fastq.gz,,rna
SAMPLE_BAM,,,https://raw.githubusercontent.com/nf-core/test-datasets/hlatyping/bam/example_pe.bam,dna

Basically, the BAM file just has to be provided in the bam column, with fastq_1 and fastq_2 left empty. Please let us know if you tried it like that and it still failed.

Hi @jowkar, any update on this?

So far I have modified the example sheet according to your comment and run it through the validation script, and it passes. However, when I briefly tried running my actual data after modifying my own sample sheet in the same way, it did not work and gave a similar validation-related error message. I intended to look into it further to make sure I didn't just miss something in that sheet, but I have been very busy this past week and have not had time. I'll get back to it as soon as possible though.

OK, I have tried again now, and the following format seems to work:

sample,fastq_1,fastq_2,bam,seq_type
Sample_1,,,/path/to/file.bam,dna
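
For anyone with many BAM files, a sheet in this format can be generated rather than typed by hand. This is a minimal sketch, assuming the five-column header shown above; the helper name `bam_samplesheet` is hypothetical and the sample name and path are the placeholders from this thread, not real files.

```python
import csv
import io

# Column order taken from the header used in this thread.
COLUMNS = ["sample", "fastq_1", "fastq_2", "bam", "seq_type"]


def bam_samplesheet(samples, seq_type="dna"):
    """Build sample sheet text for BAM-only input, leaving both FASTQ columns empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for name, bam_path in samples:
        writer.writerow(
            {
                "sample": name,
                "fastq_1": "",  # empty: no FASTQ input for this sample
                "fastq_2": "",
                "bam": bam_path,
                "seq_type": seq_type,
            }
        )
    return buffer.getvalue()


sheet = bam_samplesheet([("Sample_1", "/path/to/file.bam")])
print(sheet, end="")
```

Writing through `csv.DictWriter` keeps every row at the header's width, so the empty FASTQ fields always come out as consecutive commas.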

Thanks for the feedback. I will fix this in the docs and close this issue for now. Feel free to reopen it if you encounter further issues.