nf-core/bacass

samplesheet should have a suffix .tsv if it is a tab-separated file as described

antunderwood opened this issue · 8 comments

Description of the bug

The documentation describes the input sample sheet as a tab-separated file, but it is labelled csv in the example. The pipeline fails if the suffix is .tsv.

I suggest changing the example and the pattern match to ^\S+\.tsv$

Agreed, I'll put it on the list for the next release!

I was made aware that https://nf-co.re/bacass/2.0.0/usage#samplesheet does not specify that the file must be tab-separated (although the example is fine); that needs changing as well.

Hi,

We are trying to run the bacass pipeline 2.0.0, but it doesn't work ("argument of file function cannot be null" - I have no clue what it means). I'm pretty sure there is something wrong with the input samplesheet and that's how I found this conversation. However, I'm still confused which file type we should use, csv or tsv?
When we use .tsv, we get this error:
--input: string [ControlDSshorttab.tsv] does not match pattern ^\S+.csv$ (ControlDSshorttab.tsv)

Also, is it okay to list multiple fastq.gz files belonging to the same sample, or should we cat them before running the pipeline?

Many thanks for your answer and help in advance!

Hi,

here hopefully helpful answers:

I'm pretty sure there is something wrong with the input samplesheet and that's how I found this conversation. However, I'm still confused which file type we should use, csv or tsv?

A tab-separated file with .csv as the suffix, e.g. samplesheet.csv (follow the link to see the example).

When we use .tsv, we get this error:
--input: string [ControlDSshorttab.tsv] does not match pattern ^\S+.csv$ (ControlDSshorttab.tsv)

Well, "does not match pattern ^\S+.csv$" means the pipeline expects a file name (technically a string of non-whitespace characters, ^\S+) that ends ($) with .csv. I understand that this is expressed in technical terms and may not be generally understandable.
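To make the pattern a bit more concrete, here is a minimal sketch that tests a few file names against it. It assumes the pattern exactly as printed in the error message and uses Python's re module as a stand-in for the pipeline's own parameter validation, so edge cases may differ; the helper name is_accepted is made up for illustration.

```python
import re

# Pattern copied literally from the error message above.
# ^    start of the string
# \S+  one or more non-whitespace characters
# .    as printed, an unescaped dot, which matches any single character
# csv  the literal suffix
# $    end of the string
PATTERN = re.compile(r"^\S+.csv$")


def is_accepted(name: str) -> bool:
    """Return True if the file name matches the pattern (hypothetical helper)."""
    return PATTERN.search(name) is not None


print(is_accepted("samplesheet.csv"))        # True
print(is_accepted("ControlDSshorttab.tsv"))  # False: wrong suffix
print(is_accepted("my samplesheet.csv"))     # False: contains a space
print(is_accepted("samplesheetXcsv"))        # True: the unescaped dot matches "X"
```

The last case only applies if the dot really is unescaped in the pipeline's pattern; the pattern suggested at the top of this issue escapes it (\.), which would accept a literal dot only.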

Also, is it okay to list multiple fastq.gz files belonging to the same sample, or should we cat them before running the pipeline?

cat them beforehand; the pipeline does not support multiple files per entry.
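For anyone who prefers to do the merge in a script, this is a rough sketch of the same byte-level concatenation that cat performs. It relies on the gzip format allowing several compressed members back to back in one file. The file names, the toy reads, and the helper concat_gzip are invented for the demonstration and are not part of the pipeline.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concat_gzip(parts, merged):
    """Append the raw bytes of each .gz part to `merged`, like `cat a.gz b.gz > merged.gz`."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


# Self-contained demonstration with two tiny, made-up FASTQ files.
workdir = Path(tempfile.mkdtemp())
records = [b"@read1\nACGT\n+\nIIII\n", b"@read2\nTTGA\n+\nIIII\n"]
parts = []
for i, record in enumerate(records):
    part = workdir / f"part{i}.fastq.gz"
    with gzip.open(part, "wb") as fh:
        fh.write(record)
    parts.append(part)

merged = workdir / "sample.fastq.gz"
concat_gzip(parts, merged)

# Reading the merged file back yields the reads of both parts in order.
with gzip.open(merged, "rb") as fh:
    merged_bytes = fh.read()
print(merged_bytes.decode())
```

For paired-end data, the R1 and R2 parts need to be concatenated in the same order so the read pairs stay in sync.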

Thanks for your quick help!

Now we have a different error message: "Read 1 FastQ file does not exist!". Of course, we don't have one because we have Nanopore sequencing data, so we filled the R1 and R2 columns with NAs. We only specified the LongFastQ column values. NAs were used for the GenomeSize and Fast5 columns, too.

We even tried specifying the --singleEnd option, but it still resulted in the same error message.

This is the command we try to run:
nextflow run nf-core/bacass --input samplesheet.csv -profile uppmax --project snicXXXX -bg --assembly_type 'long' --assembler 'miniasm' --skip_kraken2

Could you check whether your entry is exactly NA, with no spaces or anything else added? You could also upload the csv file here if you'd like.
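A check like this can also be scripted. The sketch below reads a tab-separated sample sheet and reports every cell that has leading or trailing whitespace. The ID column name, the column order, and the function find_padded_cells are assumptions for illustration; only R1, R2, LongFastQ, Fast5 and GenomeSize are named in this thread.

```python
import csv
import io


def find_padded_cells(handle):
    """Yield (line_number, column_name, raw_value) for cells with surrounding whitespace."""
    reader = csv.DictReader(handle, delimiter="\t")
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        for column, value in row.items():
            # Skip missing cells (None) and overflow fields (stored as a list).
            if isinstance(value, str) and value != value.strip():
                yield line_number, column, value


# Made-up sample sheet; the "ID" column and the column order are assumed.
# Note the trailing space after the second NA.
sheet = (
    "ID\tR1\tR2\tLongFastQ\tFast5\tGenomeSize\n"
    "sample1\tNA\tNA \treads.fastq.gz\tNA\tNA\n"
)

problems = list(find_padded_cells(io.StringIO(sheet)))
print(problems)  # [(2, 'R2', 'NA ')]
```

For a real file, the same function could be called with open("samplesheet.csv", newline="") in place of the in-memory string.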

Oh yes, the "hidden" extra spaces were the issue... sorry and thanks!

Closing this; please reopen it if needed.