[Input samplesheet validation] Sample names with point can result in accidental sample merge

Question


TimHHH opened this issue 3 years ago · 5 comments

Two samples whose names differ only after a dot can be accidentally merged, e.g.:
Votintseva2017.614406.m
Votintseva2017.614406.1
In this case, the two distinct samples are treated as one, namely Votintseva2017.614406.
Hence, we should note in the manual that dots are not allowed in the samplesheet.
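A minimal Python sketch of how such a collision can arise. It assumes, purely as a hypothesis, that the sample identifier is derived by dropping everything after the last dot, the way a file extension is stripped (Nextflow's `baseName`, for example, behaves this way on file names). The thread does not say which code path causes the merge, and the helper names are hypothetical.

```python
from collections import defaultdict


def strip_last_suffix(name: str) -> str:
    # Hypothetical normalisation: drop everything after the LAST dot,
    # the way a file extension would be stripped.
    return name.rsplit(".", 1)[0]


def find_collisions(names):
    """Group distinct sample names by their truncated form and return only
    the truncated keys that more than one distinct name maps to."""
    groups = defaultdict(set)
    for name in names:
        groups[strip_last_suffix(name)].add(name)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


samples = ["Votintseva2017.614406.m", "Votintseva2017.614406.1"]
print(find_collisions(samples))
# {'Votintseva2017.614406': ['Votintseva2017.614406.1', 'Votintseva2017.614406.m']}
```

Names without any dot pass through unchanged, so they cannot collide this way.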

Answer 1 · 2022-08-01T10:40:21.000Z

What about addressing this more generally by creating samplesheet validation logic?

A Python script would be triggered to validate the samplesheet and handle all the nuances. What do you think?
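A rough skeleton of what such a script could look like, assuming a CSV samplesheet with a header row. Each check is a plain function that returns error messages, and the script exits with a non-zero status if any check fails. The function names and the `CHECKS` registry are hypothetical, not taken from the pipeline.

```python
import csv
import sys


def check_no_empty_fields(row_num, row):
    # Every named column must have a non-blank value. csv.DictReader fills
    # missing trailing values with None and collects surplus values under
    # the key None, which this example check simply skips.
    return [
        f"row {row_num}: column '{col}' is empty"
        for col, value in row.items()
        if col is not None and not (value or "").strip()
    ]


# Hypothetical registry of checks; each takes (row_number, row_dict) and
# returns a list of error messages (an empty list means the row passed).
CHECKS = [check_no_empty_fields]


def validate_rows(rows, checks=CHECKS):
    errors = []
    for row_num, row in enumerate(rows, start=2):  # line 1 is the header
        for check in checks:
            errors.extend(check(row_num, row))
    return errors


def main(path):
    with open(path, newline="") as handle:
        errors = validate_rows(csv.DictReader(handle))
    for message in errors:
        print(message, file=sys.stderr)
    # A non-zero exit status lets the calling pipeline stop early.
    return 1 if errors else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

New checks can be added by appending functions to `CHECKS`, so every problem in the samplesheet is reported in one run rather than failing on the first.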

CC @LennertVerboven

Answer 2 · 2022-08-30T07:37:03.000Z

Notes from meeting on 30-Aug-2022

- Sample names (i.e. the sample column) should not contain dots or any other non-dash symbols. Add a list of disallowed symbols.
- All fields must have a value (no empty columns).
  - Remove the assumptions from https://github.com/TORCH-Consortium/xbs-nf/blob/436c515a1aa1cd6773f449a25d18ff3f6a962aa8/main.nf#L22
  - Exit if samplesheet validation fails.
- Check for dots in reference genome names (SNPEFF / default_configs).
- TODO: Evaluate quoted strings in the samplesheet.
- Read-1 should be different from Read-2.
- (OPTIONAL) Both of these files should exist.
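A sketch of several of the checks listed above, written as a single row-level function. The column names (sample, read1, read2) and the allowed-character set (letters, digits and dashes only) are assumptions; the notes leave the final list of disallowed symbols open.

```python
import os
import re

# Assumption: only letters, digits and dashes are allowed in sample names.
SAMPLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def check_row(row, check_files_exist=False):
    """Return a list of problems for one samplesheet row (a dict).

    The column names 'sample', 'read1' and 'read2' are hypothetical."""
    problems = []
    sample = row.get("sample") or ""
    if not SAMPLE_NAME_PATTERN.fullmatch(sample):
        problems.append(
            f"sample name {sample!r} is empty or contains disallowed characters"
        )
    read1 = row.get("read1") or ""
    read2 = row.get("read2") or ""
    if read1 and read1 == read2:
        problems.append(f"read1 and read2 point to the same file: {read1!r}")
    if check_files_exist:  # the optional check from the meeting notes
        for label, path in (("read1", read1), ("read2", read2)):
            if not os.path.isfile(path):
                problems.append(f"{label} file not found: {path!r}")
    return problems
```

Python's csv module already removes standard CSV quoting when reading, which may cover part of the quoted-strings TODO.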

@TimHHH @LennertVerboven please feel free to add other validations.

Answer 3 · 2022-09-01T08:43:25.000Z

The check for dots in reference genome names (SNPEFF / default_configs) can be dropped. Our pipeline is not designed to use other reference genomes, because downstream processes require H37Rv. However, modifying XBS-nf to run with a different reference genome is certainly doable for those with a programming background.

Another requirement: no two rows should have exactly the same Study, Sample, Library and Attempt values (at least the attempt number should differ).
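A minimal sketch of this uniqueness requirement, assuming rows are dicts (e.g. from csv.DictReader) and assuming the hypothetical column names study, sample, library and attempt.

```python
from collections import Counter

# Hypothetical column names forming the uniqueness key.
KEY_COLUMNS = ("study", "sample", "library", "attempt")


def find_duplicate_keys(rows, key_columns=KEY_COLUMNS):
    """Return each (study, sample, library, attempt) combination that occurs
    in more than one row, mapped to the number of rows it occurs in."""
    counts = Counter(
        tuple((row.get(col) or "").strip() for col in key_columns) for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}
```

Values are compared as stripped strings, so attempt "1" and "01" would count as different here.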

Answer 4 · 2022-09-13T07:47:04.000Z

An initial implementation by @LennertVerboven has been added here: https://github.com/TORCH-Consortium/xbs-nf/blob/master/bin/sample_sheet_validation.py

Answer 5 · 2022-12-05T14:08:51.000Z

TODO: @abhi18av Need to add another check for any duplicates in the samplesheet.
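One possible reading of "any duplicates" that is not already covered by the Study/Sample/Library/Attempt requirement is the same FASTQ file being referenced by more than one row. A sketch of that check, assuming hypothetical read1/read2 column names:

```python
from collections import defaultdict


def find_reused_fastqs(rows, read_columns=("read1", "read2")):
    """Map each FASTQ path that appears more than once to the (1-based)
    data-row numbers that reference it. A row whose read1 equals its read2
    is also reported, with its row number listed twice."""
    seen = defaultdict(list)
    for row_num, row in enumerate(rows, start=1):
        for col in read_columns:
            path = (row.get(col) or "").strip()
            if path:
                seen[path].append(row_num)
    return {path: nums for path, nums in seen.items() if len(nums) > 1}
```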