question for constructing samplesheet
pseudacriscrucifer opened this issue · 2 comments
Hi,
How would you recommend I assemble a samplesheet for nf-core/atacseq from input fastq.gz files like the following?
12_B9_KOL4052A19_S2_L001_I1_001.fastq.gz
12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz
12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz
12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz
12_B9_KOL4052A19_S2_L002_I1_001.fastq.gz
12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz
12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz
12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz
13_B10_KOL4052A20_S4_L001_I1_001.fastq.gz
13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz
13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz
13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz
13_B10_KOL4052A20_S4_L002_I1_001.fastq.gz
13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz
13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz
13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz
These are paired-end sequencing files from two samples (12_B9 and 13_B10). Any help would be appreciated - I have tried numerous layouts for samplesheet.csv!
Depending on whether Rx refers to biological or technical replicates, your samplesheet would look something like this (biological replicates first):
sample,fastq_1,fastq_2,replicate
12_B9,12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz,1
12_B9,12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz,2
12_B9,12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz,3
13_B10,13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz,2
13_B10,13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz,3
or
sample,fastq_1,fastq_2,replicate
12_B9,12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz,1
12_B9,12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz,1
12_B9,12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz,1
In the latter case (technical replicates) the files will be merged after alignment but before any analysis.
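If you have many files, the two layouts above could also be generated with a small script. Here is a minimal Python sketch, not an official nf-core tool. It assumes the file names follow the pattern shown in the question, that the sample name is the first two underscore-separated fields, and that every sample/read combination has exactly lanes L001 and L002. `build_rows` and `to_csv` are hypothetical helper names. It follows the layout suggested above (lanes in fastq_1/fastq_2, Rx as the replicate) and skips the I1 files.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed name pattern: <sample>_<library>_S<n>_L<lane>_<read>_001.fastq.gz
# Assumption: the sample name is the first two underscore-separated fields.
PATTERN = re.compile(
    r"^(?P<sample>[^_]+_[^_]+)_.+_L(?P<lane>\d{3})_(?P<read>[IR]\d)_001\.fastq\.gz$"
)


def build_rows(filenames, technical=False):
    """Return one samplesheet row per (sample, Rx) pair.

    technical=False numbers replicates from Rx (1, 2, 3);
    technical=True gives every row replicate 1.
    """
    lanes = defaultdict(dict)  # (sample, read) -> {lane: filename}
    for name in filenames:
        m = PATTERN.match(name)
        if m is None or m["read"].startswith("I"):
            continue  # skip unrecognised names and the I1 files
        lanes[(m["sample"], m["read"])][m["lane"]] = name

    rows = []
    for (sample, read), by_lane in sorted(lanes.items()):
        replicate = 1 if technical else int(read[1:])
        # Raises KeyError if a lane is missing (two lanes are assumed).
        rows.append([sample, by_lane["001"], by_lane["002"], replicate])
    return rows


def to_csv(rows):
    """Render the rows as CSV text with the four-column header used above."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "replicate"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `to_csv(build_rows(names))` on the list of file names gives the first layout, and `to_csv(build_rows(names, technical=True))` gives the second.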
Both examples still omit the *_I1_001.fastq.gz files. If these are additional replicates, add them accordingly. However, if they are input controls, run the pipeline with the --with_control flag, and your samplesheet would instead look like:
sample,fastq_1,fastq_2,replicate,control,control_replicate
12_B9,12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz,1,12_B9_INPUT_CTRL,1
12_B9,12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz,2,12_B9_INPUT_CTRL,1
12_B9,12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz,3,12_B9_INPUT_CTRL,1
12_B9_INPUT_CTRL,12_B9_KOL4052A19_S2_L001_I1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_I1_001.fastq.gz,1,,
13_B10,13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz,1,13_B10_INPUT_CTRL,1
13_B10,13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz,2,13_B10_INPUT_CTRL,1
13_B10,13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz,3,13_B10_INPUT_CTRL,1
13_B10_INPUT_CTRL,13_B10_KOL4052A20_S4_L001_I1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_I1_001.fastq.gz,1,,
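With the control columns it is easy to mistype a control name or replicate number, so a quick consistency check before launching the pipeline may help. This is a minimal Python sketch, not the pipeline's own validation. `missing_controls` is a hypothetical helper, and it only assumes the six column names shown in the header above.

```python
import csv
import io


def missing_controls(samplesheet_text):
    """Return sorted (control, control_replicate) pairs with no matching row.

    A pair matches when some row has the same values in its
    sample and replicate columns.
    """
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    present = {(r["sample"], r["replicate"]) for r in rows}
    wanted = {
        (r["control"], r["control_replicate"])
        for r in rows
        if r.get("control")  # control rows leave this column empty
    }
    return sorted(wanted - present)
```

An empty list means every referenced control has its own row. Any pairs returned are controls that the samplesheet references but does not define.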
As this issue has had no further activity, I will close it now. Feel free to reach out if you have any further questions, @pseudacriscrucifer.