Use all treatment and control

Question

Closed this issue 6 years ago · 3 comments

To test the peak-calling rules, I ran the bed branch on real samples. The good news is that peak calling works well. The issue with indexing the BAM file (#8) remains.

The problem is that the pipeline only called peaks for ATAC1 vs ATAC4, whereas it should have done so for every treatment/control pair.

treatment = 'ATAC1', 'ATAC2', 'ATAC3'
control = 'ATAC4', 'ATAC5', 'ATAC6'

I expect the files ATAC2 vs ATAC5 and ATAC3 vs ATAC6 as well.

Answer 1 · 2018-09-11T10:32:32.000Z

This is still an issue. I ran the development branch on the genseq with real data from a list of 3 treatments and 3 controls, and only the first elements of the two lists were compared.

I am now producing more 'sub' samples with seqtk to add to the repository, and will write tests to figure out what needs to be changed.

Answer 2 · 2018-09-11T12:07:07.000Z

Changing line 70, which defines BED_NARROW, by simply removing the zip function changes the outputs. The first line below is the original; the second is the version without zip.

BED_NARROW = expand(RESULT_DIR + "bed/{treatment}_vs_{control}_{unit}_peaks.narrowPeak", zip, treatment=CASES, control=CONTROLS, unit=UNITS)

BED_NARROW = expand(RESULT_DIR + "bed/{treatment}_vs_{control}_{unit}_peaks.narrowPeak", treatment=CASES, control=CONTROLS, unit=UNITS)

With the zip function: rule call_narrow_peaks runs only once.
Without the zip function: rule call_narrow_peaks runs 9 times.

The problem is that it then compares every treatment to every control, so it wastes a lot of computation.
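
The two counts can be reproduced in plain Python. This is only a sketch of the combination logic, not the pipeline code: it assumes that expand combines the wildcard lists the way Python's zip or itertools.product would, and that UNITS holds a single element (the value "1" is made up for illustration).

```python
from itertools import product

CASES = ["ATAC1", "ATAC2", "ATAC3"]
CONTROLS = ["ATAC4", "ATAC5", "ATAC6"]
UNITS = ["1"]  # assumption: a single-element list; the value is hypothetical

# zip() stops as soon as its shortest argument is exhausted,
# so a one-element UNITS list yields a single combination.
zipped = list(zip(CASES, CONTROLS, UNITS))

# product() builds every combination: 3 treatments x 3 controls x 1 unit.
crossed = list(product(CASES, CONTROLS, UNITS))

print(len(zipped), zipped)
print(len(crossed))
```

Neither result is the wanted one: zip gives a single comparison, and the full product gives nine where only three matched pairs are needed.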

Answer 3 · 2018-09-12T12:42:19.000Z

Fixed with commit 71478a1.
The problem was in the expand calls for the outputs BED_NARROW and BAMCOMPARE: they included the {unit} wildcard, which was filled from the UNITS list containing a single element. Because zip() stops at the shortest list, it produced only one treatment_vs_control pair.

I removed every {unit} wildcard from the Snakefile, removed the unit column from units.tsv, and changed the Snakefile functions that get the fastq files.
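
As a rough illustration of the effect of that change (not the content of the commit itself), pairing the two lists without the {unit} wildcard gives one output per matched treatment and control. The pattern below is an assumption obtained by dropping {unit} from the original one, the RESULT_DIR value is hypothetical, and the list comprehension is a plain-Python stand-in for an expand call with zip.

```python
CASES = ["ATAC1", "ATAC2", "ATAC3"]
CONTROLS = ["ATAC4", "ATAC5", "ATAC6"]
RESULT_DIR = "results/"  # hypothetical value, for illustration only

# Assumed pattern: the original one with the {unit} wildcard removed.
PATTERN = RESULT_DIR + "bed/{treatment}_vs_{control}_peaks.narrowPeak"

# Both lists have the same length, so zip() pairs them element by element
# and nothing is truncated.
BED_NARROW = [
    PATTERN.format(treatment=t, control=c)
    for t, c in zip(CASES, CONTROLS)
]

for path in BED_NARROW:
    print(path)
```

This yields the three expected comparisons: ATAC1 vs ATAC4, ATAC2 vs ATAC5 and ATAC3 vs ATAC6.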