laurahspencer/DuMOAR

Quality trim MBD-BS data


sr320 commented
Quality trim MBD-BS data

Trimmed with Trim Galore using the slurm script trim-mbdbs.sh. Followed recommended settings for the Zymo Pico-Methyl kit, as per the Bismark developers. The script was written based on the MethCompare project's code.
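For reference, here is a minimal sketch of how such a Trim Galore call could be assembled. It assumes paired-end input and the 10 bp hard clip at both the 5' and 3' ends of each read that the Bismark user guide suggests for Zymo Pico Methyl-Seq libraries. The clip values and file names are assumptions, not copied from trim-mbdbs.sh, so check the script for the settings actually used. The function only builds the argument list and does not run anything.

```python
import shlex
from typing import List

# Assumed hard-clip length in bp (Bismark guide suggestion for Pico Methyl-Seq);
# not taken from trim-mbdbs.sh.
CLIP_BP = 10


def trim_galore_cmd(r1: str, r2: str, out_dir: str, clip_bp: int = CLIP_BP) -> List[str]:
    """Build (but do not run) a paired-end Trim Galore command as an argument list."""
    return [
        "trim_galore",
        "--paired",
        "--clip_R1", str(clip_bp),              # 5' clip, read 1
        "--clip_R2", str(clip_bp),              # 5' clip, read 2
        "--three_prime_clip_R1", str(clip_bp),  # 3' clip, read 1
        "--three_prime_clip_R2", str(clip_bp),  # 3' clip, read 2
        "--fastqc",                             # run FastQC on the trimmed output
        "--output_dir", out_dir,
        r1, r2,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = trim_galore_cmd("sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "trimmed/")
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a node where `trim_galore` is on the PATH.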

Here's the MultiQC report for the trimmed data:

kubu4 commented

@laurahspencer - Do you have the FastQC data from before trimming that we could glance at? Might be useful for reference in the discussion in Issue #15.

sr320 commented

Do you have the FastQC data from before trimming that we could glance at?

Straight from seq facility before anything was done to it....

Here's the MultiQC report on the data that has been concatenated by sample: multiqc_report_raw

I'll run FastQC/MultiQC on the un-concatenated files now and get back to you.

sr320 commented

Thanks. And what type of sequencing was this? Paired-end 150bp?

As per @kristamnichols and the sequence length distribution (see the MultiQC report), we believe one lane/run was 100bp and the other was 150bp.
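One quick way to confirm the mixed read lengths outside of MultiQC is to tally sequence lengths straight from the raw FASTQs. This is a minimal sketch. It assumes standard four-line FASTQ records and untrimmed input, so each run should show a single dominant length (e.g. 100 or 150). The file name in the usage comment is hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable


def length_counts(lines: Iterable[str]) -> Counter:
    """Tally read lengths from FASTQ lines (4 lines per record, sequence on line 2)."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence line of each record
            counts[len(line.rstrip("\n"))] += 1
    return counts


def fastq_length_counts(path: str) -> Counter:
    """Open a plain or gzipped FASTQ and tally its read lengths."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return length_counts(fh)


# Usage (hypothetical file name):
# print(fastq_length_counts("sample1_L001_R1.fastq.gz").most_common(3))
```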

sr320 commented

Would want to concatenate after trimming. Let's have a look at the raw FastQC first.

Would want to concatenate after trimming. Let's have a look at the raw FastQC first.

Why does it matter? My understanding is that trimming/filtering works on each read separately, so it shouldn't make a difference whether trimming occurs before or after concatenating.

kubu4 commented

Why does it matter?

Had the same thought. Concatenating just appends lines of text to the end of an existing file, so it shouldn't have an impact on anything downstream.
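The reasoning here can be checked with a toy model. If trimming is a pure per-read operation with fixed parameters, trimming the concatenated file gives the same reads as concatenating the trimmed files. The sketch below uses a hypothetical fixed-length hard clip as a stand-in for the real trimmer and simplifies FASTQ records to (name, sequence, quality) tuples.

```python
from typing import List, Tuple

# Simplified FASTQ record: (read name, sequence, quality string).
Record = Tuple[str, str, str]


def hard_clip(rec: Record, five: int, three: int) -> Record:
    """Remove `five` bases from the 5' end and `three` bases from the 3' end of one read."""
    name, seq, qual = rec
    end = max(len(seq) - three, 0)
    return (name, seq[five:end], qual[five:end])


def trim_all(records: List[Record], five: int = 10, three: int = 10) -> List[Record]:
    """Trim every read independently; no read influences how another is trimmed."""
    return [hard_clip(r, five, three) for r in records]


def concat(*files: List[Record]) -> List[Record]:
    """Concatenation only appends one file's records after another's."""
    out: List[Record] = []
    for f in files:
        out.extend(f)
    return out


# Toy "runs": one with 100 bp reads, one with 150 bp reads.
run_100 = [("r1", "A" * 100, "I" * 100), ("r2", "C" * 100, "I" * 100)]
run_150 = [("r3", "G" * 150, "I" * 150)]

assert trim_all(concat(run_100, run_150)) == concat(trim_all(run_100), trim_all(run_150))
```

The equality only holds while both runs get the same trimming parameters. If the 100 bp and 150 bp runs need different settings, or a step looks at the file as a whole (Trim Galore's adapter auto-detection, for example, inspects the first reads of the input), trimming each run separately is the safer order.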

sr320 commented

Might not.
But two different read lengths? That could mean two "runs", and thus batch effects. I would check whether they are similar in an MDS first.

And how do we know the trimming needs are not different if we have not looked at FastQC on the raw data?

sr320 commented

What is the first rule of FISH546? :)

kubu4 commented

But two different read lengths?

I was assuming the only concatenation taking place was of samples from multiple lanes with the same sequencing parameters.

I'd certainly refrain from concatenating a mix of runs with different sequencing parameters; I'm not sure how downstream software handles FASTQs with inconsistent read lengths.

Yeah, I assumed the same until I saw the FastQC results, chatted with Krista, and realized that we have a mix of 100bp and 150bp reads. I'll plan to re-run the pipeline with trimming occurring prior to concatenating.

kubu4 commented

Don't forget to post FastQC/MultiQC of raw, non-concatenated reads. 😃

sr320 commented

I am going to suggest only about 20bp are good in one batch and 40bp in the other....

sr320 commented

I'll plan to re-run the pipeline with trimming occurring prior to concatenating.

But run some PCAs/MDS before you concatenate.
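A minimal sketch of the check suggested here is classical (Torgerson) MDS on a samples × features matrix. With Euclidean distances, it gives the same configuration as PCA scores up to sign. What goes in the rows is an assumption, e.g. per-sample methylation levels at loci covered in every sample, kept separate by run. Plotting the first two coordinates coloured by run (100 bp vs 150 bp) would show whether the runs separate. The data in the usage lines are random placeholders.

```python
import numpy as np


def classical_mds(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS of the rows of X (samples) using Euclidean distances."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of samples.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Double-centre: B = -1/2 * J D2 J with J = I - (1/n) * ones.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # B is symmetric, so eigh applies; keep the k largest eigenvalues.
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]
    # Negative eigenvalues (numerical noise) are clipped to zero before the square root.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Usage with made-up data: 6 samples x 50 features.
rng = np.random.default_rng(0)
coords = classical_mds(rng.normal(size=(6, 50)), k=2)
print(coords.shape)  # (6, 2)
```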

kubu4 commented

I am going to suggest only about 20bp are good in one batch and 40bp in the other....

I don't think I follow. Can you elaborate on what you mean by this?

sr320 commented

I am basing that on the fact that these lines should essentially be horizontal: https://d.pr/i/mevTec

Lines that aren't horizontal are indicative of artificial sequences/adapters.
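This sketch assumes the linked plot shows per-position base composition, as FastQC's "Per base sequence content" module does. The idea is to compute the A/C/G/T fraction at each read position. In a random library, each base's fraction stays roughly constant along the read, giving horizontal lines. A shared adapter or other artificial sequence forces the same base at the same positions and bends the lines. The toy reads are made up for illustration.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def per_base_content(seqs: List[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T at each position across a set of reads."""
    max_len = max(len(s) for s in seqs)
    profile: List[Dict[str, float]] = []
    for pos in range(max_len):
        counts = Counter(s[pos] for s in seqs if len(s) > pos)
        total = sum(counts[b] for b in BASES)  # ignores N and other symbols
        profile.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return profile


def spread(profile: List[Dict[str, float]], base: str) -> float:
    """Range of one base's fraction along the read; near 0 means a flat (horizontal) line."""
    values = [p[base] for p in profile]
    return max(values) - min(values)


# Toy example: varied first base, then an identical "AAA" tail in every read.
reads = ["AAAA", "CAAA", "GAAA", "TAAA"]
profile = per_base_content(reads)
print(spread(profile, "A"))  # 0.75: A goes from 0.25 at position 0 to 1.0 afterwards
```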

kubu4 commented

Previous quality/trimming results show ~120bp are good, though.
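To put a number like "~120bp are good" on a given run, one option is to average the Phred scores at each position and count how far along the read the mean stays above a cutoff. This is a minimal sketch. It assumes Phred+33 encoded quality strings (standard for current Illumina output) and uses a hypothetical cutoff of Q30. Running it separately on the 100 bp and 150 bp runs would show whether their usable lengths differ.

```python
from typing import List


def mean_quality_by_position(quals: List[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, decoding ASCII with the given offset."""
    max_len = max(len(q) for q in quals)
    sums = [0] * max_len
    counts = [0] * max_len
    for q in quals:
        for i, ch in enumerate(q):
            sums[i] += ord(ch) - offset  # e.g. 'I' -> 40, '#' -> 2 with Phred+33
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def usable_length(quals: List[str], threshold: float = 30.0, offset: int = 33) -> int:
    """Number of leading positions whose mean quality stays at or above the threshold."""
    n = 0
    for mean_q in mean_quality_by_position(quals, offset):
        if mean_q < threshold:
            break
        n += 1
    return n


# Toy quality strings: 5 high-quality bases followed by 3 very low ones.
print(usable_length(["IIIII###", "IIIII###"]))  # 5
```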