Quality trim MBD-BS data
Closed this issue · 20 comments
Trimmed with Trim Galore using the slurm script trim-mbdbs.sh. Followed recommended settings for the Zymo Pico-Methyl kit, as per the Bismark developers. The script was written based on the MethCompare project's code.
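For anyone following along, here is a rough sketch of what a per-sample Trim Galore call with those settings might look like. It is not the contents of trim-mbdbs.sh. The 10bp clips on both ends of both reads are an assumption based on the Bismark guidance for Pico Methyl-Seq libraries, and the file names, output directory, and core count are hypothetical placeholders.

```python
from pathlib import Path


def build_trim_galore_cmd(r1, r2, outdir, clip=10, cores=4):
    """Return an argv list for one paired-end Trim Galore run.

    clip=10 is an assumed value (Bismark's suggestion for Pico Methyl-Seq);
    adjust it after looking at the raw FastQC plots.
    """
    return [
        "trim_galore",
        "--paired",
        "--clip_R1", str(clip),              # bases removed from 5' end of read 1
        "--clip_R2", str(clip),              # bases removed from 5' end of read 2
        "--three_prime_clip_R1", str(clip),  # bases removed from 3' end of read 1
        "--three_prime_clip_R2", str(clip),  # bases removed from 3' end of read 2
        "--fastqc",                          # run FastQC on the trimmed output
        "--cores", str(cores),
        "--output_dir", str(Path(outdir)),
        str(r1),
        str(r2),
    ]


# Hypothetical file names, for illustration only.
cmd = build_trim_galore_cmd("sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "trimmed")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` inside a loop over samples, or printed and pasted into a slurm script.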
Here's the MultiQC report for the trimmed data:
@laurahspencer - Do you have the FastQC data from before trimming that we could glance at? Might be useful for reference in the discussion in Issue #15.
Straight from seq facility before anything was done to it....
Here's the MultiQC report on the data that has been concatenated by sample - multiqc_report_raw
I'll run FastQC/MultiQC on the un-concatenated files now, and will get back to you.
Thanks - and what type of sequencing was this? paired-end 150bp?
As per @kristamnichols and the sequence length distribution (see multiqc report), we believe one lane/run was 100bp, and the other was 150bp.
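A quick way to confirm the 100bp/150bp split per file, independent of the MultiQC plot, is to tally read lengths directly. This is a minimal sketch that assumes standard 4-line FASTQ records (no wrapped sequence lines); the function name is made up for illustration.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path):
    """Count how many reads of each length a FASTQ file contains.

    Assumes 4-line records; handles plain text or .gz files.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\r\n"))] += 1
    return counts
```

Running this on each un-concatenated file and comparing the most common length should show which lane/run each file came from.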
Would want to concatenate after trimming. Let's have a look at the raw FastQC first.
Why does it matter? My understanding is that trimming/filtering works on each read separately, so it doesn't make a difference whether trimming occurs before or after concatenating.
Had the same thought. Concatenating is just adding lines of text to the end of an existing file, so shouldn't have an impact on anything downstream.
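To make that reasoning concrete, here is a toy sketch showing that a per-read trimmer gives the same result whether it runs before or after concatenation, as long as the same settings are applied to every read. The trimmer below is a simplified stand-in (3' quality trimming only, Phred+33 assumed), not what Trim Galore actually does, and the reads are invented.

```python
def trim_read(seq, qual, min_q=20, offset=33):
    """Trim the 3' end back to the last base with quality >= min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_all(reads, **kwargs):
    """Apply trim_read to every (sequence, quality) pair independently."""
    return [trim_read(s, q, **kwargs) for s, q in reads]


# Invented reads: 'I' is Q40, '#' is Q2 in Phred+33.
lane1 = [("ACGTACGT", "IIIIII##"), ("TTTTGGGG", "IIIIIIII")]
lane2 = [("ACGTACGTACGT", "IIIIIIIII###")]

trim_then_concat = trim_all(lane1) + trim_all(lane2)
concat_then_trim = trim_all(lane1 + lane2)
print(trim_then_concat == concat_then_trim)  # True
```

The equivalence only holds when both lanes get identical settings. If the two runs need different clipping, trimming has to happen per lane, before concatenating.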
Might not.
But two different read lengths? That could mean two "runs" and thus batch effects. I would assess whether they look similar in an MDS plot.
And how do we know the trim needs are not different if we have not seen fastqc on raw?
What is the first rule of FISH546? :)
But two different read lengths?
I was assuming the only concatenation taking place was of samples from multiple lanes with the same sequencing parameters.
Certainly would refrain from concatenating a mix of runs with different sequencing params - not sure how downstream software handles FastQs with inconsistent read lengths.
Yeah, I assumed the same until I saw the FastQC results, chatted with Krista, and realized that we have a mix of 100bp and 150bp reads. I'll plan to re-run the pipeline with trimming occurring prior to concatenating.
Don't forget to post FastQC/MultiQC of raw, non-concatenated reads. 😃
I am going to suggest only about 20bp are good in one batch and 40bp in the other....
i'll plan to re-run the pipeline with trimming occurring prior to concatenating.
But run some PCA / MDS before you concatenate.
I am going to suggest only about 20bp are good in one batch and 40bp in the other....
I don't think I follow. Can you elaborate on what you mean by this?
I am basing that on the fact these lines should essentially be horizontal - https://d.pr/i/mevTec
This is indicative of artificial sequences / adaptors
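For reference, the plot in question is per-position base composition. A rough sketch of how those fractions can be computed from a handful of reads is below, in case anyone wants to check specific positions outside of FastQC. The function name and the reads are invented for illustration.

```python
from collections import Counter


def per_base_fractions(seqs):
    """For each read position, return the fraction of each base seen there."""
    longest = max((len(s) for s in seqs), default=0)
    columns = [Counter() for _ in range(longest)]
    for s in seqs:
        for pos, base in enumerate(s.upper()):
            columns[pos][base] += 1
    return [
        {base: n / sum(col.values()) for base, n in col.items()}
        for col in columns
    ]


# Invented reads, for illustration only.
reads = ["AGATCGGA", "AGTTCGTA", "AGCTAGCA"]
for pos, fracs in enumerate(per_base_fractions(reads), start=1):
    print(pos, {b: round(f, 2) for b, f in sorted(fracs.items())})
```

With real data, roughly constant fractions across positions correspond to the horizontal lines described above, while positions where one base dominates point to non-random sequence such as adapters or priming bias.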
Previous quality/trimming results show ~120bp are good, though.