Quality trim MBD-BS data

Question



Answer 1 · 2022-10-31T21:54:02.000Z

Trimmed with Trim Galore using the slurm script trim-mbdbs.sh. Followed recommended settings for the Zymo Pico-Methyl kit, as per the Bismark developers. The script was written based on the MethCompare project's code.
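A minimal Python sketch of the kind of Trim Galore call such a script might make. It assumes the Bismark user guide's Pico Methyl-Seq suggestion of clipping 10 bp from both the 5' and 3' ends of each read; the actual options in trim-mbdbs.sh may differ, and the file names are hypothetical. Paired-end mode is optional here because the library layout is not confirmed in this thread. The function only builds the argument list; passing it to `subprocess.run(cmd, check=True)` would execute it.

```python
import shlex


def trim_galore_cmd(read1, read2=None, clip=10, outdir="trimmed", fastqc=True):
    """Build a Trim Galore command as an argument list.

    Assumption: clip `clip` bases from the 5' and 3' ends of every read,
    following the Bismark guide's Pico Methyl-Seq recommendation.
    """
    cmd = ["trim_galore", "--output_dir", outdir]
    if fastqc:
        cmd.append("--fastqc")  # run FastQC on the trimmed output
    cmd += ["--clip_R1", str(clip), "--three_prime_clip_R1", str(clip)]
    if read2 is not None:
        # Paired-end mode: clip read 2 the same way as read 1.
        cmd += ["--paired", "--clip_R2", str(clip), "--three_prime_clip_R2", str(clip)]
        cmd += [read1, read2]
    else:
        cmd.append(read1)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names for one lane of one sample.
    cmd = trim_galore_cmd("sample1_L001_R1_001.fastq.gz", "sample1_L001_R2_001.fastq.gz")
    print(shlex.join(cmd))
```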

Here's the MultiQC report for the trimmed data:

Answer 2 · 2022-11-03T01:01:24.000Z

@laurahspencer Do you have the FastQC data from before trimming that we could glance at? It might be useful as a reference for the discussion in Issue #15.

Answer 3 · 2022-11-03T01:09:02.000Z

Do you have the FastQC data from before trimming that we could glance at?

Straight from the sequencing facility, before anything was done to it...

Answer 4 · 2022-11-03T16:15:59.000Z

Here's the MultiQC report on the data after it was concatenated by sample: multiqc_report_raw

I'll run FastQC/MultiQC on the un-concatenated files now and get back to you.

Answer 5 · 2022-11-03T16:56:28.000Z

Thanks. And what type of sequencing was this? Paired-end 150bp?

Answer 6 · 2022-11-03T17:16:07.000Z

According to @kristamnichols and the sequence length distribution (see the MultiQC report), we believe one lane/run was 100bp and the other was 150bp.
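The read-length split can also be checked per file before anything is pooled. A minimal Python sketch, assuming standard 4-line FASTQ records; the file names in the usage note are hypothetical.

```python
import gzip
from collections import Counter


def read_length_counts(lines):
    """Tally sequence lengths from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # line 2 of each record holds the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fastq_length_counts(path):
    """Read-length distribution for a FASTQ file, plain or gzip-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return read_length_counts(fh)
```

Usage: `for p in ["S1_L001_R1_001.fastq.gz", "S1_L002_R1_001.fastq.gz"]: print(p, max(fastq_length_counts(p)))` would show whether each lane tops out at 100 or 150 bases.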

Answer 7 · 2022-11-03T17:21:59.000Z

Would want to concatenate after trimming. Let's have a look at the raw FastQC first.

Answer 8 · 2022-11-03T17:42:13.000Z

Would want to concatenate after trimming. Let's have a look at the raw FastQC first.

Why does it matter? My understanding is that trimming/filtering works on each read separately, so it doesn't make a difference whether trimming occurs before or after concatenating.

Answer 9 · 2022-11-03T17:45:34.000Z

Why does it matter?

Had the same thought. Concatenating is just adding lines of text to the end of an existing file, so shouldn't have an impact on anything downstream.
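To illustrate the point about concatenation being a plain append: a minimal Python sketch equivalent to `cat a.fastq.gz b.fastq.gz > out.fastq.gz`. It relies on the fact that concatenated gzip members form a valid gzip stream, so gzipped FASTQs can be joined without decompressing; the records themselves are untouched. This only covers the mechanics, not whether pooling runs with different read lengths is appropriate.

```python
import gzip
import shutil


def concatenate(inputs, output):
    """Append files byte-for-byte into `output` (works for .fastq and .fastq.gz)."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as fh:
                shutil.copyfileobj(fh, out)


def concat_gz_bytes(*chunks):
    """In-memory analogue: gzip each chunk separately, then join the bytes."""
    return b"".join(gzip.compress(c) for c in chunks)
```

Decompressing the result of `concat_gz_bytes` yields the original chunks joined in order, which is why downstream tools see one continuous FASTQ.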

Answer 10 · 2022-11-03T17:52:20.000Z

Might not.
But two different read lengths? Could be two "runs", and thus batch effects? I would assess whether they are similar in an MDS.

And how do we know the trimming needs aren't different if we haven't seen FastQC on the raw data?

Answer 11 · 2022-11-03T17:57:45.000Z

What is the first rule of FISH546? :)

Answer 12 · 2022-11-03T18:00:48.000Z

But two different read lengths?

I was assuming the only concatenation taking place was of samples from multiple lanes, with the same sequencing parameters.

I would certainly refrain from concatenating a mix of runs with different sequencing parameters; I'm not sure how downstream software handles FASTQs with inconsistent read lengths.

Answer 13 · 2022-11-03T18:26:09.000Z

Yeah, I assumed the same until I saw the FastQC results, chatted with Krista, and realized that we have a mix of 100bp and 150bp reads. I'll plan to re-run the pipeline with trimming occurring prior to concatenating.
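Trimming before concatenating mostly comes down to keeping lane files separate until after the trim step. A minimal Python sketch of the grouping part, assuming Illumina-style names such as `S1_L001_R1_001.fastq.gz`; the real file names from the facility may follow a different pattern, so the regex is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: <sample>_L<lane>_<R1|R2>_<chunk>.fastq.gz (or .fq.gz)
LANE_RE = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_(?P<read>R[12])_\d{3}\.f(ast)?q\.gz$"
)


def group_lanes(filenames):
    """Map (sample, read) -> that sample's lane files, sorted by lane number."""
    groups = defaultdict(list)
    for name in filenames:
        m = LANE_RE.match(name)
        if m is None:
            raise ValueError(f"unrecognised file name: {name}")
        groups[(m["sample"], m["read"])].append((int(m["lane"]), name))
    return {key: [n for _, n in sorted(files)] for key, files in groups.items()}
```

Each file in a group would be trimmed on its own, and only the trimmed outputs for a given (sample, read) key concatenated afterwards, so lanes with different read lengths can be trimmed with different settings if needed.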

Answer 14 · 2022-11-03T18:34:22.000Z

Don't forget to post FastQC/MultiQC of raw, non-concatenated reads. 😃

Answer 15 · 2022-11-03T18:55:04.000Z

Yup, here you go:

Answer 16 · 2022-11-03T20:25:06.000Z

I am going to suggest only about 20bp are good in one batch and 40bp in the other....

Answer 17 · 2022-11-03T20:29:40.000Z

i'll plan to re-run the pipeline with trimming occurring prior to concatenating.

But run some PCAs/MDS before you concatenate.
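One way to run that check, sketched in Python with NumPy: classical (Torgerson) MDS on a per-library feature matrix. What goes into the rows is an assumption, e.g. percent methylation at CpGs covered in every library, or any other per-library summary. With Euclidean distances this gives the same configuration as PCA scores, up to sign. If libraries separate on the first axis by lane/read length rather than by biology, that would point to a batch effect worth handling before pooling.

```python
import numpy as np


def classical_mds(X, k=2):
    """Classical (Torgerson) MDS of the rows of X using Euclidean distances.

    X: (n_libraries, n_features) array, e.g. hypothetical per-CpG methylation.
    Returns an (n_libraries, k) array of coordinates.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared Euclidean distances
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ D2 @ J  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]  # keep the k largest
    vals = np.clip(vals[order], 0, None)  # guard against tiny negative values
    return vecs[:, order] * np.sqrt(vals)
```

Plotting the two returned columns and colouring points by lane (100bp vs 150bp) would make any lane-driven separation visible.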

Answer 18 · 2022-11-03T20:36:43.000Z

I am going to suggest only about 20bp are good in one batch and 40bp in the other....

I don't think I follow. Can you elaborate on what you mean by this?

Answer 19 · 2022-11-03T21:31:56.000Z

I am basing that on the fact that these lines should essentially be horizontal: https://d.pr/i/mevTec

Departures from horizontal are indicative of artificial sequences/adaptors.
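The "horizontal lines" idea can be checked numerically. A minimal Python sketch that computes the fraction of A/C/G/T at each read position (the quantity behind FastQC's per-base sequence content plot) and lists positions whose composition strays from the per-base median across positions. The median reference and the 0.10 tolerance are arbitrary choices for illustration, not FastQC's own pass/fail rule.

```python
from collections import Counter
from statistics import median

BASES = "ACGT"


def per_base_content(seqs):
    """Fraction of A/C/G/T at each read position, ignoring N and other symbols."""
    columns = [Counter() for _ in range(max(len(s) for s in seqs))]
    for s in seqs:
        for pos, base in enumerate(s.upper()):
            columns[pos][base] += 1
    content = []
    for col in columns:
        total = sum(col[b] for b in BASES) or 1  # avoid division by zero
        content.append({b: col[b] / total for b in BASES})
    return content


def non_flat_positions(content, tol=0.10):
    """Positions where any base fraction differs from its median across
    positions by more than `tol`; a flat (horizontal) profile yields []."""
    ref = {b: median(p[b] for p in content) for b in BASES}
    return [i for i, p in enumerate(content)
            if any(abs(p[b] - ref[b]) > tol for b in BASES)]
```

Flagged positions clustered at the start or end of reads would be candidates for clipping; the usable read length is roughly what remains between them.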

Answer 20 · 2022-11-03T21:36:49.000Z

Previous quality/trimming results show ~120bp are good, though.