Question about normalisation

Hello,

Just needed a small clarification regarding normalising the counts generated by RepEnrich. The tutorial mentions that library size should be calculated as reads processed minus reads that failed to align, taken from the bowtie log. I am a little confused about how to calculate this. Here is the bowtie output for one of my samples:

26562723 reads; of these:
26562723 (100.00%) were paired; of these:
7376294 (27.77%) aligned concordantly 0 times
7786417 (29.31%) aligned concordantly exactly 1 time
11400012 (42.92%) aligned concordantly >1 times
----
7376294 pairs aligned concordantly 0 times; of these:
1005276 (13.63%) aligned discordantly 1 time
----
6371018 pairs aligned 0 times concordantly or discordantly; of these:
12742036 mates make up the pairs; of these:
7152611 (56.13%) aligned 0 times
2669705 (20.95%) aligned exactly 1 time
2919720 (22.91%) aligned >1 times
86.54% overall alignment rate

Since the overall alignment rate is 86.54%, I am assuming I would use 0.8654 * 26562723 = 22987380 as the library size. Is that correct?

Thanks

Answer 1 · 2019-03-12T17:07:34.000Z

Hi there,

Thanks for your interest in our software! Yes, that should be correct for the library size. I also wanted to comment on your previous post before you closed it: if you are running jobs in parallel, you should probably give each file a separate output destination, because (as you said) the outputs might conflict within the pair_1 and pair_2 folders.
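
For anyone repeating the library-size arithmetic from the question, here is a minimal Python sketch of it. It is not part of RepEnrich: the function names `library_size_from_rate` and `aligned_mates` are made up for illustration, and the parsing assumes the paired-end summary layout quoted in the question. The first function reproduces the rate-times-reads estimate; the second works from the raw counts instead, assuming the overall rate is computed over mates (which matches the numbers in this log).

```python
import re

# Summary lines copied from the log in the question (only the ones used here).
BOWTIE_LOG = """\
26562723 reads; of these:
26562723 (100.00%) were paired; of these:
7376294 (27.77%) aligned concordantly 0 times
6371018 pairs aligned 0 times concordantly or discordantly; of these:
12742036 mates make up the pairs; of these:
7152611 (56.13%) aligned 0 times
86.54% overall alignment rate
"""


def total_reads(log_text):
    """Number of reads (pairs, for paired-end data) reported on the first summary line."""
    return int(re.search(r"^\s*(\d+) reads; of these:", log_text, re.M).group(1))


def library_size_from_rate(log_text):
    """Estimate from the question: total reads times the overall alignment rate."""
    rate = float(re.search(r"([\d.]+)% overall alignment rate", log_text).group(1))
    return round(total_reads(log_text) * rate / 100)


def aligned_mates(log_text):
    """Count-based variant (assumption): all mates minus mates that aligned 0 times."""
    unaligned = int(
        re.search(r"^\s*(\d+) \([\d.]+%\) aligned 0 times\s*$", log_text, re.M).group(1)
    )
    return 2 * total_reads(log_text) - unaligned


if __name__ == "__main__":
    print(library_size_from_rate(BOWTIE_LOG))  # 22987380, as in the question
    print(aligned_mates(BOWTIE_LOG))           # 45972835 mates, about 22986418 pairs
```

The two results differ slightly because the percentage in the log is rounded to two decimals. Whichever convention is chosen, it should be applied the same way to every sample being compared.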

Best,
Nick

Answer 2 · 2019-03-12T17:27:03.000Z

Great. Thanks a lot.