Obtain processed methylation data

Question

Obtain processed methylation data

Closed this issue 4 years ago · 11 comments

aarmey commented 4 years ago


Yes.

Answer 1 · 2020-06-08T14:04:48.000Z

Columns in the CGfinal files: chromosome, dinucleotide, position, methylated counts, total counts, sample ID.
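For anyone loading these files, here is a minimal sketch of a reader for that column layout. It assumes the files are tab-delimited with no header row and that the columns appear in the order listed above; neither is confirmed in this thread. `CGRecord` and `read_cgfinal` are hypothetical names, and the two rows are invented purely for illustration.

```python
import csv
import io
from typing import NamedTuple


class CGRecord(NamedTuple):
    chromosome: str
    dinucleotide: str
    position: int
    methylated: int   # methylated read count
    total: int        # total read count
    sample_id: str


def read_cgfinal(handle, delimiter="\t"):
    """Yield one CGRecord per non-empty row.

    Assumes no header row and the six-column order described in the thread.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue
        chrom, dinuc, pos, meth, total, sample = row
        yield CGRecord(chrom, dinuc, int(pos), int(meth), int(total), sample)


# Toy rows, invented for illustration only (not real data).
example = "1\tCG\t10469\t12\t20\tS1\n1\tCG\t10471\t3\t18\tS1\n"
records = list(read_cgfinal(io.StringIO(example)))
```

With a real file, pass an open file handle instead of the `io.StringIO` object.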

Answer 2 · 2020-06-08T18:33:04.000Z

You can download the data here. DO NOT COMMIT THIS TO THE REPOSITORY.

Answer 3 · 2020-06-15T16:23:36.000Z

We should filter out sites that have fewer than 15 total reads. A site should then be included only if it meets this 15-read cutoff in every sample.
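A rough sketch of that two-step filter, in case it helps. It assumes each record is a `(chromosome, position, total_reads, sample_id)` tuple and treats a site that is absent from a sample as failing the cutoff; the second point is my assumption, not something stated here. The function name and the toy records are hypothetical.

```python
from collections import defaultdict

MIN_READS = 15  # cutoff proposed in this thread


def sites_passing_in_all_samples(records, min_reads=MIN_READS):
    """Return the set of (chromosome, position) sites with at least
    `min_reads` total reads in every sample seen in `records`.
    """
    samples = set()
    passing = defaultdict(set)  # site -> samples where the site meets the cutoff
    for chrom, pos, total, sample in records:
        samples.add(sample)
        if total >= min_reads:
            passing[(chrom, pos)].add(sample)
    # Keep a site only if it passed in every sample, so a site that is
    # missing from some sample is dropped (assumption).
    return {site for site, ok in passing.items() if ok == samples}


# Toy records, invented for illustration only.
toy = [
    ("1", 100, 20, "A"), ("1", 100, 15, "B"),  # passes in both samples
    ("1", 200, 30, "A"), ("1", 200, 14, "B"),  # fails in sample B
    ("1", 300, 40, "A"),                       # absent from sample B
]
kept = sites_passing_in_all_samples(toy)
```

Collecting the per-site set of passing samples in one pass avoids holding a full site-by-sample matrix in memory, which may matter for genome-wide CpG data.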

Answer 4 · 2020-06-15T16:24:01.000Z

Another possibility is merging sites into genomic regions, but we can think about this later.

https://github.com/NuttyLogic/METSIM_HMG_Code

Answer 5 · 2020-06-15T23:14:44.000Z

Should I remove reads with the following chromosome names?
11_gl000202_random
17_ctg5_hap1
17_gl000203_random
17_gl000204_random
17_gl000205_random
17_gl000206_random
19_gl000208_random
19_gl000209_random
1_gl000191_random
1_gl000192_random
21_gl000210_random
4_ctg9_hap1
4_gl000193_random
4_gl000194_random
6_apd_hap1
6_cox_hap2
6_dbb_hap3
6_mann_hap4
6_mcf_hap5
6_qbl_hap6
6_ssto_hap7
7_gl000195_random
8_gl000196_random
9_gl000198_random
9_gl000199_random
9_gl000200_random
9_gl000201_random
M

I assume the answer is to delete, but I wanted to ask as they seem to be shared across most/all patients and maybe have a more identifiable location.
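Whether to drop these is not settled in this thread, but if the decision is to remove them, flagging them could look like the sketch below. It assumes the primary chromosomes are named `1` to `22`, `X`, and `Y` with no `chr` prefix, matching the style of the names listed above. Treating `X` and `Y` as chromosomes to keep is also an assumption. `is_primary` is a hypothetical helper.

```python
# Assumed set of primary chromosome names (no "chr" prefix).
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def is_primary(name):
    """True for an assumed primary chromosome; False for random contigs,
    alternate haplotypes, and the mitochondrial entry (M)."""
    return name in PRIMARY


# A few names taken from the list above, plus two primary ones for contrast.
names = ["1", "17_ctg5_hap1", "6_cox_hap2", "9_gl000201_random", "M", "X"]
flagged = [n for n in names if not is_primary(n)]
```

Matching against an explicit allow-list, rather than searching for `random` or `hap` in the name, also catches `M` and any other unexpected contig name.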

Answer 6 · 2020-06-16T03:37:57.000Z

Also, when I generate the ratio of reads, should I round the final value, and if so, to how many decimal places?

Answer 7 · 2020-06-16T03:42:19.000Z

Why would you need to round?

Answer 8 · 2020-06-16T04:21:46.000Z

I think just to make the file smaller.

Answer 9 · 2020-06-16T13:15:22.000Z

Either keep it as counts in the file, or round the ratio to at least three decimal places.
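If the ratio is stored instead of the counts, a small sketch of the rounding might be as follows. The function name is hypothetical, and returning `None` for a site with zero total reads is an assumption about how such sites should be handled.

```python
def methylation_ratio(methylated, total, ndigits=3):
    """Fraction of methylated reads, rounded to `ndigits` decimal places.

    Returns None when there are no reads, since the ratio is undefined.
    """
    if total == 0:
        return None
    return round(methylated / total, ndigits)


ratio_a = methylation_ratio(12, 20)  # 0.6
ratio_b = methylation_ratio(1, 3)    # 0.333
```

Keeping the counts loses nothing, whereas a rounded ratio discards the read depth, so the counts are the safer thing to store if file size allows.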

Answer 10 · 2020-06-17T22:51:31.000Z

Also, for the above chromosomes, should I remove those?