title | date |
---|---|
Annotating DNA methylation array | 07/08/2019 |
This repository contains code for generating an annotation for the Illumina EPIC methylation array.
There are two annotation files, one mapped to hg19 (hg19_epic_annotation.rds) and one mapped to hg38/GRCh38 (hg38_epic_annotation.rds). They contain the same annotation columns, but the hg38 annotation is missing 237 probes because some mappings are lost when converting from hg19 to hg38.
Currently these files are not tracked in git because they are too large (~250 MB).
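As a quick orientation, the two files can be read with `readRDS()` and compared by probe ID. The sketch below is only an illustration: it assumes each file holds a data frame with a `cpg` column (as in the tables in this README) and that the files sit in the working directory. The helper name `missing_probes` is hypothetical.

```r
# Hypothetical helper: probe IDs present in the hg19 annotation but absent from hg38.
# Assumes both inputs are data frames with a `cpg` column of probe IDs.
missing_probes <- function(anno_hg19, anno_hg38) {
  setdiff(anno_hg19$cpg, anno_hg38$cpg)
}

# Toy stand-ins for the real annotations, just to show the idea.
toy_hg19 <- data.frame(cpg = c("cg00000029", "cg00000103", "cg00000109"),
                       stringsAsFactors = FALSE)
toy_hg38 <- data.frame(cpg = c("cg00000029", "cg00000109"),
                       stringsAsFactors = FALSE)
lost <- missing_probes(toy_hg19, toy_hg38)

# With the real files (paths are an assumption):
# anno_hg19 <- readRDS("hg19_epic_annotation.rds")
# anno_hg38 <- readRDS("hg38_epic_annotation.rds")
# length(missing_probes(anno_hg19, anno_hg38))
```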
I started with the default annotations provided by Illumina. I used two files: the latest B4 annotation (MethylationEPIC_v-1-0_B4.csv) and the list of probes missing between B3 and B2 (MethylationEPIC Missing Legacy CpG (v1.0_B3 vs. v1.0_B2) Annotations.csv). Both can be found in the product files list on Illumina's website.
Using the intersection of these two lists of probes, I used the provided genomic location (chromosome and position) to map annotations to each CpG. Note that Illumina's provided annotations are based on hg19.
An example of the starting coordinates from Illumina that this annotation is based on:
cpg | chr | start |
---|---|---|
cg00000029 | chr16 | 53468112 |
cg00000103 | chr4 | 73470186 |
cg00000109 | chr3 | 171916037 |
cg00000155 | chr7 | 2590565 |
cg00000158 | chr9 | 95010555 |
cg00000165 | chr1 | 91194674 |
cg00000221 | chr17 | 54534248 |
cg00000236 | chr8 | 42263294 |
cg00000289 | chr14 | 69341139 |
cg00000292 | chr16 | 28890100 |
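A table like the one above could be derived from an Illumina manifest roughly as follows. This is a minimal sketch rather than the repository's actual code: it assumes the manifest columns are named `IlmnID`, `CHR` and `MAPINFO`, and the helper name `manifest_to_coords` is hypothetical.

```r
# Hypothetical helper: reduce an Illumina manifest to probe ID and hg19 position.
# Assumes manifest columns IlmnID (probe ID), CHR (chromosome) and MAPINFO (position).
manifest_to_coords <- function(manifest) {
  coords <- data.frame(
    cpg   = manifest$IlmnID,
    chr   = paste0("chr", manifest$CHR),
    start = as.integer(manifest$MAPINFO),
    stringsAsFactors = FALSE
  )
  # Drop probes without a mapped position
  coords[!is.na(coords$start), ]
}

# Toy manifest rows, using two probes from the table above.
toy_manifest <- data.frame(
  IlmnID  = c("cg00000029", "cg00000103"),
  CHR     = c("16", "4"),
  MAPINFO = c(53468112, 73470186),
  stringsAsFactors = FALSE
)
coords <- manifest_to_coords(toy_manifest)

# With the real file (the number of header lines to skip is an assumption):
# manifest <- read.csv("MethylationEPIC_v-1-0_B4.csv", skip = 7)
```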
I also kept some probe-specific information that I thought some may find useful. The columns for these variables are all prefixed with "ilmn_".
I used the R package annotatr to access UCSC annotations for CpG islands and transcripts, and FANTOM5 annotations for enhancers.
UCSC transcript- and CpG island-related elements:
cpg | chr | start | cpg_id | cpg_width | genes_id | genes_symbol | genes_tx_id | genes_width |
---|---|---|---|---|---|---|---|---|
cg00000029 | chr16 | 53468112 | shore | 2000 | promoter, 1to5kb | RBL2, RBL2 | uc002ehi.4, uc010vgv.1 | 1000, 4000 |
cg00000103 | chr4 | 73470186 | sea | 491623 | intergenic | NA | NA | 480899 |
cg00000109 | chr3 | 171916037 | sea | 398648 | intron, intron, intron | FNDC3B, FNDC3B, FNDC3B | uc003fhy.3, uc003fhz.4, uc003fia.3 | 93324, 93324, 93324 |
cg00000155 | chr7 | 2590565 | sea | 3182 | intron, intron | BRAT1, BRAT1 | uc003smi.3, uc003smj.2 | 6826, 6826 |
cg00000158 | chr9 | 95010555 | sea | 143935 | intron, intron, intron, intron, intron | IARS, IARS, IARS, IARS, IARS | uc004ars.2, uc004art.2, uc004aru.4, uc010mqr.3, uc010mqt.2 | 2306, 2306, 2306, 2306, 2306 |
cg00000165 | chr1 | 91194674 | shore | 2000 | intergenic | NA | NA | 107309 |
cg00000221 | chr17 | 54534248 | sea | 656815 | exon, intronexonboundary | ANKFN1, ANKFN1 | uc002iun.1, uc002iun.1 | 100, 400 |
cg00000236 | chr8 | 42263294 | sea | 10587 | exon, exon, exon, 3UTR, 3UTR | VDAC3, VDAC3, VDAC3, VDAC3, VDAC3 | uc003xpc.3, uc031tay.1, uc022aul.1, uc003xpc.3, uc022aul.1 | 567, 567, 567, 475, 475 |
cg00000289 | chr14 | 69341139 | shore | 2000 | exon, exon, exon, exon, exon, 3UTR, 3UTR, 3UTR, 3UTR, 3UTR | ACTN1, ACTN1, ACTN1, ACTN1, ACTN1, ACTN1, ACTN1, ACTN1, ACTN1, ACTN1 | uc001xkk.3, uc010ttb.2, uc001xkl.3, uc001xkm.3, uc001xkn.3, uc001xkk.3, uc010ttb.2, uc001xkl.3, uc001xkm.3, uc001xkn.3 | 895, 895, 895, 895, 895, 736, 736, 736, 736, 736 |
cg00000292 | chr16 | 28890100 | shore | 2000 | 1to5kb, exon, exon, intron | ATP2A1, ATP2A1, ATP2A1, no_associated_gene | uc002drp.1, uc002drn.1, uc002dro.1, uc010vct.2 | 4000, 302, 302, 931314 |
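In the table above, a CpG that overlaps several annotations has its values joined with ", " in a single cell. One way such a collapse could be done in base R is sketched below; the helper `collapse_by_cpg` and the toy `hits` data frame are illustrative assumptions, not the repository's code.

```r
# Toy long-format hits: one row per (cpg, overlapping annotation).
hits <- data.frame(
  cpg          = c("cg00000029", "cg00000029", "cg00000103"),
  genes_id     = c("promoter", "1to5kb", "intergenic"),
  genes_symbol = c("RBL2", "RBL2", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per cpg, multiple values joined with ", ".
# Missing values become the string "NA", as in the table above.
collapse_by_cpg <- function(hits, cols) {
  ids <- unique(hits$cpg)
  out <- data.frame(cpg = ids, stringsAsFactors = FALSE)
  for (col in cols) {
    out[[col]] <- vapply(
      ids,
      function(id) paste(hits[[col]][hits$cpg == id], collapse = ", "),
      character(1),
      USE.NAMES = FALSE
    )
  }
  out
}

wide <- collapse_by_cpg(hits, c("genes_id", "genes_symbol"))
```

Keeping the values in hit order means the i-th entry of `genes_id` lines up with the i-th entry of `genes_symbol`, so each cell can later be split back apart with `strsplit(x, ", ")`.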
FANTOM5 enhancers:
cpg | chr | start | enhancers_id | enhancers_width |
---|---|---|---|---|
cg00000776 | chr4 | 156388205 | enhancer | 116 |
cg00003578 | chr1 | 12600529 | enhancer | 328 |
cg00004667 | chr1 | 16292746 | enhancer | 536 |
cg00004963 | chr6 | 147124996 | enhancer | 324 |
cg00005325 | chr1 | 201684967 | enhancer | 354 |
cg00005461 | chr3 | 46131480 | enhancer | 363 |
cg00007021 | chr8 | 101819246 | enhancer | 437 |
cg00007969 | chr1 | 41633437 | enhancer | 488 |
cg00009088 | chr11 | 60930188 | enhancer | 335 |
cg00009585 | chr15 | 33111077 | enhancer | 345 |
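The annotatr step for both the UCSC and FANTOM5 annotations might look roughly like the sketch below. The wrapper `annotate_cpgs` is hypothetical, and the specific annotation sets passed to `build_annotations()` are an assumption inferred from the values in the tables above, not a record of what was actually run. Calling the function requires the Bioconductor packages annotatr, GenomicRanges and IRanges, and downloads annotation data.

```r
# Hypothetical wrapper around annotatr. `coords` is assumed to be a data frame
# with columns cpg, chr and start (hg19), as in the tables in this README.
annotate_cpgs <- function(coords) {
  # One-base ranges at each CpG position
  gr <- GenomicRanges::GRanges(
    seqnames = coords$chr,
    ranges   = IRanges::IRanges(start = coords$start, width = 1),
    cpg      = coords$cpg
  )
  # Assumed annotation sets: CpG islands/shores/shelves/sea, gene features,
  # intergenic regions, intron-exon boundaries, and FANTOM5 enhancers
  annots <- annotatr::build_annotations(
    genome = "hg19",
    annotations = c("hg19_cpgs", "hg19_basicgenes", "hg19_genes_intergenic",
                    "hg19_genes_intronexonboundaries", "hg19_enhancers_fantom")
  )
  # One row per (CpG, overlapping annotation) pair
  annotated <- annotatr::annotate_regions(
    regions = gr, annotations = annots, ignore.strand = TRUE, quiet = TRUE
  )
  data.frame(annotated)
}

# Usage (not run here because it needs the packages and a network connection):
# hits <- annotate_cpgs(data.frame(cpg = "cg00000029", chr = "chr16", start = 53468112))
```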
Partially methylated domains (PMDs; columns prefixed with "pmd_"), taken from the primary article:
cpg | chr | start | pmd_width | pmd_id |
---|---|---|---|---|
cg00000103 | chr4 | 73470186 | 332252 | chr4:73435322-73767574 |
cg00000165 | chr1 | 91194674 | 81136 | chr1:91192805-91273941 |
cg00000363 | chr1 | 230560793 | 68156 | chr1:230492946-230561102 |
cg00000596 | chr8 | 133098502 | 77607 | chr8:133063957-133141564 |
cg00000776 | chr4 | 156388205 | 162183 | chr4:156298095-156460278 |
cg00000884 | chr4 | 154609857 | 74720 | chr4:154606053-154680773 |
cg00000974 | chr20 | 6750606 | 1147 | chr20:6749547-6750694 |
cg00001099 | chr8 | 87081553 | 201811 | chr8:86879841-87081652 |
cg00001249 | chr14 | 60389786 | 171588 | chr14:60386751-60558339 |
cg00001520 | chr14 | 37666489 | 24805 | chr14:37641880-37666685 |
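Region-based columns such as `pmd_id` and `pmd_width` amount to finding, for each CpG, the region that contains it. The base-R sketch below illustrates this under stated assumptions: `map_to_regions` is a hypothetical helper, region ends are treated as inclusive, and width is computed as end minus start, which matches the values shown in the table above.

```r
# Hypothetical helper: assign each CpG to the first region that contains it.
# `cpgs` has columns cpg, chr, start; `regions` has columns chr, start, end.
map_to_regions <- function(cpgs, regions) {
  idx <- vapply(seq_len(nrow(cpgs)), function(i) {
    hit <- which(regions$chr == cpgs$chr[i] &
                 regions$start <= cpgs$start[i] &
                 regions$end >= cpgs$start[i])
    if (length(hit) > 0) hit[1] else NA_integer_
  }, integer(1))
  data.frame(
    cpg       = cpgs$cpg,
    pmd_width = regions$end[idx] - regions$start[idx],
    pmd_id    = ifelse(is.na(idx), NA_character_,
                       sprintf("%s:%d-%d", regions$chr[idx],
                               regions$start[idx], regions$end[idx])),
    stringsAsFactors = FALSE
  )
}

# Toy input: two CpGs and a single region taken from the table above.
cpgs <- data.frame(cpg = c("cg00000103", "cg00000029"),
                   chr = c("chr4", "chr16"),
                   start = c(73470186L, 53468112L),
                   stringsAsFactors = FALSE)
regions <- data.frame(chr = "chr4", start = 73435322L, end = 73767574L,
                      stringsAsFactors = FALSE)
pmd <- map_to_regions(cpgs, regions)
```

For the full array, an interval-overlap routine such as `GenomicRanges::findOverlaps()` would scale far better than this row-by-row loop.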
These placental imprinted regions were collected from several sources. The merging of these regions into a combined resource is documented at github.com/wvictor14/human_methylation_imprints.
cpg | chr | start | imprint_tissue_specificity | imprint_methylated_allele | imprint_sources | imprint_region |
---|---|---|---|---|---|---|
cg00000924 | chr11 | 2720463 | other | M | Court 2014, Hanna 2016 | 11:2719948-2722440 |
cg00050654 | chr4 | 4576493 | placental-specific | M | Sanchez-Delgado 2016 | 4:4576220-4577911 |
cg00059930 | chr13 | 48894382 | other | M | Court 2014 | 13:48892341-48895763 |
cg00082664 | chr4 | 154710796 | placental-specific | M | Sanchez-Delgado 2016, Hamada 2016 | 4:154709200-154715220 |
cg00082664 | chr4 | 154710796 | placental-specific | M | Sanchez-Delgado 2016, Hamada 2016 | 4:154709200-154715220 |
cg00083059 | chr6 | 39902348 | placental-specific | M | Hanna 2016 | 6:39901897-39902693 |
cg00096536 | chr4 | 154711906 | placental-specific | M | Sanchez-Delgado 2016, Hamada 2016 | 4:154709200-154715220 |
cg00096536 | chr4 | 154711906 | placental-specific | M | Sanchez-Delgado 2016, Hamada 2016 | 4:154709200-154715220 |
cg00098799 | chr15 | 99409360 | other | M | Court 2014 | 15:99408496-99409650 |
cg00155882 | chr8 | 141110747 | other | M | Court 2014, Hanna 2016 | 8:141107717-141111081 |
Lastly, I mapped the annotation to the hg38 genome assembly using UCSC's liftOver tool as implemented in R. This results in a loss of 237 CpGs.
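A liftOver step of this kind could be sketched with the rtracklayer package as below. This is an illustration under assumptions, not the repository's code: the wrapper `lift_to_hg38` is hypothetical, the input is assumed to have `cpg`, `chr` and `start` columns, and the chain file (for example UCSC's hg19ToHg38.over.chain, uncompressed) must be downloaded separately.

```r
# Hypothetical wrapper: lift hg19 CpG positions to hg38 and report lost probes.
# Requires the Bioconductor packages rtracklayer, GenomicRanges and IRanges.
lift_to_hg38 <- function(coords, chain_path) {
  chain <- rtracklayer::import.chain(chain_path)
  gr <- GenomicRanges::GRanges(
    seqnames = coords$chr,
    ranges   = IRanges::IRanges(start = coords$start, width = 1),
    cpg      = coords$cpg
  )
  # liftOver returns a GRangesList; positions with no hg38 mapping come back empty
  lifted <- unlist(rtracklayer::liftOver(gr, chain))
  list(
    hg38 = as.data.frame(lifted),
    lost = setdiff(coords$cpg, lifted$cpg)
  )
}

# Usage (not run here; the chain file path is an assumption):
# res <- lift_to_hg38(coords, "hg19ToHg38.over.chain")
# length(res$lost)
```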