Data were preprocessed following the steps explained in my MSc thesis project.
Files are saved as .rds data objects under the './data/processed/humanBrainMicroarray/' folder.
Exploratory data analysis results are saved under the './results/humanBrainMicroarray/' folder.
In total, there are 26 sub-datasets from 7 different sources (labs / experiments). In total, 28237 genes are detected in at least one dataset, and 5677 genes are detected in all datasets. The number of genes in each dataset is given in ./results/humanBrainMicroarray/numberofgenes.pdf. The distribution of the number of datasets sharing each gene is given in ./results/humanBrainMicroarray/detectedgenedistribution.pdf.
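The three summaries above reduce to simple set operations: genes detected in at least one dataset, genes detected in all datasets, and the number of datasets sharing each gene. The sketch below shows this in Python, although the original analysis was done in R. It assumes each sub-dataset has already been reduced to a set of detected gene IDs. The function name `detection_summary` and the toy data are hypothetical.

```python
from collections import Counter


def detection_summary(datasets):
    """Summarise gene detection across sub-datasets.

    datasets: dict mapping a (hypothetical) dataset name to the set of gene IDs
    detected in it. Returns (n_union, n_intersection, sharing), where sharing
    maps k to the number of genes detected in exactly k datasets.
    """
    gene_sets = list(datasets.values())
    union = set().union(*gene_sets)
    intersection = set.intersection(*gene_sets) if gene_sets else set()
    # Each gene is counted once per dataset because the inputs are sets.
    per_gene = Counter(g for genes in gene_sets for g in genes)
    sharing = Counter(per_gene.values())
    return len(union), len(intersection), dict(sorted(sharing.items()))


# Toy input with made-up gene IDs, not the real data.
toy = {"ds1": {"A", "B", "C"}, "ds2": {"A", "B"}, "ds3": {"A", "D"}}
print(detection_summary(toy))  # (4, 1, {1: 2, 2: 1, 3: 1})
```

Applied to the 26 sub-datasets, the first two numbers would correspond to the 28237 and 5677 genes reported above. The `sharing` dictionary is the kind of distribution plotted in detectedgenedistribution.pdf.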
Next, we searched for genes whose expression changes in the same direction across all datasets. Grouping genes by the number of datasets they are detected in, we generated the distribution of the number of datasets showing the same direction of change: ./results/humanBrainMicroarray/genesharednessdistribution.pdf.
We compiled the genes showing a consistent expression change with age across all datasets.
Up | Down | Background |
---|---|---|
100 | 117 | 5677 |
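The direction-consistency step can be sketched in Python as follows. The sketch assumes each gene has one signed age effect (for example, a correlation with age) in each dataset where it is detected. The input format and the function names are assumptions made for illustration. So is the rule in `sharedness` that treats the majority sign as "the same direction." The background is taken as the set of genes detected in every dataset, which matches the 5677 background genes in the table above.

```python
from collections import Counter


def consistent_genes(effects, n_datasets):
    """Split genes into consistently up- and downregulated sets.

    effects: dict mapping gene ID -> list of signed age effects, one per dataset
             in which the gene is detected (hypothetical input format).
    n_datasets: total number of datasets.
    """
    # Background: genes detected in every dataset.
    background = {g for g, vals in effects.items() if len(vals) == n_datasets}
    up = {g for g in background if all(v > 0 for v in effects[g])}
    down = {g for g in background if all(v < 0 for v in effects[g])}
    return up, down, background


def sharedness(effects):
    """Count genes by (datasets detected in, datasets agreeing on the majority sign)."""
    counts = Counter()
    for vals in effects.values():
        n_pos = sum(v > 0 for v in vals)
        n_neg = sum(v < 0 for v in vals)
        counts[(len(vals), max(n_pos, n_neg))] += 1
    return dict(counts)


toy = {
    "A": [0.2, 0.5, 0.1],     # up in all 3 datasets
    "B": [-0.3, -0.1, -0.4],  # down in all 3 datasets
    "C": [0.2, -0.1, 0.3],    # mixed direction
    "D": [0.4, 0.2],          # detected in only 2 of 3 datasets
}
up, down, background = consistent_genes(toy, n_datasets=3)
print(sorted(up), sorted(down), len(background))  # ['A'] ['B'] 3
```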
- The lists of up- and downregulated gene IDs are converted to Affymetrix HG-U133A probe set IDs using the getBM function of the biomaRt [1] package for R [2]. This step is required by the CMap web service.
- Probe set IDs shared between the up- and downregulated lists, if any, are discarded.
- The up- and downregulated lists are uploaded to the query system on the CMap website under the name 'humanBrainMicroarray'.
- The result is downloaded as an '.xlsx' file and converted to '.csv' files.
- The '.csv' files are read into R to correct the p-values for multiple testing.
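Two of the steps above are sketched here in Python: discarding probe set IDs that appear in both lists, and the multiple-testing correction. The function names are hypothetical. The correction method is not stated above, but the padj columns in the tables appear consistent with the Benjamini-Hochberg procedure. For example, for quinostatin in the first table, 0.00002 x 476 / 5 = 0.001904, which would imply 476 compounds in the downloaded result. The sketch therefore implements Benjamini-Hochberg. In R, `p.adjust(p, method = "BH")` gives the same adjustment.

```python
def drop_shared(up_probes, down_probes):
    """Remove probe set IDs that occur in both the up and the down list."""
    shared = set(up_probes) & set(down_probes)
    return ([p for p in up_probes if p not in shared],
            [p for p in down_probes if p not in shared])


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values never
    # increase as p decreases (the cumulative minimum of the BH step-up rule).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical probe set IDs.
print(drop_shared(["p1", "p2", "p3"], ["p3", "p4"]))  # (['p1', 'p2'], ['p4'])

# Four p = 0, one p = 0.00002, and 471 placeholder p-values: 476 tests in total.
p = [0.0] * 4 + [0.00002] + [0.5] * 471
print(round(bh_adjust(p)[4], 6))  # 0.001904
```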
rank | cmap.name | mean | n | enrichment | p | specificity | percent.non.null | padj |
---|---|---|---|---|---|---|---|---|
1 | geldanamycin | -0.453 | 15 | -0.641 | 0 | 0.0234 | 80 | 0.000000000 |
2 | 15-delta prostaglandin J2 | -0.380 | 15 | -0.596 | 0 | 0.0451 | 60 | 0.000000000 |
3 | tretinoin | 0.420 | 22 | 0.553 | 0 | 0 | 77 | 0.000000000 |
4 | LY-294002 | 0.380 | 61 | 0.465 | 0 | 0.0872 | 72 | 0.000000000 |
5 | quinostatin | 0.861 | 2 | 0.995 | 0.00002 | 0.0055 | 100 | 0.001904000 |
6 | sirolimus | 0.282 | 44 | 0.345 | 0.00004 | 0.2048 | 52 | 0.003173333 |
7 | thioridazine | 0.354 | 20 | 0.525 | 0.00006 | 0.2055 | 75 | 0.004080000 |
8 | wortmannin | 0.288 | 18 | 0.479 | 0.00024 | 0.1548 | 55 | 0.014280000 |
9 | trifluoperazine | 0.315 | 16 | 0.501 | 0.00028 | 0.1971 | 75 | 0.014808889 |
10 | securinine | -0.646 | 4 | -0.884 | 0.00042 | 0 | 100 | 0.019992000 |
11 | emetine | 0.523 | 4 | 0.844 | 0.00088 | 0.0845 | 100 | 0.037286667 |
12 | atropine oxide | -0.346 | 5 | -0.779 | 0.00094 | 0 | 80 | 0.037286667 |
13 | camptothecin | 0.633 | 3 | 0.920 | 0.00112 | 0.1563 | 100 | 0.041009231 |
Since the Lu2004 dataset dramatically reduces the number of genes detected in all datasets, we repeated the analysis without it to check whether it introduces a bias.
In total, there are 25 sub-datasets from 6 different sources (labs / experiments). In total, 28200 genes are detected in at least one dataset, and 10893 genes are detected in all datasets.
Up | Down | Background |
---|---|---|
192 | 202 | 10893 |
- The lists of up- and downregulated gene IDs are converted to Affymetrix HG-U133A probe set IDs using the getBM function of the biomaRt [1] package for R [2]. This step is required by the CMap web service.
- Probe set IDs shared between the up- and downregulated lists, if any, are discarded.
- The up- and downregulated lists are uploaded to the query system on the CMap website under the name 'humanBrainMicroarray_noLu'.
- The result is downloaded as an '.xlsx' file and converted to '.csv' files.
- The '.csv' files are read into R to correct the p-values for multiple testing.
rank | cmap.name | mean | n | enrichment | p | specificity | percent.non.null | padj |
---|---|---|---|---|---|---|---|---|
1 | 15-delta prostaglandin J2 | -0.391 | 15 | -0.584 | 0 | 0.0526 | 66 | 0.000000 |
2 | wortmannin | 0.376 | 18 | 0.570 | 0 | 0.0323 | 66 | 0.000000 |
3 | LY-294002 | 0.425 | 61 | 0.537 | 0 | 0.0336 | 77 | 0.000000 |
4 | sirolimus | 0.369 | 44 | 0.421 | 0 | 0.1265 | 65 | 0.000000 |
5 | tretinoin | 0.347 | 22 | 0.472 | 0.00006 | 0.0398 | 63 | 0.005472 |
6 | thioridazine | 0.334 | 20 | 0.469 | 0.00016 | 0.3014 | 70 | 0.012160 |
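To make the bias check concrete, one option is to compare the compounds that pass padj < 0.05 in both queries and check whether their enrichment scores have the same sign. Every compound in the two tables above passes that threshold. The enrichment values below are copied from the two tables. The helper `compare_hits` is a hypothetical illustration written in Python and is not part of the original pipeline.

```python
# Enrichment scores of the compounds listed in the two tables above (all padj < 0.05).
with_lu = {
    "geldanamycin": -0.641, "15-delta prostaglandin J2": -0.596,
    "tretinoin": 0.553, "LY-294002": 0.465, "quinostatin": 0.995,
    "sirolimus": 0.345, "thioridazine": 0.525, "wortmannin": 0.479,
    "trifluoperazine": 0.501, "securinine": -0.884, "emetine": 0.844,
    "atropine oxide": -0.779, "camptothecin": 0.920,
}
without_lu = {
    "15-delta prostaglandin J2": -0.584, "wortmannin": 0.570,
    "LY-294002": 0.537, "sirolimus": 0.421, "tretinoin": 0.472,
    "thioridazine": 0.469,
}


def compare_hits(a, b):
    """Return compounds shared by two result sets and those whose enrichment sign agrees."""
    shared = sorted(set(a) & set(b))
    same_sign = [c for c in shared if (a[c] > 0) == (b[c] > 0)]
    return shared, same_sign


shared, same_sign = compare_hits(with_lu, without_lu)
print(len(shared), len(same_sign))  # 6 6
```

On these numbers, all six hits of the query without Lu2004 are also hits of the original query, with the same direction of enrichment.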
Data were preprocessed following the steps explained in my GTEx repository.
Only tissues with at least 20 subjects are considered, and the 'Cells-Transformedfibroblasts' category is excluded. As a result, 35 tissues*1 (17 major tissue types) are used in the downstream analysis.
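The tissue filter can be sketched in Python as shown below. The subject counts in the example are made up. The sketch groups tissues into major types by the text before the first '-', which is an assumption. This grouping happens to reproduce the 17 major types from the 35 tissue names listed in note *1.

```python
def select_tissues(subjects_per_tissue, min_subjects=20,
                   excluded=("Cells-Transformedfibroblasts",)):
    """Keep tissues with at least `min_subjects` subjects, minus excluded categories."""
    return sorted(t for t, n in subjects_per_tissue.items()
                  if n >= min_subjects and t not in excluded)


def major_types(tissues):
    """Group tissue names by the prefix before the first '-' (assumed convention)."""
    return sorted({t.split("-")[0] for t in tissues})


# Hypothetical subject counts, for illustration only.
counts = {"Liver": 25, "Brain-Cortex": 30, "Kidney-Cortex": 8,
          "Cells-Transformedfibroblasts": 50}
kept = select_tissues(counts)
print(kept)               # ['Brain-Cortex', 'Liver']
print(major_types(kept))  # ['Brain', 'Liver']
```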
Files are saved as .rds data objects under the './data/processed/GTEx/' folder.
Exploratory data analysis results are saved under the './results/GTEx/' folder.
Only one gene showed a consistent expression change with age: ENSG00000269834 (ZNF528 antisense RNA 1).
Since there are also too few genes whose expression changes significantly*2 with age, we took a different approach:
- Genes that are not expressed in all tissues are excluded from the downstream analysis (19064 genes left).
- For each gene, we calculated the number of datasets having a positive correlation with age, irrespective of the effect size.
- Based on this distribution, we considered a gene to be consistently changing if it falls in the lower 0.5% tail (decreasing expression in at least 31 datasets) or the upper 0.5% tail (increasing expression in at least 32 datasets).
- As a result, we obtained 112 up- and 104 downregulated genes.
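The tail-based selection can be sketched in Python as follows. The sketch assumes that each gene has one age correlation per tissue and that only the sign of each correlation is used, as described above. The method used to compute the 0.5% quantiles is not stated. This sketch assumes linear interpolation, which is the default (type 7) of R's `quantile()`. The function names and toy data are hypothetical.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a sorted list (R's default, type 7)."""
    pos = (len(sorted_vals) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def tail_genes(correlations, tail=0.005):
    """Select genes in the lower/upper `tail` of the positive-correlation count.

    correlations: dict mapping gene ID -> list of age correlations, one per tissue.
    Only the sign of each correlation is used, irrespective of effect size.
    """
    n_pos = {g: sum(r > 0 for r in rs) for g, rs in correlations.items()}
    counts = sorted(n_pos.values())
    lo_cut, hi_cut = quantile(counts, tail), quantile(counts, 1 - tail)
    down = {g for g, k in n_pos.items() if k <= lo_cut}
    up = {g for g, k in n_pos.items() if k >= hi_cut}
    return up, down


# Toy data over 4 tissues: one gene always decreasing, one always increasing,
# and 198 genes with mixed signs.
toy = {"g_down": [-0.1, -0.2, -0.3, -0.4], "g_up": [0.1, 0.2, 0.3, 0.4]}
toy.update({f"g{i}": [0.1, 0.2, -0.1, -0.2] for i in range(198)})
print(tail_genes(toy))  # ({'g_up'}, {'g_down'})
```

Assume 35 tissues and no correlation exactly zero. A gene with at most 4 positive correlations then decreases in at least 35 - 4 = 31 tissues, which would correspond to the lower cut-off quoted above.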
Next, we asked how many of these up- and downregulated genes are among those identified from the brain ageing microarray data. Each cell gives the number of shared genes as a percentage of the gene set in its column.
Gene set | GTEx-up | GTEx-down | Brain_micro-up | Brain_micro-down |
---|---|---|---|---|
GTEx-up | X | 0 | 4% | 0 |
GTEx-down | 0 | X | 0 | 3.4% |
Brain_micro-up | 3.6% | 0 | X | 0 |
Brain_micro-down | 0 | 3.8% | 0 | X |
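The percentages in this table are consistent with each cell being the number of genes shared by the row and column sets, divided by the size of the column set. For example, an overlap of 4 genes would give 4/100 = 4% of the 100 brain microarray upregulated genes and 4/112 ≈ 3.6% of the 112 GTEx upregulated genes. The following is a minimal Python sketch of that computation. The set names and gene IDs are hypothetical.

```python
def overlap_table(gene_sets):
    """Pairwise overlaps as a percentage of the column set, with 'X' on the diagonal.

    gene_sets: dict mapping a set name to a set of gene IDs.
    """
    table = {}
    for row, row_genes in gene_sets.items():
        table[row] = {}
        for col, col_genes in gene_sets.items():
            if row == col:
                table[row][col] = "X"
            else:
                shared = len(row_genes & col_genes)
                table[row][col] = round(100 * shared / len(col_genes), 1)
    return table


# Made-up gene IDs: 4 genes are shared between a 112-gene and a 100-gene set.
gtex_up = {f"G{i}" for i in range(112)}
brain_up = {f"G{i}" for i in range(4)} | {f"M{i}" for i in range(96)}
t = overlap_table({"GTEx-up": gtex_up, "Brain_micro-up": brain_up})
print(t["GTEx-up"]["Brain_micro-up"], t["Brain_micro-up"]["GTEx-up"])  # 4.0 3.6
```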
Since the overlaps are small, I further checked the expression patterns of the genes identified in the brain microarray studies. The resulting heatmap is './results/GTEx/Brain_microarrayUpsandDowns_inGTEx.pdf'. The GTEx brain tissues appear to show a pattern similar to the microarray data, whereas the other tissues differ.
- The lists of up- and downregulated gene IDs are converted to Affymetrix HG-U133A probe set IDs using the getBM function of the biomaRt [1] package for R [2]. This step is required by the CMap web service.
- Probe set IDs shared between the up- and downregulated lists, if any, are discarded.
- The up- and downregulated lists are uploaded to the query system on the CMap website under the name 'gtex'.
- The result is downloaded as an '.xlsx' file and converted to '.csv' files.
- The '.csv' files are read into R to correct the p-values for multiple testing.
rank | cmap.name | mean | n | enrichment | p | specificity | percent.non.null | padj |
---|---|---|---|---|---|---|---|---|
1 | sirolimus | 0.412 | 44 | 0.361 | 0 | 0.1928 | 72 | 0.000000 |
2 | LY-294002 | 0.372 | 61 | 0.355 | 0 | 0.2282 | 65 | 0.000000 |
3 | isoxicam | -0.633 | 5 | -0.848 | 0.00024 | 0 | 100 | 0.039088 |
4 | quinostatin | 0.816 | 2 | 0.987 | 0.00028 | 0.0111 | 100 | 0.039088 |
5 | cephaeline | 0.725 | 5 | 0.833 | 0.00028 | 0.0959 | 100 | 0.039088 |
GTEx analysis with 22 tissues
Only four genes showed a consistent expression change with age:
- ENSG00000269834 (ZNF528 antisense RNA 1)
- ENSG00000134574 (damage specific DNA binding protein 2)
- ENSG00000170160 (coiled-coil domain containing 144A)
- ENSG00000162695 (solute carrier family 30 member 7)
Since there are also too few genes whose expression changes significantly*2 with age, we took a different approach:
- Genes that are not expressed in all tissues are excluded from the downstream analysis (19456 genes left).
- For each gene, we calculated the number of datasets having a positive correlation with age, irrespective of the effect size.
- Based on this distribution, we considered a gene to be consistently changing if it falls in the lower 0.5% tail (decreasing expression in at least 19 datasets) or the upper 0.5% tail (increasing expression in at least 20 datasets).
- As a result, we obtained 205 up- and 254 downregulated genes.
Next, we asked how many of these up- and downregulated genes are among those identified from the brain ageing microarray data. Each cell gives the number of shared genes as a percentage of the gene set in its column.
Gene set | GTEx-up | GTEx-down | Brain_micro-up | Brain_micro-down |
---|---|---|---|---|
GTEx-up | X | 0 | 2% | 0.9% |
GTEx-down | 0 | X | 1% | 3.4% |
Brain_micro-up | 1% | 0.4% | X | 0 |
Brain_micro-down | 0.5% | 1.6% | 0 | X |
- The lists of up- and downregulated gene IDs are converted to Affymetrix HG-U133A probe set IDs using the getBM function of the biomaRt [1] package for R [2]. This step is required by the CMap web service.
- Probe set IDs shared between the up- and downregulated lists, if any, are discarded.
- The up- and downregulated lists are uploaded to the query system on the CMap website under the name 'gtex'.
- The result is downloaded as an '.xlsx' file and converted to '.csv' files.
- The '.csv' files are read into R to correct the p-values for multiple testing.
rank | cmap.name | mean | n | enrichment | p | specificity | percent.non.null | padj |
---|---|---|---|---|---|---|---|---|
1 | anisomycin | 0.771 | 4 | 0.975 | 0 | 0.0155 | 100 | 0.000000000 |
2 | puromycin | 0.701 | 4 | 0.947 | 0 | 0.0449 | 100 | 0.000000000 |
3 | cephaeline | 0.806 | 5 | 0.944 | 0 | 0.048 | 100 | 0.000000000 |
4 | thioridazine | 0.531 | 20 | 0.637 | 0 | 0.0776 | 80 | 0.000000000 |
5 | tanespimycin | 0.241 | 62 | 0.289 | 0 | 0.3938 | 53 | 0.000000000 |
6 | LY-294002 | 0.262 | 61 | 0.285 | 0.00002 | 0.3826 | 52 | 0.001326667 |
7 | terfenadine | 0.738 | 3 | 0.979 | 0.00004 | 0.0049 | 100 | 0.001326667 |
8 | emetine | 0.746 | 4 | 0.921 | 0.00004 | 0.0211 | 100 | 0.001326667 |
9 | niclosamide | 0.567 | 5 | 0.921 | 0.00004 | 0.0105 | 100 | 0.001326667 |
10 | cicloheximide | 0.714 | 4 | 0.918 | 0.00004 | 0.0339 | 100 | 0.001326667 |
11 | prochlorperazine | 0.431 | 16 | 0.572 | 0.00004 | 0.0631 | 75 | 0.001326667 |
12 | trifluoperazine | 0.414 | 16 | 0.569 | 0.00004 | 0.125 | 75 | 0.001326667 |
13 | loperamide | 0.497 | 6 | 0.791 | 0.00018 | 0.0101 | 100 | 0.005510769 |
14 | indoprofen | -0.554 | 4 | -0.897 | 0.00022 | 0 | 100 | 0.006254286 |
15 | sirolimus | 0.266 | 44 | 0.309 | 0.00028 | 0.3072 | 54 | 0.007429333 |
16 | alvespimycin | 0.382 | 12 | 0.572 | 0.00032 | 0.0402 | 75 | 0.007960000 |
17 | lanatoside C | 0.503 | 6 | 0.743 | 0.00077 | 0.1009 | 83 | 0.018027059 |
18 | quinisocaine | 0.541 | 4 | 0.845 | 0.00086 | 0 | 100 | 0.019015556 |
19 | thiamine | -0.537 | 3 | -0.916 | 0.00104 | 0 | 100 | 0.021293000 |
20 | digitoxigenin | 0.535 | 4 | 0.838 | 0.00107 | 0.0429 | 100 | 0.021293000 |
21 | rottlerin | 0.566 | 3 | 0.917 | 0.00118 | 0.0518 | 100 | 0.021890000 |
22 | phenoxybenzamine | 0.483 | 4 | 0.832 | 0.00121 | 0.2525 | 100 | 0.021890000 |
23 | PNU-0251126 | -0.413 | 6 | -0.706 | 0.00157 | 0.0133 | 83 | 0.027167826 |
24 | nicotinic acid | -0.419 | 4 | -0.827 | 0.00165 | 0 | 75 | 0.027362500 |
25 | 5224221 | 0.619 | 2 | 0.961 | 0.00264 | 0.1341 | 100 | 0.039800000 |
26 | calmidazolium | 0.618 | 2 | 0.959 | 0.00276 | 0.0474 | 100 | 0.039800000 |
27 | ellipticine | -0.312 | 4 | -0.804 | 0.00282 | 0.0382 | 50 | 0.039800000 |
28 | H-7 | -0.271 | 4 | -0.804 | 0.00284 | 0.2273 | 50 | 0.039800000 |
29 | DL-thiorphan | 0.596 | 2 | 0.959 | 0.00296 | 0 | 100 | 0.039800000 |
30 | 5255229 | 0.590 | 2 | 0.958 | 0.00308 | 0 | 100 | 0.039800000 |
31 | tetrahydroalstonine | -0.323 | 4 | -0.799 | 0.00318 | 0 | 75 | 0.039800000 |
32 | blebbistatin | 0.637 | 2 | 0.957 | 0.0032 | 0.0122 | 100 | 0.039800000 |
33 | cefotetan | -0.435 | 3 | -0.878 | 0.00357 | 0.0063 | 100 | 0.043056364 |
*1 Tissues included in the analysis:
1. Adipose-Subcutaneous
2. Adipose-Visceral-Omentum
3. Artery-Aorta
4. Artery-Tibial
5. Brain-Amygdala
6. Brain-Anteriorcingulatecortex-BA24
7. Brain-Caudate-basalganglia
8. Brain-CerebellarHemisphere
9. Brain-Cerebellum
10. Brain-Cortex
11. Brain-FrontalCortex-BA9
12. Brain-Hippocampus
13. Brain-Hypothalamus
14. Brain-Nucleusaccumbens-basalganglia
15. Brain-Putamen-basalganglia
16. Brain-Spinalcord-cervicalc-1
17. Brain-Substantianigra
18. Breast-MammaryTissue_male
19. Colon-Sigmoid
20. Esophagus-GastroesophagealJunction
21. Esophagus-Mucosa
22. Esophagus-Muscularis
23. Heart-AtrialAppendage
24. Heart-LeftVentricle
25. Liver
26. Lung
27. Muscle-Skeletal
28. Nerve-Tibial
29. Pituitary
30. Prostate
31. Skin-NotSunExposed-Suprapubic
32. Skin-SunExposed-Lowerleg
33. Testis
34. Thyroid
35. WholeBlood
*2 FDR-adjusted p-value < 0.05. The number of genes passing this threshold in each tissue:
Tissue | # of Genes |
---|---|
Adipose-Subcutaneous | 0 |
Adipose-Visceral-Omentum | 0 |
Artery-Aorta | 60 |
Artery-Tibial | 271 |
Brain-Amygdala | 24 |
Brain-Anteriorcingulatecortex-BA24 | 6 |
Brain-Caudate-basalganglia | 0 |
Brain-CerebellarHemisphere | 608 |
Brain-Cerebellum | 0 |
Brain-Cortex | 1260 |
Brain-FrontalCortex-BA9 | 81 |
Brain-Hippocampus | 3892 |
Brain-Hypothalamus | 478 |
Brain-Nucleusaccumbens-basalganglia | 1414 |
Brain-Putamen-basalganglia | 1 |
Brain-Spinalcord-cervicalc-1 | 1 |
Brain-Substantianigra | 0 |
Breast-MammaryTissue_male | 0 |
Colon-Sigmoid | 0 |
Esophagus-GastroesophagealJunction | 0 |
Esophagus-Mucosa | 0 |
Esophagus-Muscularis | 0 |
Heart-AtrialAppendage | 0 |
Heart-LeftVentricle | 1 |
Liver | 0 |
Lung | 0 |
Muscle-Skeletal | 1 |
Nerve-Tibial | 1 |
Pituitary | 1 |
Prostate | 0 |
Skin-NotSunExposed-Suprapubic | 0 |
Skin-SunExposed-Lowerleg | 0 |
Testis | 0 |
Thyroid | 0 |
WholeBlood | 0 |
[1] Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., De Moor, B., Brazma, A., & Huber, W. (2005). BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics (Oxford, England), 21(16), 3439–3440. https://doi.org/10.1093/bioinformatics/bti525
[2] R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/