humanAgeing_CMap

Human Brain Ageing Microarray Datasets

Data Preparation:

Data preprocessed following the steps explained in my Msc. thesis project.

Files are saved as .rds data objects under './data/processed/humanBrainMicroarray/' folder.

Exploratory data analysis results are saved under './results/humanBrainMicroarray/' folder.

In total there are 26 sub-datasets from 7 different sources (labs / experiments). In total 28237 genes are detected at least one dataset. The number of genes detected across all datasets is 5677. The number of genes in each dataset is given in ./results/humanBrainMicroarray/numberofgenes.pdf. The distribution of number of datasets sharing each gene is given in ./results/humanBrainMicroarray/detectedgenedistribution.pdf.

Next we searched for the genes which show expression changes in the same direction across all datasets. Depending on how many datasets a given gene is detected, a distribution of number of datasets that show the same direction of change is generated - ./results/humanBrainMicroarray/genesharednessdistribution.pdf.

CMap Analysis:

All datasources:

We compiled the genes showing consistent expression change with age across all datasets.

Up	Down	Background
100	117	5677

CMap Query:

The list of up and down regulated gene id s are converted to affymetrix hg u133a ids, using getBM function in biomaRt¹ R² package. This step is required by the CMap web service.
If there are some probeset ids shared between up and down regulated genes, these are discarded.
The list of up and down regulated genes are uploaded to the query system in CMap website with the name 'humanBrainMicroarray'.
The result is downloaded as '.xlsx' file and converted to '.csv' files.
The '.csv' files are read into R in order to correct the p values for multiple testing.

rank	cmap.name	mean	n	enrichment	p	specificity	percent.non.null	padj
1	geldanamycin	-0.453	15	-0.641	0	0.0234	80	0.000000000
2	15-delta prostaglandin J2	-0.380	15	-0.596	0	0.0451	60	0.000000000
3	tretinoin	0.420	22	0.553	0	0	77	0.000000000
4	LY-294002	0.380	61	0.465	0	0.0872	72	0.000000000
5	quinostatin	0.861	2	0.995	0.00002	0.0055	100	0.001904000
6	sirolimus	0.282	44	0.345	0.00004	0.2048	52	0.003173333
7	thioridazine	0.354	20	0.525	0.00006	0.2055	75	0.004080000
8	wortmannin	0.288	18	0.479	0.00024	0.1548	55	0.014280000
9	trifluoperazine	0.315	16	0.501	0.00028	0.1971	75	0.014808889
10	securinine	-0.646	4	-0.884	0.00042	0	100	0.019992000
11	emetine	0.523	4	0.844	0.00088	0.0845	100	0.037286667
12	atropine oxide	-0.346	5	-0.779	0.00094	0	80	0.037286667
13	camptothecin	0.633	3	0.920	0.00112	0.1563	100	0.041009231

All datasources except Lu2004:

Since Lu2004 dataset decreases the number of detected genes dramatically, we repeated the analysis after removing it, to see if it creates a bias.

In total there are 25 sub-datasets from 6 different sources (labs / experiments). In total 28200 genes are detected at least one dataset. The number of genes detected across all datasets is 10893.

Up	Down	Background
192	202	10893

CMap Query:

The list of up and down regulated gene id s are converted to affymetrix hg u133a ids, using getBM function in biomaRt¹ R² package. This step is required by the CMap web service.
If there are some probeset ids shared between up and down regulated genes, these are discarded.
The list of up and down regulated genes are uploaded to the query system in CMap website with the name 'humanBrainMicroarray_noLu'.
The result is downloaded as '.xlsx' file and converted to '.csv' files.
The '.csv' files are read into R in order to correct the p values for multiple testing.

rank	cmap.name	mean	n	enrichment	p	specificity	percent.non.null	padj
1	15-delta prostaglandin J2	-0.391	15	-0.584	0	0.0526	66	0.000000
2	wortmannin	0.376	18	0.570	0	0.0323	66	0.000000
3	LY-294002	0.425	61	0.537	0	0.0336	77	0.000000
4	sirolimus	0.369	44	0.421	0	0.1265	65	0.000000
5	tretinoin	0.347	22	0.472	0.00006	0.0398	63	0.005472
6	thioridazine	0.334	20	0.469	0.00016	0.3014	70	0.012160

GTEx Ageing Data

Data Preparation

Data is preprocessed following the steps explained in my GTEx repository.

Among all tissues, only the ones having at least 20 subjects are considered. We also excluded 'Cells-Transformedfibroblasts' category. As a result 35 tissues^*1 (17 major tissue type) are used for the downstream analysis.

Files are saved as .rds data objects under './data/processed/GTEx/' folder.

Exploratory data analysis results are saved under './results/GTEx/' folder.

CMap analysis

All Tissues:

There was only 1 gene that has a consistent gene expression change with age: ENSG00000269834 (ZNF528 antisense RNA 1)

Since there is also not enough number of genes of which expression values are significantly^*2 changing with age, we resorted to another approach:

Genes that are not expressed in all tissues are excluded from the downstream analysis (19064 genes left).
For each gene, we calculated the number of datasets having a positive correlation with age, irrespective of the effect size.
Using that distribution we considered a gene as a consistently changing gene if it is in the lower 0.5% (Decreasing expression according to 31 datasets) or upper 0.5% (Increasing expression according to 32 datasets).
As a result we have 112 up and 104 down regulated genes.

Next, we asked 'How many of these up and down regulated genes are the ones we determined using Brain ageing microarray data?'

	GTEx-up	GTEx-down	Brain_micro-up	Brain_micro-down
GTEx-up	X	0	4%	0
GTEx-down	0	X	0	3.4%
Brain_micro-up	3.6%	0	X	0
Brain_micro-down	0	3.8%	0	X

Since the resulting numbers are quite low, I further checked the expression patterns of the genes determined by brain microarray studies. The resulting heatmap is './results/GTEx/Brain_microarrayUpsandDowns_inGTEx.pdf'. It seems like GTEx brain data shows similar pattern as the microarray but the other tissues differ.

CMap Query:

The list of up and down regulated gene id s are converted to affymetrix hg u133a ids, using getBM function in biomaRt¹ R² package. This step is required by the CMap web service.
If there are some probeset ids shared between up and down regulated genes, these are discarded.
The list of up and down regulated genes are uploaded to the query system in CMap website with the name 'gtex'.
The result is downloaded as '.xlsx' file and converted to '.csv' files.
The '.csv' files are read into R in order to correct the p values for multiple testing.

rank	cmap.name	mean	n	enrichment	p	specificity	percent.non.null	padj
1	sirolimus	0.412	44	0.361	0	0.1928	72	0.000000
2	LY-294002	0.372	61	0.355	0	0.2282	65	0.000000
3	isoxicam	-0.633	5	-0.848	0.00024	0	100	0.039088
4	quinostatin	0.816	2	0.987	0.00028	0.0111	100	0.039088
5	cephaeline	0.725	5	0.833	0.00028	0.0959	100	0.039088

All Tissues except Brain:

22 tissues

There were only 4 genes that has a consistent gene expression change with age:

Since there is also not enough number of genes of which expression values are significantly^* changing with age, we resorted to another approach:

Genes that are not expressed in all tissues are excluded from the downstream analysis (19456 genes left).
For each gene, we calculated the number of datasets having a positive correlation with age, irrespective of the effect size.
Using that distribution we considered a gene as a consistently changing gene if it is in the lower 0.5% (Decreasing expression according to 19 datasets) or upper 0.5% (Increasing expression according to 20 datasets).
As a result we have 205 up and 254 down regulated genes.

Next, we asked 'How many of these up and down regulated genes are the ones we determined using Brain ageing microarray data?'

	GTEx-up	GTEx-down	Brain_micro-up	Brain_micro-down
GTEx-up	X	0	2%	0.9%
GTEx-down	0	X	1%	3.4%
Brain_micro-up	1%	0.4%	X	0
Brain_micro-down	0.5%	1.6%	0	X

CMap Query:

The list of up and down regulated gene id s are converted to affymetrix hg u133a ids, using getBM function in biomaRt¹ R² package. This step is required by the CMap web service.
If there are some probeset ids shared between up and down regulated genes, these are discarded.
The list of up and down regulated genes are uploaded to the query system in CMap website with the name 'gtex'.
The result is downloaded as '.xlsx' file and converted to '.csv' files.
The '.csv' files are read into R in order to correct the p values for multiple testing.

rank	cmap.name	mean	n	enrichment	p	specificity	percent.non.null	padj
1	anisomycin	0.771	4	0.975	0	0.0155	100	0.000000000
2	puromycin	0.701	4	0.947	0	0.0449	100	0.000000000
3	cephaeline	0.806	5	0.944	0	0.048	100	0.000000000
4	thioridazine	0.531	20	0.637	0	0.0776	80	0.000000000
5	tanespimycin	0.241	62	0.289	0	0.3938	53	0.000000000
6	LY-294002	0.262	61	0.285	0.00002	0.3826	52	0.001326667
7	terfenadine	0.738	3	0.979	0.00004	0.0049	100	0.001326667
8	emetine	0.746	4	0.921	0.00004	0.0211	100	0.001326667
9	niclosamide	0.567	5	0.921	0.00004	0.0105	100	0.001326667
10	cicloheximide	0.714	4	0.918	0.00004	0.0339	100	0.001326667
11	prochlorperazine	0.431	16	0.572	0.00004	0.0631	75	0.001326667
12	trifluoperazine	0.414	16	0.569	0.00004	0.125	75	0.001326667
13	loperamide	0.497	6	0.791	0.00018	0.0101	100	0.005510769
14	indoprofen	-0.554	4	-0.897	0.00022	0	100	0.006254286
15	sirolimus	0.266	44	0.309	0.00028	0.3072	54	0.007429333
16	alvespimycin	0.382	12	0.572	0.00032	0.0402	75	0.007960000
17	lanatoside C	0.503	6	0.743	0.00077	0.1009	83	0.018027059
18	quinisocaine	0.541	4	0.845	0.00086	0	100	0.019015556
19	thiamine	-0.537	3	-0.916	0.00104	0	100	0.021293000
20	digitoxigenin	0.535	4	0.838	0.00107	0.0429	100	0.021293000
21	rottlerin	0.566	3	0.917	0.00118	0.0518	100	0.021890000
22	phenoxybenzamine	0.483	4	0.832	0.00121	0.2525	100	0.021890000
23	PNU-0251126	-0.413	6	-0.706	0.00157	0.0133	83	0.027167826
24	nicotinic acid	-0.419	4	-0.827	0.00165	0	75	0.027362500
25	5224221	0.619	2	0.961	0.00264	0.1341	100	0.039800000
26	calmidazolium	0.618	2	0.959	0.00276	0.0474	100	0.039800000
27	ellipticine	-0.312	4	-0.804	0.00282	0.0382	50	0.039800000
28	H-7	-0.271	4	-0.804	0.00284	0.2273	50	0.039800000
29	DL-thiorphan	0.596	2	0.959	0.00296	0	100	0.039800000
30	5255229	0.590	2	0.958	0.00308	0	100	0.039800000
31	tetrahydroalstonine	-0.323	4	-0.799	0.00318	0	75	0.039800000
32	blebbistatin	0.637	2	0.957	0.0032	0.0122	100	0.039800000
33	cefotetan	-0.435	3	-0.878	0.00357	0.0063	100	0.043056364

Footnotes:

^*1 Tissues analysed in this analysis

[1] "Adipose-Subcutaneous"
[2] "Adipose-Visceral-Omentum"
[3] "Artery-Aorta"
[4] "Artery-Tibial"
[5] "Brain-Amygdala"
[6] "Brain-Anteriorcingulatecortex-BA24" [7] "Brain-Caudate-basalganglia"
[8] "Brain-CerebellarHemisphere"
[9] "Brain-Cerebellum"
[10] "Brain-Cortex"
[11] "Brain-FrontalCortex-BA9"
[12] "Brain-Hippocampus"
[13] "Brain-Hypothalamus"
[14] "Brain-Nucleusaccumbens-basalganglia" [15] "Brain-Putamen-basalganglia"
[16] "Brain-Spinalcord-cervicalc-1"
[17] "Brain-Substantianigra"
[18] "Breast-MammaryTissue_male"
[19] "Colon-Sigmoid"
[20] "Esophagus-GastroesophagealJunction" [21] "Esophagus-Mucosa"
[22] "Esophagus-Muscularis"
[23] "Heart-AtrialAppendage"
[24] "Heart-LeftVentricle"
[25] "Liver"
[26] "Lung"
[27] "Muscle-Skeletal"
[28] "Nerve-Tibial"
[29] "Pituitary"
[30] "Prostate"
[31] "Skin-NotSunExposed-Suprapubic"
[32] "Skin-SunExposed-Lowerleg"
[33] "Testis"
[34] "Thyroid"
[35] "WholeBlood"

^*2 FDR adjusted p value < 0.05.

Tissue	# of Genes
Adipose-Subcutaneous	0
Adipose-Visceral-Omentum	0
Artery-Aorta	60
Artery-Tibial	271
Brain-Amygdala	24
Brain-Anteriorcingulatecortex-BA24	6
Brain-Caudate-basalganglia	0
Brain-CerebellarHemisphere	608
Brain-Cerebellum	0
Brain-Cortex	1260
Brain-FrontalCortex-BA9	81
Brain-Hippocampus	3892
Brain-Hypothalamus	478
Brain-Nucleusaccumbens-basalganglia	1414
Brain-Putamen-basalganglia	1
Brain-Spinalcord-cervicalc-1	1
Brain-Substantianigra	0
Breast-MammaryTissue_male	0
Colon-Sigmoid	0
Esophagus-GastroesophagealJunction	0
Esophagus-Mucosa	0
Esophagus-Muscularis	0
Heart-AtrialAppendage	0
Heart-LeftVentricle	1
Liver	0
Lung	0
Muscle-Skeletal	1
Nerve-Tibial	1
Pituitary	1
Prostate	0
Skin-NotSunExposed-Suprapubic	0
Skin-SunExposed-Lowerleg	0
Testis	0
Thyroid	0
WholeBlood	0

References:

¹ Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., De Moor, B., Brazma, A., & Huber, W. (2005). BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics (Oxford, England), 21(16), 3439–3440. https://doi.org/10.1093/bioinformatics/bti525

² R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

mdonertas/humanAgeing_CMap

humanAgeing_CMap

Human Brain Ageing Microarray Datasets

Data Preparation:

CMap Analysis:

All datasources:

CMap Query:

All datasources except Lu2004:

CMap Query:

GTEx Ageing Data

Data Preparation

CMap analysis

All Tissues:

CMap Query:

All Tissues except Brain:

CMap Query:

Footnotes:

References: