LieberInstitute/spatialDLPFC

Extract new clinical gene sets from the literature

kmaynard12 opened this issue · 6 comments

@abspangler13 where were you doing this and how far along are you?

I ran it for the k = 9 dataset against all of the datasets from the pilot study plus two new datasets that I added. Here's the code for the two new datasets, along with some comments about two other sets we were interested in adding.

##############
#### Nagy sn_rna_seq in dlpfc in MDD
#### https://www.nature.com/articles/s41593-020-0621-y
#### sup table 6 is marker genes
#### sup table 32 is DEGs
#### file = /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/10_clinical_gene_set_enrichment/41593_2020_621_MOESM3_ESM.xlsx
#############
library("readxl")
library("org.Hs.eg.db")
mdd <- as.data.frame(read_excel("/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/10_clinical_gene_set_enrichment/41593_2020_621_MOESM3_ESM.xlsx", sheet = "Supplementary Table 32", skip = 2))
# need to somehow get gene_id or ensemblID
# Map gene symbols to ENSEMBL and ENTREZ IDs
ens4 <- AnnotationDbi::select(org.Hs.eg.db,
    keys = as.character(unique(mdd$Gene)),
    columns = c("ENSEMBL", "ENTREZID", "SYMBOL"),
    keytype = "SYMBOL"
)
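# Placeholder: the commented-out entries below reference ASD/BD/SCZ columns,
# so this currently evaluates to an empty list
mdd_geneList <- with(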
mdd,
list(
# DE_PE_ASD.Up = ensembl_gene_id[ASD.t.value > 0 & ASD.fdr < 0.05],
# DE_PE_ASD.Down = ensembl_gene_id[ASD.t.value < 0 & ASD.fdr < 0.05],
# DE_PE_BD.Up = ensembl_gene_id[BD.t.value > 0 & BD.fdr < 0.05],
# DE_PE_BD.Down = ensembl_gene_id[BD.t.value < 0 & BD.fdr < 0.05],
# DE_PE_SCZ.Up = ensembl_gene_id[SCZ.t.value > 0 & SCZ.fdr < 0.05],
# DE_PE_SCZ.Down = ensembl_gene_id[SCZ.t.value < 0 & SCZ.fdr < 0.05]
)
)
##############
### snRNAseq ASD
### https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7678724/#SD5
### data S4 is DEGS
### data S3 is marker genes
### file = /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/10_clinical_gene_set_enrichment/NIHMS1053005-supplement-Data_S4.xls
#############
asd_rnaseq <- as.data.frame(read_excel("/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/10_clinical_gene_set_enrichment/NIHMS1053005-supplement-Data_S4.xls", sheet = "ASD_DEGs"))
# Standardize column names (clean_names() comes from the janitor package)
asd_rnaseq <- janitor::clean_names(asd_rnaseq)
asdRNA_geneList <- list(
DE_ASD_RNA.Up = asd_rnaseq$gene_id[asd_rnaseq$fold_change > 0 & asd_rnaseq$q_value < 0.05],
DE_ASD_RNA.Down = asd_rnaseq$gene_id[asd_rnaseq$fold_change < 0 & asd_rnaseq$q_value < 0.05]
)
############
### https://www.medrxiv.org/content/10.1101/2020.11.06.20225342v1.full
### snRNAseq SCZ
### supplementary table 6 is DEGS, can't figure out how to download
##########
##########
### snRNAseq and spatial SCZ
### https://www.biorxiv.org/content/10.1101/2020.11.17.386458v2
### supplementary table 2 is marker genes
### supplementary table 4 is DEGs, can't figure out how to download
###########
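
For the `# need to somehow get gene_id or ensemblID` step above, one possible approach is to join the symbol-to-Ensembl mapping back onto the DEG table and then split the significant genes by direction. This is only a sketch under assumptions: the toy `deg` and `sym2ens` tables, the column names `Gene`, `log2FC`, and `FDR`, and the Ensembl IDs are all made-up stand-ins, not the actual layout of Supplementary Table 32.

```r
# Toy stand-in for the DEG table (hypothetical column names and values)
deg <- data.frame(
    Gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
    log2FC = c(1.2, -0.8, 0.5, -1.5),
    FDR = c(0.01, 0.03, 0.20, 0.04),
    stringsAsFactors = FALSE
)

# Toy stand-in for the output of AnnotationDbi::select(); IDs are fake.
# GENE_A maps to two IDs and GENE_D has no Ensembl match.
sym2ens <- data.frame(
    SYMBOL = c("GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_D"),
    ENSEMBL = c("ENSG_FAKE_1", "ENSG_FAKE_2", "ENSG_FAKE_3", "ENSG_FAKE_4", NA),
    stringsAsFactors = FALSE
)

# Drop unmapped symbols, then keep the first Ensembl ID per symbol
sym2ens <- sym2ens[!is.na(sym2ens$ENSEMBL), ]
sym2ens <- sym2ens[!duplicated(sym2ens$SYMBOL), ]

# Attach Ensembl IDs to the DEG table (genes without a match are dropped)
deg_mapped <- merge(deg, sym2ens, by.x = "Gene", by.y = "SYMBOL")

# Split significant genes by direction of effect
sig <- deg_mapped$FDR < 0.05
toy_geneList <- list(
    DE_toy.Up = deg_mapped$ENSEMBL[sig & deg_mapped$log2FC > 0],
    DE_toy.Down = deg_mapped$ENSEMBL[sig & deg_mapped$log2FC < 0]
)
```

Keeping only the first Ensembl ID per symbol is a simplification; symbols with several matches may need to be checked by hand.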

@kmaynard12 is going to work on this. @lahuuki, I wrote https://github.com/LieberInstitute/spatialDLPFC/tree/main/code/analysis/10_clinical_gene_set_enrichment in such a way that you would need to make 2 new scripts: one for extracting the gene IDs (ENSEMBL IDs) from the different tables @kmaynard12 will select, and another for computing the odds ratios and making the heatmaps.
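
As a rough illustration of the second script, the odds ratio for one gene set against one spatial domain can be computed from a 2x2 table with `fisher.test()` from base R. This is a minimal sketch under assumptions, not necessarily what the scripts in `10_clinical_gene_set_enrichment` do: the `background`, `domain_genes`, and `gene_set` vectors are made-up toy data, and `enrich_one()` is a hypothetical helper.

```r
# Toy data (hypothetical): 20 background genes, a set of domain-enriched
# genes, and a clinical gene set
background <- paste0("g", 1:20)
domain_genes <- paste0("g", 1:6)
gene_set <- paste0("g", c(1, 2, 3, 4, 10))

# Hypothetical helper: build the 2x2 table and run Fisher's exact test
enrich_one <- function(gene_set, domain_genes, background) {
    # Restrict the gene set to genes that were actually tested
    gene_set <- intersect(gene_set, background)
    in_set <- factor(background %in% gene_set, levels = c(TRUE, FALSE))
    in_domain <- factor(background %in% domain_genes, levels = c(TRUE, FALSE))
    tab <- table(in_set, in_domain)
    ft <- fisher.test(tab)
    data.frame(
        n_overlap = tab["TRUE", "TRUE"],
        odds_ratio = unname(ft$estimate),
        p_value = ft$p.value
    )
}

res <- enrich_one(gene_set, domain_genes, background)
```

Applying the helper to every gene set and domain pair would give a matrix of odds ratios and p-values that could then be drawn as a heatmap.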

@kmaynard12 are there other case/control snRNA-seq datasets beyond the PEC ones we should be looking at?

Related to https://jhu-genomics.slack.com/archives/C01EA7VDJNT/p1673285746357859

@lahuuki I think that we can close this issue, right?