Create a uPheno release product for data analysis
matentzn opened this issue · 2 comments
matentzn commented
@pnrobinson requested a uPheno release product that we should add to the uPheno2 release before the end of March.
@pnrobinson, I hope I understood correctly that, given the above picture, you want the following table:
taxon | upheno_id | original_phenotype | gene | human_orthologue |
---|---|---|---|---|
NCBITaxon:10090 | UPHENO:0034327 | MP:0030719 | ncbi.gene:68646 | hgnc:26404 |
Is this correct?
pnrobinson commented
@matentzn this table would be exactly what we need!
matentzn commented
First draft
Code to generate the table:
from neo4j import GraphDatabase
# Connect to the Neo4j database
bolt_url = "ASK_NICO"
driver = GraphDatabase.driver(bolt_url)
# Define the Cypher query
query = """
MATCH
  (upheno:`biolink:PhenotypicFeature` WHERE upheno.id STARTS WITH "UPHENO:")
  <-[:`biolink:subclass_of`]-(phenotype:`biolink:PhenotypicFeature`)
  <-[gena:`biolink:has_phenotype`]-(gene:`biolink:Gene`)
  -[:`biolink:orthologous_to`]-(human_gene:`biolink:Gene` WHERE "NCBITaxon:9606" IN human_gene.in_taxon)
RETURN
upheno.id,
phenotype.id,
gene.id,
gena.negated,
CASE WHEN gene.in_taxon IS NOT NULL AND size(gene.in_taxon) > 0
THEN REDUCE(s = "", x IN gene.in_taxon | s + x + CASE WHEN x <> gene.in_taxon[size(gene.in_taxon)-1] THEN "|" ELSE "" END)
ELSE "" END AS gene_in_taxon,
human_gene.id,
gena.primary_knowledge_source,
gena.publications
"""
# Run the query and collect the results
import pandas as pd

data = []
with driver.session() as session:
    results = session.run(query)
    for record in results:
        # Each record holds its values in the order of the RETURN clause
        data.append(record.values())
driver.close()

# Column names follow the order of the RETURN clause
df = pd.DataFrame(data, columns=["upheno_grouping", "phenotype", "gene", "negated", "taxon", "human_orthologue", "source", "publications"])
df
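
A note on the `CASE ... REDUCE` expression in the query: it flattens the `in_taxon` list into one pipe-separated string, and returns an empty string when the list is missing or empty. Below is a minimal Python sketch of the same idea, in case it is easier to do the join after fetching the raw list. `join_taxa` is a hypothetical helper name, not something that exists in the code base.

```python
from typing import Optional, Sequence


def join_taxa(in_taxon: Optional[Sequence[str]]) -> str:
    """Pipe-join taxon CURIEs; a missing or empty list gives an empty string."""
    if not in_taxon:
        return ""
    # str.join places separators by position, so repeated values are handled too
    return "|".join(in_taxon)


print(join_taxa(["NCBITaxon:7955"]))
print(join_taxa(["NCBITaxon:9606", "NCBITaxon:10090"]))
```

One difference worth knowing: the Cypher version decides whether to append `|` by comparing each value with the last element, so a list whose last value also appears earlier would lose a separator. The position-based join above does not have that edge case.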
Draft result:
upheno_grouping | phenotype | gene | negated | taxon | human_orthologue | source | publications |
---|---|---|---|---|---|---|---|
UPHENO:0000508 | ZP:0000606 | ZFIN:ZDB-GENE-040426-1675 | | NCBITaxon:7955 | HGNC:9721 | infores:zfin | ['ZFIN:ZDB-PUB-170311-8'] |
UPHENO:0000508 | ZP:0000606 | ZFIN:ZDB-GENE-040426-1675 | | NCBITaxon:7955 | HGNC:30262 | infores:zfin | ['ZFIN:ZDB-PUB-170311-8'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00044068 | | NCBITaxon:6239 | HGNC:12927 | infores:wormbase | ['PMID:16803962'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00009178 | | NCBITaxon:6239 | HGNC:15664 | infores:wormbase | ['PMID:22073243'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00009178 | | NCBITaxon:6239 | HGNC:15663 | infores:wormbase | ['PMID:22073243'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9984 | infores:wormbase | ['PMID:29301909'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9983 | infores:wormbase | ['PMID:29301909'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9982 | infores:wormbase | ['PMID:29301909'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00022620 | | NCBITaxon:6239 | HGNC:20165 | infores:wormbase | ['PMID:25635455'] |
UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00022620 | | NCBITaxon:6239 | HGNC:17407 | infores:wormbase | ['PMID:25635455'] |
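
The draft has more columns than the five-column table requested at the top of this issue. Here is a rough sketch of how the draft could be reduced to that shape with pandas. The function name `to_requested_table` is hypothetical, and excluding rows with `negated == True` is an assumption that has not been agreed in this thread. The two example rows are copied from the draft result.

```python
import pandas as pd

# Column order of the table requested at the top of this issue
REQUESTED_COLUMNS = ["taxon", "upheno_id", "original_phenotype", "gene", "human_orthologue"]


def to_requested_table(draft: pd.DataFrame) -> pd.DataFrame:
    """Project the draft onto the requested columns and drop duplicate rows."""
    # Assumption: only rows explicitly marked negated=True are excluded;
    # a missing value is treated as "not negated".
    keep = ~draft["negated"].eq(True)
    renamed = draft.loc[keep].rename(
        columns={"upheno_grouping": "upheno_id", "phenotype": "original_phenotype"}
    )
    # Source and publications are dropped here, so rows that differ only in
    # their evidence collapse into a single row.
    return renamed[REQUESTED_COLUMNS].drop_duplicates().reset_index(drop=True)


# Two rows taken from the draft result
draft = pd.DataFrame(
    [
        {
            "upheno_grouping": "UPHENO:0000508",
            "phenotype": "ZP:0000606",
            "gene": "ZFIN:ZDB-GENE-040426-1675",
            "negated": None,
            "taxon": "NCBITaxon:7955",
            "human_orthologue": "HGNC:9721",
            "source": "infores:zfin",
            "publications": ["ZFIN:ZDB-PUB-170311-8"],
        },
        {
            "upheno_grouping": "UPHENO:0000508",
            "phenotype": "WBPhenotype:0000848",
            "gene": "WB:WBGene00044068",
            "negated": None,
            "taxon": "NCBITaxon:6239",
            "human_orthologue": "HGNC:12927",
            "source": "infores:wormbase",
            "publications": ["PMID:16803962"],
        },
    ]
)

release = to_requested_table(draft)
print(release)
```

If this shape is what is wanted, `release.to_csv(path, sep="\t", index=False)` would write it out as a TSV; the file name and location for the release are still to be decided.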
@pnrobinson if this works for you, you can do a first experiment with this table:
@kevinschaper did all the heavy lifting, so THANK YOU!