PubCaseFinder-RDF

This is a description of PubCaseFinder-RDF.


Prerequisites

First, set up your environment.
Make sure a JDK (Java SE 1.8 or higher) is installed. A JRE alone is not enough, because the scripts must be compiled before they can be run.

Data Download

Data                         Resource link
hp.obo                       http://purl.obolibrary.org/obo/hp.obo
mondo.obo                    http://purl.obolibrary.org/obo/mondo.obo
mim2gene.txt                 https://www.omim.org/static/omim/data/mim2gene.txt
phenotype.hpoa               http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa
gencc-submissions.csv        https://search.thegencc.org/download
MedGen_HPO_OMIM_Mapping.txt  https://ftp.ncbi.nlm.nih.gov/pub/medgen/MedGen_HPO_OMIM_Mapping.txt.gz
NBKid_shortname_OMIM.txt     https://ftp.ncbi.nlm.nih.gov/pub/GeneReviews/NBKid_shortname_OMIM.txt
mim2gene_medgen.txt          https://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene_medgen
Homo_sapiens.gene_info       https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
en_product4.xml              http://www.orphadata.org/data/xml/en_product4.xml
                             (ORPHADATA PHENOTYPES ASSOCIATED WITH RARE DISORDERS)
en_product6.xml              http://www.orphadata.org/data/xml/en_product6.xml
                             (ORPHADATA GENES ASSOCIATED WITH RARE DISEASES)
HGNC_custom.txt              https://www.genenames.org/download/custom/
OMIM_id_ja.txt               OMIM_id_ja.txt
HPO_id_ja.txt                HPO_id_ja.txt
HPO_Inheritance_en_jp.txt    HPO_Inheritance_en_jp.txt
KEGG_disease.tsv             KEGG_disease.tsv
NCBI_gene_summary.txt        NCBI_gene_summary.txt
UR_DBMS_DiseaseLinkOMIM.csv  UR_DBMS_DiseaseLinkOMIM.csv
UR_DBMS_DiseaseLink.csv      UR_DBMS_DiseaseLink.csv
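
Two of the links above (MedGen_HPO_OMIM_Mapping.txt and Homo_sapiens.gene_info) point to gzip-compressed files, while the file names listed here have no .gz suffix. Assuming the scripts expect the decompressed files, a minimal Java sketch for unpacking them could look like the following. The class name Gunzip is hypothetical and not part of this repository; the command-line gunzip tool does the same job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of PubCaseFinder-RDF.
class Gunzip {
    // Decompresses gzFile into target, overwriting target if it already exists.
    static void gunzip(Path gzFile, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Gunzip <input.gz> <output>");
            System.exit(1);
        }
        gunzip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For example: $ java Gunzip MedGen_HPO_OMIM_Mapping.txt.gz MedGen_HPO_OMIM_Mapping.txt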

※ Note on 'HGNC_custom.txt'.
This file is not a fixed download. You create it yourself on the https://www.genenames.org/download/custom/ website.
First, deselect everything. Then select the following items.

  • Curated by the HGNC
    • HGNC ID, Approved symbol
  • Downloaded from external sources
    • NCBI Gene ID(supplied by NCBI)
  • Select status
    • Approved

When you have finished selecting, click the Submit button. The file was created correctly if it looks like this:

HGNC ID     Approved symbol  NCBI Gene ID(supplied by NCBI)
HGNC:5      A1BG             1
HGNC:37133  A1BG-AS1         503538
HGNC:24086  A1CF             29974
HGNC:7      A2M              2
HGNC:27057  A2M-AS1          144571
HGNC:23336  A2ML1            144568
HGNC:41022  A2ML1-AS1        100874108
HGNC:41523  A2ML1-AS2        106478979
HGNC:8      A2MP1            3
...
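
The header check described above can also be automated. The following is a minimal sketch that assumes the downloaded file is tab-separated and that its first line is the header shown in the sample. The class name HgncCustomCheck is hypothetical and not part of this repository.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PubCaseFinder-RDF.
class HgncCustomCheck {
    // Column names taken from the sample file shown in this README.
    static final String[] EXPECTED = {
        "HGNC ID", "Approved symbol", "NCBI Gene ID(supplied by NCBI)"
    };

    // Returns true if the header line has exactly the three expected columns.
    // Assumption: columns are separated by tab characters.
    static boolean isExpectedHeader(String headerLine) {
        if (headerLine == null) {
            return false;
        }
        String[] cols = headerLine.split("\t", -1);
        if (cols.length != EXPECTED.length) {
            return false;
        }
        for (int i = 0; i < EXPECTED.length; i++) {
            if (!cols[i].trim().equals(EXPECTED[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "HGNC_custom.txt";
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            boolean ok = isExpectedHeader(reader.readLine());
            System.out.println(ok ? "Header looks correct." : "Unexpected header in " + path);
        }
    }
}
```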

Running

The scripts to run are listed below.

1. Disease Gene Association

Example run:

The following commands compile and run the script, which writes its output in Turtle format.

$ javac DiseaseGeneAssociation.java
$ java DiseaseGeneAssociation HGNC_custom.txt mondo.obo mim2gene_medgen.txt en_product6.xml gencc-submissions.csv

Output result file:

The example run writes two output files, 'OMIM_Gene_Association.ttl' and 'Orphanet_Gene_Association.ttl'.

The contents of 'OMIM_Gene_Association.ttl' look like this:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ncbigene: <http://identifiers.org/ncbigene/>
PREFIX mim: <http://identifiers.org/mim/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sio: <http://semanticscience.org/resource/>
<https://pubcasefinder.dbcls.jp/gene_context/disease:OMIM:613320/gene:ENT:51025>
    a sio:SIO_000983 ;
    sio:SIO_000628 mim:613320, ncbigene:51025 ;
    dcterms:source <ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene_medgen> .
...

The contents of 'Orphanet_Gene_Association.ttl' look like this:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ncbigene: <http://identifiers.org/ncbigene/>
PREFIX ordo: <http://www.orpha.net/ORDO/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sio: <http://semanticscience.org/resource/>
<https://pubcasefinder.dbcls.jp/gene_context/disease:ORDO:178342/gene:ENT:1213>
    a sio:SIO_000983 ;
    sio:SIO_000628 ordo:Orphanet_178342, ncbigene:1213 ;
    dcterms:source <http://www.orphadata.org/data/xml/en_product6.xml> .
...
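
To make the record layout explicit, here is a minimal sketch that assembles one OMIM association record in the shape shown in the sample above. It is an illustration only, not the code in DiseaseGeneAssociation.java. The class and method names are hypothetical, and the PREFIX lines are assumed to be written once at the top of the file.

```java
// Hypothetical illustration, not the implementation used by DiseaseGeneAssociation.java.
class AssociationTurtle {
    // Builds one Turtle record linking an OMIM disease ID to an NCBI Gene ID,
    // following the layout of the sample output in this README.
    static String omimGeneRecord(String mimId, String geneId) {
        return "<https://pubcasefinder.dbcls.jp/gene_context/disease:OMIM:"
                + mimId + "/gene:ENT:" + geneId + ">\n"
                + "    a sio:SIO_000983 ;\n"
                + "    sio:SIO_000628 mim:" + mimId + ", ncbigene:" + geneId + " ;\n"
                + "    dcterms:source <ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene_medgen> .\n";
    }

    public static void main(String[] args) {
        // Reproduces the sample record shown for OMIM_Gene_Association.ttl.
        System.out.print(omimGeneRecord("613320", "51025"));
    }
}
```

Each record is a single subject with three predicates: its type (sio:SIO_000983), the disease and gene it links (sio:SIO_000628), and the source file the association came from (dcterms:source).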

2. Disease HPO Association

Example run:

$ javac DiseaseHpoAssociation.java
$ java DiseaseHpoAssociation phenotype.hpoa en_product4.xml

Output result file:

The example run writes two output files, 'OMIM_HP_Association.ttl' and 'Orphanet_HP_Association.ttl'.

3. NCBI HGNC Gene

  • Example run:
    • $ javac NCBI_HGNC.java
    • $ java NCBI_HGNC HGNC_custom.txt Homo_sapiens.gene_info mim2gene.txt NCBI_gene_summary.txt
  • Output
    • NCBI_HGNC.ttl

4. HP Inheritance

  • Example run:
    • $ javac HP_Inheritance.java
    • $ java HP_Inheritance hp.obo HPO_id_ja.txt HPO_Inheritance_en_jp.txt
  • Output
    • HP_Inheritance.ttl

5. Disease

  • Example run:
    • $ javac Disease.java
    • $ java Disease mim2gene.txt OMIM_id_ja.txt MedGen_HPO_OMIM_Mapping.txt mondo.obo NBKid_shortname_OMIM.txt UR_DBMS_DiseaseLinkOMIM.csv UR_DBMS_DiseaseLink.csv KEGG_disease.tsv
  • Output
    • OMIM.ttl
    • Orphanet.ttl
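
After running all five steps, it can be useful to confirm that every output file named in this README exists. The sketch below assumes the scripts write their output to the directory they are run from, which this README does not state explicitly. The class name OutputCheck is hypothetical and not part of this repository.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PubCaseFinder-RDF.
class OutputCheck {
    // Output file names as listed in the five steps of this README.
    static final String[] EXPECTED = {
        "OMIM_Gene_Association.ttl", "Orphanet_Gene_Association.ttl",
        "OMIM_HP_Association.ttl", "Orphanet_HP_Association.ttl",
        "NCBI_HGNC.ttl", "HP_Inheritance.ttl",
        "OMIM.ttl", "Orphanet.ttl"
    };

    // Returns the names of expected output files that are not present in dir.
    static List<String> missing(Path dir) {
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: outputs are in the current directory unless a path is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> absent = missing(dir);
        if (absent.isEmpty()) {
            System.out.println("All output files are present.");
        } else {
            System.out.println("Missing output files: " + absent);
        }
    }
}
```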

Contact