Extracts counts of various SciCrunch objects related to disease/gene pairs.

For a set of Human Disease Ontology IDs (DOIDs), `download_disease_data.py` downloads database and species information from dknet.org. Starting with a CSV containing a set of DOIDs in the first column and disease names in the second column, calling `python download_disease_data.py <filename>` will create a file `facets.csv` with downloaded data, which includes database and species counts. Alternatively, you can call the method `extract_data()` directly for more fine-grained control.
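
As a quick sanity check of the input layout described above (DOIDs in the first column, disease names in the second), the following sketch reads such a CSV into pairs. The helper name `read_doid_rows` is hypothetical and not part of this repository, and the sample rows are illustrative only; whether the script expects a `DOID:` prefix or a header row is an assumption to verify against your own data.

```python
import csv
import io


def read_doid_rows(handle):
    """Return (doid, disease_name) pairs from the first two CSV columns.

    Hypothetical helper for inspecting an input file; it is not part of
    download_disease_data.py.
    """
    pairs = []
    for row in csv.reader(handle):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete rows
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Illustrative rows only; the exact DOID formatting the script expects
# (with or without the "DOID:" prefix) is an assumption.
sample = io.StringIO(
    "DOID:1612,breast cancer\n"
    "DOID:9352,type 2 diabetes mellitus\n"
)
for doid, name in read_doid_rows(sample):
    print(doid, name)
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `read_doid_rows(open("diseases.csv", newline=""))`, where `diseases.csv` is a placeholder name.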

`map_disease_to_gene.py` takes a CSV file with both disease and gene information downloaded from the OMIM database, processes it, and combines it with database information from SciCrunch. This script requires the environment variable `SCICRUNCH_KEY` to be set to an API key for the SciCrunch API. Calling `python map_disease_to_gene.py <omim_filename>` creates two files, `genes.json` and `gene_to_disease.json`, which include gene-to-database and gene-to-disease information, respectively.
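
Because the script depends on `SCICRUNCH_KEY`, it can help to confirm the variable is set before starting a long run. The sketch below is one way to do that; `require_scicrunch_key` is a hypothetical helper and does not reflect how `map_disease_to_gene.py` itself reads the key.

```python
import os


def require_scicrunch_key(environ=os.environ):
    """Return the SciCrunch API key, or raise if it is missing or empty.

    Hypothetical pre-flight check; the environ parameter exists only so
    the function can be exercised with a plain dict.
    """
    key = environ.get("SCICRUNCH_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCICRUNCH_KEY is not set; export your SciCrunch API key first"
        )
    return key


if __name__ == "__main__":
    require_scicrunch_key()
    print("SCICRUNCH_KEY is set")
```

Failing early with a clear message avoids discovering a missing key only after the OMIM file has already been processed.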

`to_csv.py` takes the information in `genes.json` and `gene_to_disease.json`, aggregates it, and writes it out in a human-readable CSV format. After running `map_disease_to_gene.py`, calling `python to_csv.py` will create a file `genes.csv` with aggregated data.
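
Since `to_csv.py` relies on the files produced by `map_disease_to_gene.py`, the two steps have to run in order. A minimal sketch of a wrapper that does so is shown below; the function names are hypothetical, and it assumes both scripts are in the current working directory and that `SCICRUNCH_KEY` is already set.

```python
import subprocess
import sys


def pipeline_commands(omim_filename):
    """Build the two commands in the order they need to run."""
    return [
        [sys.executable, "map_disease_to_gene.py", omim_filename],
        [sys.executable, "to_csv.py"],
    ]


def run_pipeline(omim_filename):
    """Run both steps, stopping at the first one that fails."""
    for cmd in pipeline_commands(omim_filename):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so to_csv.py never runs against missing or stale JSON files.
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    run_pipeline(sys.argv[1])
```

Using `sys.executable` keeps both steps on the same Python interpreter as the wrapper, which matters when the dependencies live in a virtual environment.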