SciCrunch gene-disease query tool

Extracts counts of various SciCrunch objects related to disease/gene pairs.

Downloading disease data

For a set of Human Disease Ontology IDs (DOIDs), this step downloads database and species information from SciCrunch. Starting with a CSV containing a set of DOIDs in the first column and disease names in the second column, calling python <filename> will create a file facets.csv with the downloaded data, which includes database and species counts. Alternatively, you can call the method extract_data() directly for more fine-grained control.
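As a rough illustration of the expected input layout (DOIDs in the first column, disease names in the second), the CSV could be read as sketched below. The helper name read_doids is hypothetical and not part of this tool, and the sketch assumes the file has no header row.

```python
import csv


def read_doids(path):
    """Return (doid, disease_name) pairs from the first two columns of a CSV.

    Hypothetical helper for illustration only; assumes no header row.
    """
    pairs = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            # Skip blank lines and rows that lack a DOID or a name column.
            if len(row) < 2 or not row[0].strip():
                continue
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs
```

Rows with fewer than two columns are skipped rather than treated as errors, which is a design choice of this sketch and may differ from what the tool itself does.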

Joining disease with gene data

This step takes a CSV file with both disease and gene information downloaded from the OMIM database, processes it, and combines it with database information from SciCrunch. It requires the environment variable SCICRUNCH_KEY to be set to an API key for the SciCrunch API. Calling python <omim_filename> creates two files, genes.json and gene_to_disease.json, which include gene-to-database and gene-to-disease information, respectively.
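Because the join step depends on SCICRUNCH_KEY, it can help to check for the variable up front and fail with a clear message. The following is a minimal sketch of such a check; the function name get_scicrunch_key is hypothetical, and how the tool itself reads the key is not shown in this README.

```python
import os


def get_scicrunch_key(environ=os.environ):
    """Return the SciCrunch API key from the environment, or raise if it is missing.

    Hypothetical helper; `environ` is a parameter so the check is easy to test.
    """
    key = environ.get("SCICRUNCH_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCICRUNCH_KEY is not set; set it to your SciCrunch API key "
            "before running the join step"
        )
    return key
```

Set the variable in your shell or process environment before running the script so that the key does not need to be stored in the source code.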

Exporting to CSV

This step takes the information in genes.json and gene_to_disease.json, aggregates it, and outputs it in a human-readable CSV format. After the join step has run, running the export script with python creates a file genes.csv with the aggregated data.
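The aggregation can be pictured roughly as follows. This is only a sketch under stated assumptions: the README does not document the JSON layouts, so the code assumes genes.json maps each gene to a {database: count} dictionary and gene_to_disease.json maps each gene to a list of disease IDs. The function names and output column names are likewise hypothetical.

```python
import csv
import io


def aggregate_rows(genes, gene_to_disease):
    """Build one row per gene: gene name, number of diseases, total database hits."""
    rows = []
    for gene in sorted(genes):
        db_counts = genes[gene]                    # ASSUMED shape: {database: count}
        diseases = gene_to_disease.get(gene, [])   # ASSUMED shape: list of disease IDs
        rows.append([gene, len(diseases), sum(db_counts.values())])
    return rows


def to_csv(rows):
    """Render the aggregated rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gene", "n_diseases", "total_db_hits"])  # hypothetical columns
    writer.writerows(rows)
    return buffer.getvalue()
```

In practice the two dictionaries would be loaded with json.load from genes.json and gene_to_disease.json, and the returned text written to genes.csv. The real file may contain different columns.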