approvalsSupport: A Jupyter Notebook repository from Open Targets

Genetic Evidence for FDA-Approved Drugs Pipeline

This repository contains the pipeline for generating and analyzing genetic evidence support for drugs approved by the FDA between 2013 and 2022, as listed in the Mullard publications in Nature Reviews Drug Discovery.

Data Structure

data/2013-2022_approvals_in.csv: Input data with FDA-approved drugs. The file provides details on the drugs mapped to ChEMBL IDs, indications manually mapped to ontology terms, and the classification of diseases.
data/amendMoas: A manually curated list of mechanisms of action absent in ChEMBL.
data/amendPhenotypes: A manually curated list of related indications.
data/datasourceMetadata: List of Open Targets data sources.
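
Before running the pipeline, it can help to confirm that these inputs are in place. The following is a minimal sketch and is not part of the repository: the helper name `missing_inputs` is hypothetical, and the paths are copied exactly as listed above. If the curated lists carry file extensions in your checkout, adjust the paths accordingly.

```python
from pathlib import Path

# Input paths exactly as listed above; extensions of the curated lists
# are not stated in this README, so none are assumed here.
REQUIRED_INPUTS = [
    "data/2013-2022_approvals_in.csv",
    "data/amendMoas",
    "data/amendPhenotypes",
    "data/datasourceMetadata",
]


def missing_inputs(repo_root="."):
    """Return the required input paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_INPUTS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All listed inputs are present.")
```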

Processing Steps

code/GE_search.R:
- Processes data from the Google Cloud bucket gs://open-targets-data-releases/ via Spark.
- Uses data/2013-2022_approvals_in.csv as the main input file, together with the curated lists described under Data Structure.
- Generates two output files:
  - results/2013-2022_approvals_GE_src.csv: Organizes genetic evidence support by data source and evidence type.
  - results/2013-2022_approvals_GE_out.csv: Extends the input file with additional columns for target IDs, interactor IDs, and related ontology terms.
code/GE_types_add.R:
- Processes the output from GE_search.R.
- Generates results/2013-2022_approvals_GE.csv with new columns indicating the type of genetic evidence found for each drug-disease pair.
code/GE_prior.py:
- Processes data from the Google Cloud bucket gs://open-targets-data-releases/ via Spark.
- Uses results/2013-2022_approvals_GE.csv to find the date of the genetic evidence support.
- Outputs results/2013-2022_approvals_preGE0.csv.
Manual Curation:
- Complex cases in results/2013-2022_approvals_preGE0.csv are manually curated to produce results/2013-2022_approvals_preGE.csv.
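
To illustrate the idea behind the GE_types_add.R step, the sketch below collapses per-source evidence rows into one record per drug-disease pair, with a boolean flag for each evidence type. It is written in Python for brevity and is not taken from the repository. The field names (`drug`, `disease`, `evidence_type`) and the evidence-type labels are hypothetical placeholders, not the actual column names.

```python
from collections import defaultdict


def evidence_type_flags(rows, evidence_types):
    """Collapse per-source evidence rows into one record per drug-disease pair.

    rows: iterable of dicts with hypothetical keys 'drug', 'disease',
          and 'evidence_type'.
    evidence_types: the evidence-type labels to turn into boolean columns.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[(row["drug"], row["disease"])].add(row["evidence_type"])

    records = []
    for (drug, disease), types in sorted(seen.items()):
        record = {"drug": drug, "disease": disease}
        for etype in evidence_types:
            record[etype] = etype in types
        records.append(record)
    return records


if __name__ == "__main__":
    # Made-up rows for illustration only; not real pipeline output.
    example_rows = [
        {"drug": "drugA", "disease": "diseaseX", "evidence_type": "typeA"},
        {"drug": "drugA", "disease": "diseaseX", "evidence_type": "typeB"},
        {"drug": "drugB", "disease": "diseaseY", "evidence_type": "typeB"},
    ]
    for rec in evidence_type_flags(example_rows, ["typeA", "typeB"]):
        print(rec)
```

Pairs with no evidence rows do not appear in this sketch's output; a real implementation would join the flags back onto the full list of approvals.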

Analysis

GE_Year_plot.R:
- Generates a plot showing genetic evidence support for approved drugs by year.
GE_by_year_plot.R:
- Generates a plot for each year showing genetic evidence support for approved drugs by data source.
OR_plot.R:
- Computes and plots odds ratios (OR) for approvals with expedited review status and for those addressing serious conditions.
Venn_plot.ipynb:
- Produces an intersection diagram, showing the overlap between approvals with genetic evidence support, approvals with expedited review status, and approvals for serious conditions.
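
The two calculations behind the last two scripts can be sketched as follows. This README does not state which estimator OR_plot.R uses, so the code assumes the standard cross-product odds ratio with a Wald interval on the log scale; the function names and the layout of the 2x2 table are likewise illustrative assumptions. The second function counts approvals per region of a three-set intersection diagram.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: with feature,    with genetic evidence
    b: with feature,    without genetic evidence
    c: without feature, with genetic evidence
    d: without feature, without genetic evidence
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


def overlap_regions(supported, expedited, serious):
    """Count approvals in each region of a three-set intersection diagram.

    Keys are (in supported, in expedited, in serious) membership tuples.
    """
    regions = {}
    for approval in supported | expedited | serious:
        key = (approval in supported, approval in expedited, approval in serious)
        regions[key] = regions.get(key, 0) + 1
    return regions
```

Here the "feature" would be expedited review status or a serious condition, and the three sets would hold identifiers of approvals with genetic evidence support, expedited review, and serious conditions respectively.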

How to Use

Ensure the input data and curated lists described under Data Structure are in place.
Run the processing steps in the order listed above.
Manually curate complex cases where necessary.
Run the analysis scripts to generate the plots.
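
The processing order can be captured in a small driver script. This is a sketch under stated assumptions, not part of the repository: the launch commands (`Rscript`, `python`) are guesses about the local setup, the function name `run_processing` is hypothetical, and the Spark and Google Cloud access that the scripts need must already be configured.

```python
import subprocess

# Processing scripts in the order listed under "Processing Steps".
# The interpreter commands (Rscript, python) are assumptions about the
# local setup; the README does not specify how the scripts are launched.
STEPS = [
    ["Rscript", "code/GE_search.R"],
    ["Rscript", "code/GE_types_add.R"],
    ["python", "code/GE_prior.py"],
]


def run_processing(steps=STEPS, dry_run=True):
    """Return the planned commands; execute them in order unless dry_run.

    Execution stops at the first failing step.
    """
    planned = [" ".join(cmd) for cmd in steps]
    if not dry_run:
        for cmd in steps:
            # check=True raises CalledProcessError if a step fails
            subprocess.run(cmd, check=True)
    return planned


if __name__ == "__main__":
    for line in run_processing():
        print(line)
    print("Next: manually curate results/2013-2022_approvals_preGE0.csv "
          "into results/2013-2022_approvals_preGE.csv")
```

The default is a dry run that only lists the commands, so nothing is executed until `dry_run=False` is passed explicitly.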

Contributing

If you would like to contribute to this project or have any queries, please open an issue or submit a pull request.

License

MIT License

Acknowledgments

Thanks to Mullard publications for providing the initial dataset of FDA-approved drugs. Special thanks to contributors who helped with manual curation and data improvements. Additionally, we'd like to extend our gratitude to OpenAI for ChatGPT-4, which assisted in the disease classification process and the generation of these instructions.

Publication release

The code corresponding to our publication is available in our first release. Please refer to Genetic Evidence Support for FDA-Approved Drugs (2013-2022) for the version of the code that was used.
