This repository contains the pipeline for generating and analyzing genetic evidence support for FDA-approved drugs from 2013 to 2022, with the list of approvals taken from Mullard's publications in Nature Reviews Drug Discovery.
- data/2013-2022_approvals_in.csv: Input data with FDA-approved drugs. The file provides details on the drugs mapped to ChEMBL IDs, indications manually mapped to ontology terms, and the classification of diseases.
- data/amendMoas: A manually curated list of mechanisms of action absent in ChEMBL.
- data/amendPhenotypes: A manually curated list of related indications.
- data/datasourceMetadata: List of OpenTargets data sources.
- code/GE_search.R:
  - Uses Google Cloud Cluster gs://open-targets-data-releases/ via Spark for processing.
  - Uses data/2013-2022_approvals_in.csv as the main input file and the above-mentioned lists.
  - Generates two output files:
    - results/2013-2022_approvals_GE_src.csv: Organizes evidence by data source of genetic evidence support and evidence type.
    - results/2013-2022_approvals_GE_out.csv: Extends the input file with additional columns for target IDs, interactor IDs, and related ontology terms.
- code/GE_types_add.R:
  - Processes the output from GE_search.R.
  - Generates results/2013-2022_approvals_GE.csv with new columns indicating the type of genetic evidence found for each drug-disease pair.
- code/GE_prior.py:
  - Uses Google Cloud Cluster gs://open-targets-data-releases/ via Spark for processing.
  - Uses results/2013-2022_approvals_GE.csv to find the date for genetic evidence support.
  - Outputs results/2013-2022_approvals_preGE0.csv.
- Manual Curation:
  - Complex cases in results/2013-2022_approvals_preGE0.csv are manually curated to produce results/2013-2022_approvals_preGE.csv.
- GE_Year_plot.R:
  - Generates a plot showing genetic evidence support for approved drugs by year.
- GE_by_year_plot.R:
  - Generates a plot for each year showing genetic evidence support for approved drugs by datasource.
- OR_plot.R:
  - Computes and plots odds ratios (OR) for approvals with expedited review status and for those addressing serious conditions.
- Venn_plot.ipynb:
  - Produces an intersection diagram, showing the overlap between approvals with genetic evidence support, approvals with expedited review status, and approvals for serious conditions.
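
For readers unfamiliar with the odds-ratio analysis mentioned for OR_plot.R, the following is a minimal Python sketch of how an odds ratio and an approximate 95% confidence interval can be derived from a 2x2 table. It is not taken from OR_plot.R, which may use a different estimator or test. The column names `has_ge` and `expedited`, the helper names, and the toy rows are hypothetical and serve only as illustration.

```python
import math


def two_by_two(rows, exposure, outcome):
    """Count the four cells of a 2x2 table from rows of boolean flags.

    Returns (a, b, c, d) where
      a = exposure and outcome,     b = exposure and not outcome,
      c = not exposure and outcome, d = neither.
    """
    a = b = c = d = 0
    for row in rows:
        if row[exposure] and row[outcome]:
            a += 1
        elif row[exposure]:
            b += 1
        elif row[outcome]:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (log-scale) confidence interval.

    z=1.96 gives an approximate 95% interval. This simple estimate
    requires all four cells to be non-zero.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # std. error of log(OR)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


if __name__ == "__main__":
    # Toy rows for illustration only; these are NOT results from the repository.
    # "has_ge" and "expedited" are hypothetical column names.
    toy = [
        {"has_ge": True, "expedited": True},
        {"has_ge": True, "expedited": True},
        {"has_ge": True, "expedited": False},
        {"has_ge": False, "expedited": True},
        {"has_ge": False, "expedited": False},
        {"has_ge": False, "expedited": False},
    ]
    cells = two_by_two(toy, "has_ge", "expedited")
    print(cells, odds_ratio(*cells))
```

An odds ratio above 1 with a confidence interval that excludes 1 would suggest that the outcome is more common among approvals with the exposure flag set; with small counts, an exact test is usually preferable to the Wald interval used here.
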
- Ensure you have the required input data and lists.
- Run the processing steps in the order mentioned above.
- Conduct manual curation where necessary.
- Run the analysis scripts to generate visualizations and insights.
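
The processing order described above can be sketched as a small Python helper that reports which step should run next, based on which files already exist. This is only an illustration and is not part of the repository: the `STEPS` table and the `next_step` function are hypothetical, the paths are copied from the file descriptions above, and the README does not say which of the two GE_search.R outputs GE_types_add.R reads, so listing both as its inputs is an assumption. The curated lists under data/ are left out because their exact file names are not given.

```python
from pathlib import Path

# (step, inputs, outputs) in the order the README lists them.
# Hypothetical table for illustration; paths are copied from the README.
STEPS = [
    (
        "code/GE_search.R",
        ["data/2013-2022_approvals_in.csv"],
        [
            "results/2013-2022_approvals_GE_src.csv",
            "results/2013-2022_approvals_GE_out.csv",
        ],
    ),
    (
        "code/GE_types_add.R",
        # Assumption: both GE_search.R outputs are treated as inputs here.
        [
            "results/2013-2022_approvals_GE_src.csv",
            "results/2013-2022_approvals_GE_out.csv",
        ],
        ["results/2013-2022_approvals_GE.csv"],
    ),
    (
        "code/GE_prior.py",
        ["results/2013-2022_approvals_GE.csv"],
        ["results/2013-2022_approvals_preGE0.csv"],
    ),
    (
        "manual curation",
        ["results/2013-2022_approvals_preGE0.csv"],
        ["results/2013-2022_approvals_preGE.csv"],
    ),
]


def next_step(root="."):
    """Return (step, missing_inputs) for the first step whose outputs are absent.

    Returns (None, []) when every listed output already exists.
    """
    root = Path(root)
    for name, inputs, outputs in STEPS:
        if all((root / out).exists() for out in outputs):
            continue  # this step has already produced its outputs
        missing = [inp for inp in inputs if not (root / inp).exists()]
        return name, missing
    return None, []


if __name__ == "__main__":
    step, missing = next_step(".")
    if step is None:
        print("All processing outputs are present; the plotting scripts can be run.")
    elif missing:
        print(f"Next step: {step}, but these inputs are missing: {missing}")
    else:
        print(f"Next step: {step}")
```

Run from the repository root, the helper only inspects the file system; it does not launch R, Spark, or any of the scripts itself.
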
If you would like to contribute to this project or have any queries, please open an issue or submit a pull request.
Thanks to Mullard publications for providing the initial dataset of FDA-approved drugs. Special thanks to contributors who helped with manual curation and data improvements. Additionally, we'd like to extend our gratitude to OpenAI for ChatGPT-4, which assisted in the disease classification process and the generation of these instructions.
The code corresponding to our publication is available in our first release. Please refer to Genetic Evidence Support for FDA-Approved Drugs (2013-2022) for the version of the code that was used.