/openfda-project

This project uses data from the OpenFDA API to understand how certain drug ingredients affect the adverse events reported over a five-year period. We quantify this impact, identify the most influential ingredients, and develop a predictive model for adverse events based on these ingredients.

MIT License

Analysis of OpenFDA Data to Understand Drug Ingredients and Adverse Events

This repository contains code, data, and documentation for a project analyzing data from the OpenFDA API. The goal of the project is to understand the relationship between drug ingredients and the occurrence of adverse events.

Project Scope

This project focuses on data from the past ten years and examines the relationship between drug ingredients (both active and inactive) and the reporting of adverse events. The main objective is to understand if, and how, certain ingredients might increase the risk of adverse events.
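As a minimal sketch of the preprocessing this scope implies, the function below extracts the active ingredients, the seriousness flag, and the receive date from a single record returned by the openFDA drug adverse event endpoint. It assumes the field names `patient.drug[].activesubstance.activesubstancename`, `serious` (where "1" means serious) and `receivedate` (formatted YYYYMMDD) as they appear in openFDA's FAERS-derived records; `parse_report` is a hypothetical helper, not part of any library. Adverse event reports list active substances only, so inactive ingredients would have to be joined in from another source, such as the `inactive_ingredient` field of the openFDA drug label endpoint.

```python
from datetime import date, datetime
from typing import FrozenSet, Optional, Tuple


def parse_report(report: dict) -> Tuple[FrozenSet[str], bool, Optional[date]]:
    """Return (active ingredients, serious flag, receive date) for one
    openFDA drug adverse event record (hypothetical helper)."""
    ingredients = set()
    for drug in report.get("patient", {}).get("drug", []):
        name = drug.get("activesubstance", {}).get("activesubstancename")
        if name:
            # Normalise case and whitespace so "Aspirin " and "ASPIRIN" match.
            ingredients.add(name.strip().upper())
    # Assumed encoding: "1" = serious, "2" = not serious; anything else -> False.
    serious = report.get("serious") == "1"
    raw_date = report.get("receivedate")
    received = datetime.strptime(raw_date, "%Y%m%d").date() if raw_date else None
    return frozenset(ingredients), serious, received
```

Drugs without an `activesubstance` entry are skipped rather than guessed at, and a record missing every field parses to an empty ingredient set, a non-serious flag, and no date.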

Objectives

  1. Data Acquisition and Preprocessing: Extract data from the OpenFDA API and, if needed, from other relevant sources such as the CDC or WHO. Preprocess and clean the data into a format suitable for analysis.

  2. Exploratory Data Analysis: Analyze the data to understand its characteristics, including the distribution of adverse events across different drugs. Identify any apparent relationships between ingredients and adverse events.

  3. Ingredient-Event Relationship Analysis: Investigate the relationship between the presence of certain ingredients and the likelihood of an adverse event. This will involve statistical analyses or machine learning models to identify the most influential ingredients.

  4. Adverse Event Prediction: Develop a machine learning model that predicts the likelihood of an adverse event based on a drug's ingredients.

  5. Visualization and Communication: Communicate the findings through intuitive, interactive visualizations and dashboards using Tableau. This could involve visualizing the frequency of adverse events, the highest-risk ingredients, and the performance of the prediction model.

  6. Documentation: Document all the analysis steps, methodologies, results, and conclusions. This includes both technical documentation (code, etc.) and non-technical documentation (interpretations, conclusions, etc.).
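For the ingredient-event analysis in objective 3, note that every record in the openFDA adverse event data already describes an adverse event, so "likelihood of an adverse event" has to be framed as a specific outcome. The sketch below assumes that outcome is the report's seriousness flag and uses the proportional reporting ratio (PRR), a common disproportionality measure in pharmacovigilance, as one possible statistical option; it is not the project's confirmed method. For each ingredient, PRR = (a / (a + b)) / (c / (c + d)), where a and b count serious and non-serious reports containing the ingredient and c and d count those without it. A PRR above 1 means serious outcomes are reported proportionally more often with that ingredient, which is a signal worth investigating rather than evidence of causation. The function name `prr_by_ingredient` and the `min_reports` threshold are hypothetical.

```python
from collections import Counter
from typing import AbstractSet, Iterable, List, Tuple


def prr_by_ingredient(reports: Iterable[Tuple[AbstractSet[str], bool]],
                      min_reports: int = 3) -> List[Tuple[str, float]]:
    """Rank ingredients by proportional reporting ratio for serious outcomes.

    a = serious reports containing the ingredient
    b = non-serious reports containing it
    c = serious reports without it
    d = non-serious reports without it
    PRR = (a / (a + b)) / (c / (c + d))
    """
    reports = list(reports)
    total = len(reports)
    total_serious = sum(1 for _, serious in reports if serious)

    n_with = Counter()          # reports containing each ingredient
    n_with_serious = Counter()  # serious reports containing each ingredient
    for ingredients, serious in reports:
        for ing in ingredients:
            n_with[ing] += 1
            if serious:
                n_with_serious[ing] += 1

    scores = {}
    for ing, n in n_with.items():
        if n < min_reports:
            continue  # too few reports for a stable ratio
        a = n_with_serious[ing]
        b = n - a
        c = total_serious - a
        d = (total - n) - c
        if c == 0:
            continue  # no serious comparison reports: PRR undefined or infinite
        scores[ing] = (a / (a + b)) / (c / (c + d))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With six toy reports where ingredient A appears in two serious and one non-serious report and B in one serious and two non-serious reports, `prr_by_ingredient(reports, min_reports=2)` ranks A (PRR 2.0) above B (PRR 0.5), and drops any ingredient seen in fewer than two reports.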

Tools and Technologies

  • Data Acquisition and Preprocessing: Python, OpenFDA API
  • Data Analysis and Modeling: Python, Apache Spark, PySpark MLlib
  • Data Visualization: Tableau
  • Version Control: Git, GitHub
  • Documentation: Jupyter Notebook
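For the prediction model, PySpark MLlib's `pyspark.ml.classification.LogisticRegression` trained on ingredient feature vectors would be a natural fit. As a dependency-free stand-in that shows the same shape, the sketch below multi-hot encodes each report's ingredient set and fits an L2-regularised logistic regression by batch gradient descent to predict the probability of a serious outcome. It assumes labels of 1 (serious) and 0 (not serious); the helper names and the hyperparameters (`lr`, `epochs`, `l2`) are hypothetical and illustrative, not tuned values.

```python
import math
from typing import AbstractSet, Dict, Iterable, List, Sequence, Tuple


def build_vocab(reports: Iterable[Tuple[AbstractSet[str], int]]) -> List[str]:
    """Sorted list of every ingredient seen in the training reports."""
    return sorted({ing for ings, _ in reports for ing in ings})


def encode(ingredients: Iterable[str], index: Dict[str, int]) -> List[float]:
    """Multi-hot vector: 1.0 where an ingredient is present; unseen ones are ignored."""
    x = [0.0] * len(index)
    for ing in ingredients:
        if ing in index:
            x[index[ing]] = 1.0
    return x


def sigmoid(z: float) -> float:
    # Branch on sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500, l2: float = 0.01):
    """Batch gradient descent on L2-regularised log loss (bias not regularised)."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / m + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b
```

Typical use is `index = {ing: i for i, ing in enumerate(build_vocab(reports))}`, then encoding every report with `encode` before calling `train_logreg`. The learned weights give a rough per-ingredient direction of association, but on real data the MLlib version would scale far better than this pure-Python loop.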

Getting Started

Instructions on setting up the project, including environment setup, data acquisition, and initial data analysis, will be provided here.
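Until those instructions are written, here is a minimal data-acquisition sketch using only the standard library. It assumes the openFDA drug adverse event endpoint `https://api.fda.gov/drug/event.json` and its `search`, `limit`, `skip`, and optional `api_key` query parameters; openFDA documents a maximum `limit` of 1000 per request and also caps how far `skip` can page, so check the current openFDA documentation before relying on either figure. The date range in the usage note is only an example, and `page_urls` and `fetch_json` are hypothetical helper names.

```python
import json
from typing import Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.fda.gov/drug/event.json"


def page_urls(start: str, end: str, total: int, page_size: int = 100,
              api_key: Optional[str] = None) -> Iterator[str]:
    """Yield one request URL per page of reports received between
    start and end (both YYYYMMDD, inclusive)."""
    if not 1 <= page_size <= 1000:
        raise ValueError("openFDA accepts a limit between 1 and 1000")
    search = f"receivedate:[{start} TO {end}]"
    for skip in range(0, total, page_size):
        params = {"search": search, "limit": page_size, "skip": skip}
        if api_key:
            params["api_key"] = api_key
        # urlencode percent-encodes ':' and '[' ']' and turns spaces into '+'.
        yield f"{BASE_URL}?{urlencode(params)}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch and decode one page of results (performs a real network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `list(page_urls("20140101", "20231231", total=250))` yields three URLs with `skip=0`, `skip=100`, and `skip=200`. The `total` would normally come from `meta.results.total` in the response to a first request, and each page's `results` list could then be passed record by record to the preprocessing step.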

Contact Information

This project is maintained by Jason Robinson. For any questions or concerns, please reach out by email.