FDA PGx Biomarker Table parser

FDA Table of Pharmacogenomic Biomarkers in Drug Labeling

DEPRECATED: THESE SCRIPTS HAVE BEEN MOVED TO THE MAIN PharmGKB REPO AND WILL NO LONGER BE MAINTAINED

This code scrapes content from the "Table of Pharmacogenomic Biomarkers in Drug Labeling" page and the "Table of Pharmacogenetic Associations" page on the FDA website. Specifically, this will transform the content of the tables on those pages into JSON files for better computational use.

CAUTION: The output data strips out all footnotes and contextual information about the contents of the Biomarkers table. Read the original source pages before attempting to use this data.

The text is copied verbatim from the HTML source with the following exceptions:

  1. Footnote glyphs are removed from field titles
  2. Redundant whitespace (spaces, newlines, tabs) is replaced with a single space in field values
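
The two exceptions above can be sketched roughly as follows. This is an illustration only, not the code from these scripts: the helper names `cleanTitle` and `cleanValue`, the set of footnote glyphs, and the trimming of leading and trailing whitespace are all assumptions.

```javascript
// Hypothetical helpers; the names and the glyph set are assumptions,
// not taken from the original scripts.
const FOOTNOTE_GLYPHS = /[*†‡§]/g;

// Remove footnote glyphs from a field title (trimming is an assumption).
function cleanTitle(rawTitle) {
  return rawTitle.replace(FOOTNOTE_GLYPHS, '').trim();
}

// Collapse each run of spaces, newlines, and tabs in a field value
// into a single space (trimming the ends is an assumption).
function cleanValue(rawValue) {
  return rawValue.replace(/\s+/g, ' ').trim();
}
```

For instance, under these assumptions `cleanValue` would turn a cell whose text wraps over several indented HTML source lines into one line with single spaces between words.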

Setup

Make sure you have Node.js and npm installed. Install the dependencies with the following command:

npm i

Additionally, if you create a .env file that sets SLACK_URL to a Slack webhook URL, the script will post result messages to Slack. If SLACK_URL is not set, result messages are written to the console instead.
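
The Slack-or-console fallback might look something like the sketch below. The function name `notify`, its parameters, and the injectable `post` argument are hypothetical and not taken from these scripts. It assumes the .env file has already been loaded into `process.env` (for example by a package such as dotenv) and that a global `fetch` is available (Node.js 18 or later). The `{ text: ... }` JSON body is the standard payload for Slack incoming webhooks.

```javascript
// Sketch only: notify() and its parameters are hypothetical names.
// Assumes SLACK_URL was already loaded into process.env from .env.
async function notify(message, slackUrl = process.env.SLACK_URL, post = fetch) {
  if (!slackUrl) {
    // No webhook configured: fall back to the console.
    console.log(message);
    return 'console';
  }
  // Slack incoming webhooks accept a JSON body with a "text" field.
  await post(slackUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: message }),
  });
  return 'slack';
}
```

Passing `post` as a parameter is a design choice made here so the function can be exercised without a network call.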

Running

To run the script:

node --harmony biomarker.js

This writes a timestamped JSON file, along with an MD5 hash of that file, to the out directory.
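
As a rough illustration of this output step, the sketch below writes a JSON file with a timestamp in its name and stores the file's MD5 hash next to it. The function name `writeWithChecksum`, the `biomarkers-` file name prefix, the timestamp format, and the `.md5` sidecar file are assumptions; the actual scripts may name and lay out their output differently.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: the naming scheme and .md5 sidecar are assumptions.
function writeWithChecksum(records, outDir = 'out', now = new Date()) {
  fs.mkdirSync(outDir, { recursive: true });

  // ISO timestamp with ':' and '.' replaced so it is safe in file names.
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  const jsonPath = path.join(outDir, `biomarkers-${stamp}.json`);

  const json = JSON.stringify(records, null, 2);
  fs.writeFileSync(jsonPath, json);

  // MD5 of the exact string written to disk, as lowercase hex.
  const md5 = crypto.createHash('md5').update(json).digest('hex');
  fs.writeFileSync(`${jsonPath}.md5`, md5);

  return { jsonPath, md5 };
}
```

Comparing the stored hash with the hash of a later run's file is one simple way to tell whether the table content has changed between runs.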