DEPRECATED: THESE SCRIPTS HAVE BEEN MOVED TO THE MAIN PharmGKB REPO AND WILL NO LONGER BE MAINTAINED
This code scrapes content from the "Table of Pharmacogenomic Biomarkers in Drug Labeling" page and the "Table of Pharmacogenetic Associations" page on the FDA website. Specifically, it converts the tables on those pages into JSON files that are easier to process programmatically.
CAUTION: This data file strips out all footnotes and contextual information about the contents of the Biomarkers table. Go read the original source pages before attempting to use this data.
The text is copied verbatim from the HTML source with the following exceptions:
- Footnote glyphs are removed from field titles
- Redundant whitespace (spaces, newlines, tabs) is replaced with a single space in field values
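The two exceptions above could be sketched roughly as follows. This is an illustration only, not the code in biomarker.js: the helper names `cleanTitle` and `cleanValue` are hypothetical, and the set of footnote glyphs is an assumption, since the actual markers used on the FDA pages are not listed here.

```javascript
// Assumed footnote markers; the real tables may use a different set.
const FOOTNOTE_GLYPHS = /[*†‡§]/g;

// Hypothetical helper: remove footnote glyphs from a field title.
function cleanTitle(title) {
  return title.replace(FOOTNOTE_GLYPHS, '');
}

// Hypothetical helper: collapse each run of spaces, newlines, or tabs
// in a field value into a single space.
function cleanValue(value) {
  return value.replace(/\s+/g, ' ');
}
```

Everything else in a title or value would pass through unchanged, which matches the "copied verbatim" behavior described above.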
Make sure you have Node.js and NPM installed. Download dependencies with the following command:
npm i
Additionally, if you create a .env file with SLACK_URL set to a Slack webhook URL, the script will post result messages to Slack. If SLACK_URL is not set, messages are written to the console instead.
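As a rough sketch of that fallback, assuming the .env values have already been loaded into an environment object: `buildNotifier` and the injected `postJson` function are hypothetical names, not part of this repo. The `{ text: ... }` payload is the standard body for Slack incoming webhooks.

```javascript
// Hypothetical helper: pick a message sink based on SLACK_URL.
// `env` is an object such as process.env; `postJson(url, body)` is an
// assumed function that sends `body` as JSON in an HTTP POST to `url`.
function buildNotifier(env, postJson) {
  const url = env.SLACK_URL;
  if (!url) {
    // No webhook configured: write to the console instead.
    return (message) => console.log(message);
  }
  // Slack incoming webhooks accept a JSON body with a "text" field.
  return (message) => postJson(url, { text: message });
}
```

Passing the HTTP function in as a parameter keeps the choice between Slack and the console easy to exercise without making a network call.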
To run the script:
node --harmony biomarker.js
This will store a timestamped JSON file, with an accompanying MD5 hash of that file, in an out directory.
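The output step might look something like the sketch below, using only Node's built-in modules. The function name `writeWithHash`, the `biomarkers-` file prefix, the timestamp format, and the `.md5` suffix are all assumptions for illustration; the actual file names produced by the script may differ.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write `records` as a timestamped JSON file in
// `outDir`, plus a sibling file holding the MD5 hex digest of that JSON.
function writeWithHash(outDir, records, now = new Date()) {
  fs.mkdirSync(outDir, { recursive: true });
  // Replace ':' and '.' so the ISO timestamp is safe in file names.
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  const jsonPath = path.join(outDir, `biomarkers-${stamp}.json`);
  const body = JSON.stringify(records, null, 2);
  fs.writeFileSync(jsonPath, body);
  // Hash the exact text that was written to disk.
  const md5 = crypto.createHash('md5').update(body).digest('hex');
  const hashPath = `${jsonPath}.md5`;
  fs.writeFileSync(hashPath, md5);
  return { jsonPath, hashPath, md5 };
}
```

Storing the hash next to the data makes it possible to tell whether two runs produced identical content by comparing the small hash files rather than the full JSON.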