ncats/stitcher

SPL repository is wonky


I tried to update our data from the current SPL repository at DailyMed.

They've now split the _rx and _otc archives into several downloads.

The FDA now also has a process for marking some labels as 'inactivated'.
https://www.fda.gov/drugs/drug-approvals-and-databases/ndc-package-file-definitions
NDC_Exclude_Flag Text/String.
Values = ‘Y’, ‘N’, ‘E’, or ‘I’. This indicates whether the PACKAGE has been removed/excluded from the NDC Directory for failure to respond to FDA’s requests for correction to deficient or non-compliant submissions (‘Y’), or because the listing certification is expired (‘E’), or because the listing data was inactivated by FDA (‘I’). The PACKAGE.XLS and PACKAGE.TXT files only contain listing records where NDC_EXCLUDE_FLAG=’N’. The PACKAGES_EXCLUDED.XLS and PACKAGES_EXCLUDED.TXT file contains all listing records with an NDC_EXCLUDE_FLAG of ‘Y’, ‘E’, and ‘I’.
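A minimal sketch of how the flag values quoted above could be tallied from a tab-delimited export such as PACKAGES_EXCLUDED.TXT. The `NDC_EXCLUDE_FLAG` name comes from the FDA definition; the function name, the other column name, and the sample rows are placeholders for illustration, not real NDC data.

```python
import csv
import io
from collections import Counter

# Meanings taken from the FDA definition quoted above.
FLAG_MEANINGS = {
    "N": "listed (not excluded)",
    "Y": "excluded: no response to FDA correction requests",
    "E": "excluded: listing certification expired",
    "I": "excluded: listing data inactivated by FDA",
}


def count_exclude_flags(handle, flag_column="NDC_EXCLUDE_FLAG"):
    """Count rows per exclude flag in a tab-delimited file-like object.

    Rows with a missing or unrecognized flag are counted under "unknown".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        flag = (row.get(flag_column) or "").strip().upper()
        counts[flag if flag in FLAG_MEANINGS else "unknown"] += 1
    return counts


# Placeholder rows (not real package codes), standing in for a real file.
sample = (
    "NDCPACKAGECODE\tNDC_EXCLUDE_FLAG\n"
    "0000-0001-01\tE\n"
    "0000-0002-01\tI\n"
    "0000-0003-01\tE\n"
    "0000-0004-01\t\n"
)
counts = count_exclude_flags(io.StringIO(sample))
for flag, n in sorted(counts.items()):
    print(flag, n, FLAG_MEANINGS.get(flag, "unrecognized or missing flag"))
```

For a real file, the `io.StringIO(sample)` handle would be replaced by an opened file; the encoding of the FDA export is not stated here and would need checking.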

Not all labels for currently marketed products are included in the current download files, for reasons I do not understand.
e.g., 6W15Z5R0RU https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=bb5a5043-0f51-11df-8a39-0800200c9a66

Products that have ceased marketing are no longer available for download. They are available through the DailyMed archive, but there is no bulk download of all archived labels.
https://www.fiercepharma.com/special-report/lartruvo-eli-lilly-top-10-drug-launch-disasters
https://dailymed.nlm.nih.gov/dailymed/search.cfm?labeltype=all&query=OLARATUMAB&pagesize=20&page=1
https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=6a5bff43-f922-46ae-a727-d54d9138c46e
Marketing status is not properly marked in Drugs@FDA https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=overview.process&varApplNo=761038
It is no longer available in NDC either https://www.accessdata.fda.gov/scripts/cder/ndc/dsp_searchresult.cfm Search Results: 'olaratumab' Your search returned no records.
https://drugs.ncats.io/drug/TT6HN20MVF

This means that we need to preserve a historical file of the SPLs we've already processed, replace those with the most recent versions where available, and otherwise supplement with old NDCs.
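The precedence described above can be sketched roughly as follows. This is only an illustration of the idea, not what /scripts/dailymed/dailymed_merge_ndc.py actually does: the function name, the `source` field, and the choice of a shared key per record are all assumptions.

```python
def merge_spl_sources(old_ndc, historical_spl, current_spl):
    """Merge three {key: record} mappings, lowest priority first.

    Old NDC records fill the gaps, historical SPLs override them, and
    current SPLs override both. Each merged record is tagged with the
    source it came from (a hypothetical "source" field).
    """
    merged = {}
    layers = (
        ("old_ndc", old_ndc),
        ("historical_spl", historical_spl),
        ("current_spl", current_spl),
    )
    for source, records in layers:
        for key, record in records.items():
            # Later layers overwrite earlier ones for the same key.
            merged[key] = {**record, "source": source}
    return merged


# Placeholder keys and values, for illustration only.
merged = merge_spl_sources(
    old_ndc={"key-a": {"title": "a (ndc)"}, "key-c": {"title": "c (ndc)"}},
    historical_spl={"key-a": {"title": "a (old spl)"}, "key-b": {"title": "b (old spl)"}},
    current_spl={"key-a": {"title": "a (new spl)"}},
)
for key in sorted(merged):
    print(key, merged[key]["source"])
```

Keeping the source tag on each record makes it easy to see afterwards which products only survive through the historical files.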

I've preserved some historical SPL and NDC files in stitcher-rawinputs/files/spl-ndc.
I now run /scripts/dailymed/dailymed_merge_ndc.py on these files to produce /data/spl_summary.txt, which combines OTC, RX, and other labels in one file. /data/conf/dailymed_summary.conf now loads all of the needed SPL info from this file.
See 0a5cf17

Requested #172