ebmdatalab/openprescribing

Data Process - Pipeline | Company Name


Hi, I want to replicate what you have done, but instead of loading BigQuery (BQ) tables, I would like to save Parquet files. Is there at any point a documented process of how to just grab and clean the data? I don't care about the website front end.

My second question: in the data creation stage, do companies' names at any time appear?

Thanks for this nice website.

> Is there at any point a documented process of how to just grab and clean the data

I'm afraid there isn't, no. The current data ingestion process is quite closely tied to BigQuery so it wouldn't be an easy job to switch that out. You're probably best off downloading the raw data directly yourself and using our code as inspiration, rather than attempting to modify OpenPrescribing to not use BigQuery.
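As a rough illustration of the "download it yourself" route, here is a minimal Python sketch (not taken from OpenPrescribing) that cleans a raw CSV you have already downloaded and writes it to Parquet. It assumes the `pyarrow` package is installed for the write step; the header normalisation rules, the `numeric_columns` parameter and any column names you pass are hypothetical and will need adapting to the actual files.

```python
import csv
from typing import Iterable, Optional


def clean_rows(
    lines: Iterable[str], numeric_columns: Optional[set[str]] = None
) -> list[dict[str, object]]:
    """Parse CSV lines, normalise headers, trim values and drop blank rows.

    `numeric_columns` is a hypothetical set of normalised column names
    whose values are converted to float (blank values become None).
    """
    numeric_columns = numeric_columns or set()
    cleaned: list[dict[str, object]] = []
    for row in csv.DictReader(lines):
        record: dict[str, object] = {}
        for key, value in row.items():
            if key is None:
                # Fields beyond the header row are collected under a None key; skip them
                continue
            # Normalise header names: trim, lower-case, spaces -> underscores
            name = key.strip().lower().replace(" ", "_")
            # Missing trailing fields come through as None; treat them as blank
            text = (value or "").strip()
            if name in numeric_columns:
                record[name] = float(text) if text else None
            else:
                record[name] = text
        # Drop rows where every value is blank
        if any(v not in ("", None) for v in record.values()):
            cleaned.append(record)
    return cleaned


def write_parquet(rows: list[dict[str, object]], path: str) -> None:
    """Write cleaned rows to a Parquet file (requires the pyarrow package)."""
    # Imported here so clean_rows can be used without pyarrow installed
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.Table.from_pylist(rows), path)
```

For example, with a file opened via `open("prescribing.csv", newline="")`, calling `write_parquet(clean_rows(f, numeric_columns={"items"}), "prescribing.parquet")` would produce a Parquet file; the file names and the `items` column are placeholders. This version holds a whole file in memory, so for large monthly extracts a chunked approach (for instance writing batches with `pyarrow.parquet.ParquetWriter`) would probably be a better fit.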

> do companies' names at any time appear?

If you mean manufacturer names, then yes: they appear in the dm+d (Dictionary of Medicines and Devices) data, for example:
https://openprescribing.net/dmd/vmpp/1200411000001101/