GitHub repository for the automated daily retrieval of all CMS Provider Data Catalog (PDC) datasets, with storage handled through DoltHub.
Visit the project's Dolt repo: CMS PDC on DoltHub
License: AGPL-3.0
This repository sends daily GET requests to fetch datasets from the CMS Provider Data Catalog (PDC) and stores them in a consistent, theme-based layout. The main objective is to maintain an up-to-date, accessible copy of CMS datasets that are important for healthcare analytics and public health informatics.
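To illustrate what one of these daily GET requests might involve, here is a minimal sketch using only the Python standard library. It assumes that the PDC exposes a DKAN-style metastore endpoint (`METASTORE_URL` below) whose JSON response lists downloadable files under `distribution`, each with a `downloadURL` and `mediaType`. Verify the endpoint and field names against the CMS API documentation before relying on them. The helper names are hypothetical and are not taken from download_datasets.py.

```python
import json
import urllib.request

# Assumed DKAN-style metastore endpoint; confirm against the CMS PDC API docs.
METASTORE_URL = (
    "https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items/{dataset_id}"
)


def metadata_url(dataset_id: str) -> str:
    """Build the metadata URL for a dataset identifier (hypothetical helper)."""
    return METASTORE_URL.format(dataset_id=dataset_id)


def extract_csv_url(metadata: dict) -> str:
    """Return the first CSV download URL listed in a dataset's metadata.

    Assumes each entry in metadata["distribution"] may carry "downloadURL"
    and "mediaType" keys.
    """
    for dist in metadata.get("distribution", []):
        url = dist.get("downloadURL", "")
        if url and (dist.get("mediaType") == "text/csv" or url.lower().endswith(".csv")):
            return url
    raise ValueError("no CSV distribution found in metadata")


def fetch_csv_url(dataset_id: str, timeout: float = 60.0) -> str:
    """GET the dataset's metadata and pick out its CSV download link."""
    with urllib.request.urlopen(metadata_url(dataset_id), timeout=timeout) as resp:
        metadata = json.load(resp)
    return extract_csv_url(metadata)
```

Separating the network call (`fetch_csv_url`) from the parsing step (`extract_csv_url`) lets the parsing logic be tested offline against a saved metadata response.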
The CMS PDC covers various healthcare-related themes. Below are some of the key data themes available:
- Dialysis Facilities (DF)
- Doctors and Clinicians
- Home Health Services
- Hospice Care
- Hospitals
- Inpatient Rehabilitation Facilities
- Long-term Care Hospitals
- Nursing Homes Including Rehab Services
- Physician Office Visit Costs
- Supplier Directory (SD)
The project uses Python scripts, scheduled via crontab (or a scheduler of your choice), to pull data from the CMS API using the dataset identifiers listed in config/datasets.yml. The datasets are downloaded in CSV format and stored in a directory structure organized by theme, which keeps them easy to navigate and access.
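Because the job runs unattended every day, a download that fails partway should not leave a truncated CSV where the previous good copy used to be. One common safeguard is sketched below. This is an assumption, not a description of download_datasets.py: stream the response into a temporary file in the target directory, then rename it into place only after the transfer completes. `download_csv` is a hypothetical name.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_csv(url: str, dest: Path, timeout: float = 300.0) -> Path:
    """Stream `url` to `dest`, replacing any existing file only on success."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the destination directory so os.replace is a
    # same-filesystem rename rather than a copy.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        # fdopen is entered first, so the descriptor is closed even if urlopen fails.
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url, timeout=timeout) as resp:
            shutil.copyfileobj(resp, out)
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the partial file; any previous copy at `dest` is left untouched.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return dest
```

Since `urlopen` also accepts `file://` URLs, the function can be exercised locally without contacting the CMS API.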
- download_datasets.py: Main Python script that orchestrates the downloading process.
- config/datasets.yml: YAML file containing dataset identifiers and themes.
- data/: Directory where downloaded datasets are stored by theme and dataset ID.
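The layout described above can be expressed as a small path helper. For illustration only, this sketch makes two assumptions. First, it assumes that datasets.yml parses (for example with PyYAML's `yaml.safe_load`) into a mapping from theme name to a list of dataset identifiers. Second, it assumes that theme names become lowercase, hyphenated directory names. The repository's actual config schema and directory naming may differ. `theme_dir_name` and `planned_paths` are hypothetical, and the dataset IDs in the test are placeholders.

```python
from __future__ import annotations

import re
from pathlib import Path


def theme_dir_name(theme: str) -> str:
    """Turn a theme title such as 'Doctors and Clinicians' into 'doctors-and-clinicians'."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", theme.lower())
    return slug.strip("-")


def planned_paths(config: dict[str, list[str]], root: Path = Path("data")) -> list[Path]:
    """Map each (theme, dataset ID) pair to <root>/<theme>/<dataset_id>.csv."""
    paths = []
    for theme, dataset_ids in config.items():
        for dataset_id in dataset_ids:
            paths.append(root / theme_dir_name(theme) / f"{dataset_id}.csv")
    return paths
```

Deriving directory names from theme titles keeps the on-disk layout in step with the config, so adding a theme to datasets.yml requires no code changes.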
To run the project scripts or contribute, you need:
- Python 3.x
- Dependencies listed in requirements.txt
- Clone the repository:
  git clone https://github.com/<your-github>/cms-pdc.git
  cd cms-pdc
- Install dependencies:
  pip install -r requirements.txt
- Set up the scheduler for daily runs, or execute the script manually:
  python3 hippo/download_datasets.py
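On systems with cron, the simplest option is a single crontab entry that runs the command above once a day. If you need a custom scheduler instead, the sketch below shows one minimal pure-Python approach. It computes how long to sleep until the next run time, runs the script as a subprocess, and repeats. The default run hour, the names `seconds_until_next_run` and `run_daily`, and the use of a subprocess are all assumptions for illustration.

```python
import subprocess
import sys
import time
from datetime import datetime, timedelta


def seconds_until_next_run(now: datetime, hour: int, minute: int = 0) -> float:
    """Seconds from `now` until the next occurrence of hour:minute (naive local time)."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(hour: int = 2, script: str = "hippo/download_datasets.py") -> None:
    """Run the download script once a day at the given hour; loops forever."""
    while True:
        time.sleep(seconds_until_next_run(datetime.now(), hour))
        # A failed run is reported but does not stop the next day's run.
        result = subprocess.run([sys.executable, script])
        if result.returncode != 0:
            print(f"download run failed with exit code {result.returncode}", file=sys.stderr)
```

Because it uses naive local datetimes, this sketch ignores daylight-saving transitions. For production use, cron or a dedicated scheduler is the more robust choice.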
Contributions to enhance the functionality, improve data extraction, or refine storage mechanisms are welcome! Please fork the repository, make your changes, and submit a pull request.
Please ensure that your use of data fetched from the CMS PDC complies with the data use agreements and legal terms published on the CMS Data Website.