The Python script upload_plan_finder_data.py can be used to perform automated daily updates to CMS Plan Finder data on Socrata.
The CMS Plan Finder data can be found on this Socrata domain:
https://plan-finder.data.medicare.gov
This file contains documentation on technical requirements for running the script as well as a quick guide to scheduling the script for data automation.
This script requires a recent version of Python (3.6+), as well as some external dependencies which can be installed using the requirements.txt file like so:
pip3 install -r requirements.txt
Additionally, the script uses a number of credentials which must be securely stored using environment variables. (It's best to avoid inserting private credentials in the script itself.) Here are the required environment variables:
# Required to obtain access token from the HPMS token services
SOCRATA_PLAN_FINDER_TEST_USERNAME
SOCRATA_PLAN_FINDER_IMPL_USERNAME
SOCRATA_PLAN_FINDER_PROD_USERNAME
SOCRATA_PLAN_FINDER_TEST_KEY_ID
SOCRATA_PLAN_FINDER_TEST_KEY_SECRET
SOCRATA_PLAN_FINDER_IMPL_KEY_ID
SOCRATA_PLAN_FINDER_IMPL_KEY_SECRET
SOCRATA_PLAN_FINDER_PROD_KEY_ID
SOCRATA_PLAN_FINDER_PROD_KEY_SECRET
SOCRATA_PLAN_FINDER_TEST_ACS
SOCRATA_PLAN_FINDER_IMPL_ACS
# Required to log into the Socrata domain and update datasets
SOCRATA_KEY_ID
SOCRATA_KEY_SECRET
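Before scheduling the script, it can help to confirm that every variable is actually set. The following is a minimal sketch and not part of upload_plan_finder_data.py: the helper name missing_env_vars is hypothetical, and the list simply mirrors the variable names above.

```python
import os

# Variable names copied from the list above.
REQUIRED_ENV_VARS = [
    "SOCRATA_PLAN_FINDER_TEST_USERNAME",
    "SOCRATA_PLAN_FINDER_IMPL_USERNAME",
    "SOCRATA_PLAN_FINDER_PROD_USERNAME",
    "SOCRATA_PLAN_FINDER_TEST_KEY_ID",
    "SOCRATA_PLAN_FINDER_TEST_KEY_SECRET",
    "SOCRATA_PLAN_FINDER_IMPL_KEY_ID",
    "SOCRATA_PLAN_FINDER_IMPL_KEY_SECRET",
    "SOCRATA_PLAN_FINDER_PROD_KEY_ID",
    "SOCRATA_PLAN_FINDER_PROD_KEY_SECRET",
    "SOCRATA_PLAN_FINDER_TEST_ACS",
    "SOCRATA_PLAN_FINDER_IMPL_ACS",
    "SOCRATA_KEY_ID",
    "SOCRATA_KEY_SECRET",
]


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:")
        for name in missing:
            print("  " + name)
    else:
        print("All required environment variables are set.")
```

Running this in the same shell or scheduled-task context as the script shows whether the scheduler sees the same environment that an interactive session does.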
The script also depends on a few local files to understand which encodings, schemas (column names), and Socrata dataset IDs to use for which Plan Finder data files. Do not modify or delete these unless you know what you're doing:
tables/files.csv
tables/schemas.csv
tracker.json
To automate Plan Finder data using this script, use a scheduling service such as Windows Task Scheduler (on Windows) or cron (on Linux). Here's the command to automate:
python3 upload_plan_finder_data.py
Since the Plan Finder data is released daily, schedule the script to run no more frequently than once a day. The script will attempt to download a file using the current day's date (local time), so be careful about scheduling it to run in the early morning, in case the new data hasn't yet been released.
Each time it's run, the script executes the following steps:
1. Load the local tables tables/files.csv, tables/schemas.csv, and tracker.json. These files contain the file encodings, schemas, and Socrata dataset IDs required to automate the Plan Finder data.
2. Fetch an HPMS access token for the test environment.
3. Create a temporary directory, plan_finder_data, for storing today's data files.
4. Fetch the Plan Finder zip file for the current date in the HPMS test environment.
5. Extract the contents of the zip file in the plan_finder_data directory.
6. For each extracted data file…
   - Identify an encoding, schema, and Socrata dataset ID from the local tables loaded in step 1. (If these are not found, skip the remaining steps.)
   - Cleanup: Check for old draft updates to this dataset on Socrata that were created more than 1 day ago but never successfully finished publishing. Discard these hanging drafts.
   - Load the data file using the specified encoding and schema (column names).
   - Using the specified Socrata dataset ID, upload the new data file, fully replacing the rows in the existing dataset.
7. If any updates from step 6 failed, log a summary of the failures.
8. Delete the temporary directory plan_finder_data that was created in step 3.
9. Repeat steps 1-8 for the impl and prod environments.
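The local part of these steps (3, 5, 6, 7, and 8) can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: process_release, known_files, and upload are hypothetical names, the archive is assumed to be flat (no subdirectories), and fetching the HPMS token and talking to Socrata are left to the caller because their APIs are not described here.

```python
import shutil
import zipfile
from pathlib import Path


def process_release(zip_path, known_files, upload, work_dir="plan_finder_data"):
    """Extract a release zip and pass each configured file to `upload`.

    known_files: dict mapping a data file name to its encoding (assumed shape).
    upload: caller-supplied callable taking (path, encoding).
    Returns a list of (file name, error message) pairs for failed uploads.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(exist_ok=True)  # step 3: temporary directory
    failures = []
    try:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(work_dir)  # step 5: extract the zip
        for path in sorted(work_dir.iterdir()):  # step 6: each data file
            if not path.is_file():
                continue
            encoding = known_files.get(path.name)
            if encoding is None:
                continue  # no configuration for this file, so skip it
            try:
                upload(path, encoding)
            except Exception as error:
                # Keep going so one bad file does not block the others.
                failures.append((path.name, str(error)))
    finally:
        # step 8: remove the temporary directory even if something failed
        shutil.rmtree(work_dir, ignore_errors=True)
    return failures  # step 7: the caller can log this summary
```

Collecting failures instead of raising on the first one matches the behavior described in step 7, where a summary is logged after all files have been attempted.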
The script supports a few command line options. Here's the full command line interface (you can also get this by executing python3 upload_plan_finder_data.py --help):
python3 upload_plan_finder_data.py [-h] [--date YYYY-MM-DD]
[--env {test,impl,prod}]
[--only-file FILENAME]
[--create-untracked-datasets]
A script to automate updates to CMS Plan Finder data on Socrata.
Requires a number of environment variables to be set prior to script
execution. See the README.md file for more information.
optional arguments:
-h, --help show this help message and exit
--date YYYY-MM-DD date to retrieve, in YYYY-MM-DD format; defaults to today: 2020-07-30
--env {test,impl,prod}
HPMS environment to use (test, impl, or prod); defaults to all three
--only-file FILENAME filename to update; if specified, only that file (and no others) will be updated
--create-untracked-datasets
create any datasets listed in CSV tables but not currently tracked in tracker.json; if this flag is set, the script will only create the untracked datasets, but not perform any other updates
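For readers who want to see how such an interface is typically declared, here is a minimal argparse sketch reconstructed from the help text above. It is an assumption about structure, not the script's actual source: build_parser and parse_date are hypothetical names, and only the option names, choices, and help strings come from the documented interface.

```python
import argparse
import datetime


def parse_date(text):
    """Parse YYYY-MM-DD; argparse reports a ValueError as a usage error."""
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="upload_plan_finder_data.py",
        description="A script to automate updates to CMS Plan Finder data on Socrata.",
    )
    parser.add_argument(
        "--date", metavar="YYYY-MM-DD", type=parse_date,
        default=datetime.date.today(),
        help="date to retrieve, in YYYY-MM-DD format; defaults to today",
    )
    parser.add_argument(
        "--env", choices=["test", "impl", "prod"], default=None,
        help="HPMS environment to use; defaults to all three",
    )
    parser.add_argument(
        "--only-file", metavar="FILENAME", default=None,
        help="filename to update; if specified, only that file will be updated",
    )
    parser.add_argument(
        "--create-untracked-datasets", action="store_true",
        help="create datasets listed in the CSV tables but not in tracker.json",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

In this sketch, an --env value of None stands for "all three environments"; how the real script represents that default is not documented here.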
The script will log its output to the console, as well as to a local file, socrata.log. These logs are the best place to start if you're troubleshooting an error.
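Logging to both the console and a file is commonly set up with two handlers on one logger. The sketch below shows that pattern using Python's standard logging module; the logger name plan_finder, the function name configure_logging, and the message format are assumptions, and the script's real configuration may differ.

```python
import logging
import sys


def configure_logging(log_path="socrata.log"):
    """Send log records to both the console and a local file."""
    logger = logging.getLogger("plan_finder")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Drop handlers from any earlier call so messages are not duplicated.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    configure_logging().info("logging configured")
```

When the script runs under a scheduler, the console output is usually not visible, so the file is the copy to check first.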
Let's say we want to update all datasets with the data released on a particular date, such as April 24, 2020. We can do so using the --date option:
python3 upload_plan_finder_data.py --date 2020-04-24
What if we only want to update datasets for the test environment, but not impl or prod? We can use the --env option:
python3 upload_plan_finder_data.py --env test
In this example, we want to update only the datasets associated with the Plan Finder data file PLN_BNFT_PKG_CST.TXT, but not any others. We can do this using the --only-file option:
python3 upload_plan_finder_data.py --only-file PLN_BNFT_PKG_CST.TXT
We can also combine options to update only the dataset for a specific file, date, and environment:
python3 upload_plan_finder_data.py --only-file PLN_BNFT_PKG_CST.TXT --date 2020-04-24 --env test
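The same options can be combined programmatically, for example to re-run the script over a range of dates. The sketch below only builds and runs command lines from the documented options; backfill_commands and run_backfill are hypothetical helpers, and whether HPMS still serves a release for a given past date is an assumption you would need to verify.

```python
import datetime
import subprocess
import sys


def backfill_commands(start, end, env=None, only_file=None):
    """Build one command per date from start to end, inclusive."""
    commands = []
    day = start
    while day <= end:
        command = [sys.executable, "upload_plan_finder_data.py",
                   "--date", day.isoformat()]
        if env:
            command += ["--env", env]
        if only_file:
            command += ["--only-file", only_file]
        commands.append(command)
        day += datetime.timedelta(days=1)
    return commands


def run_backfill(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, run_backfill(backfill_commands(datetime.date(2020, 4, 24), datetime.date(2020, 4, 26), env="test")) would invoke the script three times against the test environment.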
What if a new data file becomes available in the HPMS release package? How do we add it to our automation?
Because Plan Finder data files do not include column names (schemas), some manual configuration is required. Follow these steps:
1. Add row to files table: Add a comma-delimited row to tables/files.csv containing the new file's name and encoding. (The encoding must be a valid Python-readable encoding string such as utf-16-le or ascii.) This ensures that the script will find your file and open it using the correct encoding.
2. Add row to schemas table: Add comma-delimited rows to tables/schemas.csv containing the file's name and column indices and names. The number and order of columns in the schemas table must match the data file exactly.
3. Create untracked datasets on Socrata: Open a terminal and execute the script with the --create-untracked-datasets option: python3 upload_plan_finder_data.py --create-untracked-datasets.
4. Check the Plan Finder domain: If the script executes successfully, your new dataset(s) should be created on the Plan Finder domain. You can find their unique Socrata identifiers by inspecting the tracker.json file. (As mentioned above, tracker.json is the file that links Plan Finder data files to their respective Socrata datasets.)

Congratulations! Your new data file(s) are now tracked by the script and will be automatically updated when the script runs each day.
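Because the number of columns in the schemas table must match the data file exactly, a quick check before creating the dataset can catch mismatches early. The following is a minimal sketch with hypothetical helper names; the field delimiter of the Plan Finder files is not stated in this document, so it is passed in as a parameter rather than assumed.

```python
import csv


def column_counts(path, encoding, delimiter):
    """Return the set of distinct field counts found in a delimited file."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # Blank lines come back as empty lists and are ignored.
        return {len(row) for row in reader if row}


def matches_schema(path, encoding, delimiter, expected_columns):
    """True when every non-empty row has exactly expected_columns fields."""
    return column_counts(path, encoding, delimiter) == {expected_columns}
```

Here expected_columns would be the number of rows you added to tables/schemas.csv for the new file, and encoding the value you entered in tables/files.csv. An empty file yields no counts at all and is therefore reported as not matching.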