Hatch 2021 Challenge 3 Data Normalization Tools

The data for challenge 3 was delivered in a crazy format, so I created some tools to convert it to JSON, which is easier for computers to parse. The output has one JSON document per line (a layout often called JSON Lines), and each document represents a patient. The file as a whole is not a single valid JSON document, even though every line is.

JSON data can be found here:

./data/DomoArigatoData.json
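
Because the file holds one JSON document per line, it has to be parsed line by line rather than with a single json.load call. Here is a minimal sketch of how that might look. The function name iter_patients is hypothetical and not part of this repo, and the sketch assumes the file is UTF-8 encoded:

```python
import json
from pathlib import Path


def iter_patients(path):
    """Yield one parsed JSON document per non-empty line of the file.

    Hypothetical helper for illustration; assumes UTF-8 input.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between documents
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report which line failed so bad rows are easy to find
                raise ValueError(
                    f"line {line_number} is not valid JSON: {err}"
                ) from err
```

Calling list(iter_patients("data/DomoArigatoData.json")) would then give a list with one entry per patient. A generator keeps memory use low if you only need to stream through the records once.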

Install

If running locally, install the required Python packages:

pip install --no-cache-dir -r requirements.txt

Usage

To JSONify the data, run the following:

./jsonify_data.py > data/DomoArigatoData.json

Optional Commands

Fix the file encoding and write the result to ./data/DomoArigatoData-utf8.txt:

./fix_encoding.py
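
For anyone curious what re-encoding a file to UTF-8 generally involves, here is a rough sketch. It is not the contents of fix_encoding.py. The function name reencode_to_utf8 and the default source encoding cp1252 are assumptions picked for illustration; the real input file and its encoding may differ:

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Decode src using source_encoding and write the same text to dst as UTF-8.

    Hypothetical helper; cp1252 is only an assumed default.
    """
    raw = Path(src).read_bytes()
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    text = raw.decode(source_encoding, errors="replace")
    # Write bytes directly so line endings are left exactly as they were
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Using errors="replace" keeps the conversion from failing on a stray byte, at the cost of hiding where the bad byte was. Switching to errors="strict" would surface those rows instead.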

To run jsonify_data.py in Docker instead:

./docker-run.sh > data/DomoArigatoData.json

Notes

  • I had to fix a couple of issues with the data by deleting one row and fixing a character in another
  • Yeah, I know it's still not valid JSON, but each row DOES contain a valid JSON doc.
  • PRs are welcome if you find a mistake