Python 3.9
numpy 1.22.2
pandas 1.4
pytest 7.0.0
notebook 6.4.8
... and their dependencies. See requirements.txt for a full list.
- I found that the file had missing values in patient_id, claim_id, procedure_code, and date_service.
- I found that there were multiple invalid cpt and icd codes.
- I found that some rows had invalid service date formatting.
- I found that some rows had service dates that occured after the received date (not logical for medical claims)
- I found that some service dates had dates occuring many years in the future.
However, the file columns were correct and the dtypes were good once the invalid pieces of data were filtered out.