A scrub system that de-identifies and cleans data so that it stays private when shared with other organizations. We focus on medical datasets here because they are particularly vulnerable to data leakage, but the algorithm can be applied to any dataset.
python main.py -f Input_files/records.csv -o output_file_name
-f, --input-file-path: Input file path
-o, --output-file-name: Output file name
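For readers who want to see how the two options above could be parsed, here is a minimal sketch using Python's standard `argparse` module. It is not taken from the project's `main.py`: the function name `build_parser`, the description string, and the choice to make both options required are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="De-identify and clean a dataset before sharing it."
    )
    # Marking both options as required is an assumption; the real script may use defaults.
    parser.add_argument("-f", "--input-file-path", required=True, help="Input file path")
    parser.add_argument("-o", "--output-file-name", required=True, help="Output file name")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse converts dashes in long option names to underscores.
    print(args.input_file_path, args.output_file_name)
```

With this sketch, the command shown above would set `args.input_file_path` to `Input_files/records.csv` and `args.output_file_name` to `output_file_name`.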
Note: The user provides three inputs, as highlighted in the above image. Based on these inputs, a decision is made and the output is shown.
Check out the complete demo with explanation here
The dataset used is the open-source medical dataset "Electronic Health Record (EHR) Incentive Program Payments for Eligible Providers", taken from here