BennyThadikaran/eod2

Check for data integrity

BennyThadikaran opened this issue · 0 comments

A user contacted me on Friday with an error while running EOD2 sync. The error seemed to point to duplicate entries in one of the CSV files.

I couldn't find the source of the error, but I decided to check my own data for any such issues.

  • No duplicate entries found in daily or delivery folders.
  • There was, however, an extra column DELIV_PER in some delivery files. The cause was the header_text variable in src/defs/defs.py, which defines the column headers for delivery files whenever a new file is created.
    • The code was corrected and the 182 files with this extra column were cleaned up.
    • 5 files with just column headers and no data were deleted.

I have written a script, diagnostics.py, which looks for common errors. I intend to run it weekly before updating the data on the repo.

Users can use diagnostics.py to run checks on their own data and report issues.
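
For anyone who wants a rough idea of what such checks involve, here is a minimal sketch of the three checks mentioned above (duplicate entries, the extra DELIV_PER column, and header-only files). It is not the actual diagnostics.py. The function names check_csv and check_folder are hypothetical, and the date column name "Date" is an assumption, so adjust it to match your files.

```python
import csv
from pathlib import Path


def check_csv(path, date_col="Date", extra_cols=("DELIV_PER",)):
    """Return a list of problems found in a single CSV file.

    date_col is an assumed column name used to detect duplicate rows;
    extra_cols lists column names that should not be present.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return ["empty file"]
        rows = list(reader)

    problems = []

    # A file with column headers but no data rows
    if not rows:
        problems.append("header only, no data")

    # Columns that should not be in the file
    for col in extra_cols:
        if col in header:
            problems.append(f"unexpected column {col}")

    # More than one row sharing the same date
    if date_col in header:
        idx = header.index(date_col)
        seen, dupes = set(), set()
        for row in rows:
            if idx >= len(row):
                continue  # skip short or malformed rows
            if row[idx] in seen:
                dupes.add(row[idx])
            seen.add(row[idx])
        if dupes:
            problems.append(f"duplicate dates: {sorted(dupes)}")

    return problems


def check_folder(folder):
    """Map file name to its list of problems, for files that have any."""
    report = {}
    for path in sorted(Path(folder).glob("*.csv")):
        problems = check_csv(path)
        if problems:
            report[path.name] = problems
    return report
```

Calling check_folder on a data folder returns a dict containing only the files that have problems, so an empty dict means nothing was flagged.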