Refine data pipeline and ETL robustness

Question

Closed this issue 3 years ago · 0 comments

Should include:

Maybe some more structural refinements
Linting
Type hinting
~~Docstrings~~ -> moved to be resolved with: #12
Consistent handling of null/empty values -> Pandera?
Db schema -> Pandera
Table content and type validation -> Pandera
- Must: Pandera before writing to database (for each table)
- ~~Maybe: Pandera after main transformation pipeline~~ -> will not do (for now)
- ~~Maybe: Pandera at the start of transformation pipeline/after extraction~~ -> will not do (for now)
Consistent capitalization/value naming -> some edge cases are problematic, but will leave them for now