Refine data pipeline and ETL robustness
Closed this issue · 0 comments
maciejzj commented
Should include:
- Maybe some more structural refinements
- Linting
- Type hinting
- Docstrings -> moved to be resolved with: #12
- Consistent handling of null/empty/values -> Pandera?
- Db schema -> Pandera
- Table content and type validation -> Pandera
- Must: Pandera before writing to database (for each table)
- Maybe: Pandera after main transformation pipeline -> will not do (for now)
- Maybe: Pandera at the start of transformation pipeline/after extraction -> will not do (for now)
- Consistent capitalization/values naming -> Some edge cases are problematic, but will leave them for now