maciejzj/it-jobs-meta

Refine data pipeline and ETL robustness


Should include:

  • Maybe some more structural refinements
  • Linting
  • Type hinting
  • Docstrings -> moved to be resolved in #12
  • Consistent handling of null/empty values -> Pandera?
  • Db schema -> Pandera
  • Table content and type validation -> Pandera
    • Must: Pandera validation before writing to the database (for each table)
    • Maybe: Pandera validation after the main transformation pipeline -> will not do (for now)
    • Maybe: Pandera validation at the start of the transformation pipeline/after extraction -> will not do (for now)
  • Consistent capitalization/value naming -> some edge cases are problematic, but they will be left for now
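Pandera can check nullability and allowed values, but it does not clean data. For the null/empty-value and capitalization items, one option (an assumption, not a decision recorded in this issue) is a small normalization step run before validation. The sketch below uses only the standard library; the `normalize_value` helper and the `SENIORITY_CANONICAL` mapping are hypothetical, and it assumes missing values arrive as `None` or float NaN.

```python
"""Sketch: normalize raw text values so missing, empty, and inconsistently
capitalized values end up in one canonical form before validation.

The mapping below is a hypothetical example, not the project's actual vocabulary.
"""
import math
from typing import Optional

# Hypothetical canonical spellings, keyed by case-folded form.
SENIORITY_CANONICAL = {
    "junior": "Junior",
    "mid": "Mid",
    "senior": "Senior",
}


def normalize_value(raw: object, canonical: Optional[dict[str, str]] = None) -> Optional[str]:
    """Return a cleaned string, or None for missing/empty input.

    - None and float NaN (how pandas often marks missing values) become None.
    - Surrounding whitespace is stripped and inner whitespace runs are collapsed.
    - Empty or whitespace-only strings become None.
    - If a canonical mapping is given, known values are mapped case-insensitively;
      unknown values are returned unchanged, leaving edge cases for later.
    """
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return None
    text = " ".join(str(raw).split())
    if not text:
        return None
    if canonical is not None:
        return canonical.get(text.casefold(), text)
    return text
```

For example, `normalize_value("  junior ", SENIORITY_CANONICAL)` returns `"Junior"` and `normalize_value("   ")` returns `None`. On an object-dtype pandas column it could be applied with `df["seniority"].map(lambda v: normalize_value(v, SENIORITY_CANONICAL))`.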