
Data pipelines using Pandas

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Panditas

CI Build Status | Code style: Black

Build Data Pipelines using Pandas and S3. Initially this will support the following executors:

Models

  • Data Flow
  • Data Flow Steps
    • Data Set
    • Data Transformation, available transformations include:
      • Calculated Column
      • Columns Subset
      • Conditional Fill
      • Constant Column
      • Filter
      • Format Columns (currency, date, etc.)
      • Pivot Table (for group-by aggregation)
      • Remove Duplicates
      • Rename Column
      • Replace Text
      • Sort by Columns
      • Value Mapper
    • Single Merge Rule
    • Multiple Merge Rule
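
To give a rough idea of what several of the steps listed above correspond to, here is a minimal sketch written directly against plain pandas. It does not use the Panditas API: the sample data, column names, and variable names are all hypothetical and chosen only for illustration, and the mapping from each pandas call to a Panditas step is an assumption based on the step names.

```python
import pandas as pd

# Hypothetical sample data; these frames and column names are illustrative
# assumptions and do not come from Panditas.
policies = pd.DataFrame(
    {
        "policy_id": [1, 2, 2, 3],
        "state": ["CA", "NY", "NY", "TX"],
        "premium": [100.0, 250.0, 250.0, 80.0],
        "fees": [10.0, 25.0, 25.0, 8.0],
    }
)
claims = pd.DataFrame({"policy_id": [1, 3], "claim_amount": [40.0, 500.0]})

# Assumed lookup table for the "Value Mapper" style step.
region_by_state = {"CA": "West", "NY": "East", "TX": "South"}

result = (
    policies
    .drop_duplicates()                                      # Remove Duplicates
    .assign(total=lambda df: df["premium"] + df["fees"])    # Calculated Column
    .assign(source="sample")                                # Constant Column
    .assign(region=lambda df: df["state"].map(region_by_state))  # Value Mapper
    .query("total > 50")                                    # Filter
    .rename(columns={"premium": "base_premium"})            # Rename Column
    .merge(claims, on="policy_id", how="left")              # one merge of two data sets
    .fillna({"claim_amount": 0.0})                          # fill policies with no claim
    .sort_values("total", ascending=False)                  # Sort by Columns
    .reset_index(drop=True)
    .loc[:, ["policy_id", "region", "total", "claim_amount"]]  # Columns Subset
)

print(result)
```

Chaining the calls keeps each step as one line, which loosely mirrors the idea of a data flow made of ordered steps. How Panditas itself declares and executes these steps may differ.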

Example

This is a sample dataflow from the insurance industry, implemented here:

Credits

"gummy bear" icon by emilegraphics from the Noun Project.