ivansabik/panditas

Data pipelines using Pandas

Python · Apache-2.0

Panditas

Build Data Pipelines using Pandas and S3. Initially this will support the following executors:

Models

- Data Flow
- Data Flow Steps
  - Data Set
  - Data Transformation, available transformations include:
    - Calculated Column
    - Columns Subset
    - Conditional Fill
    - Constant Column
    - Filter
    - Format Columns (Currency, Date, etc)
    - Pivot Table (for grouping by)
    - Remove Duplicates
    - Rename Column
    - Replace Text
    - Sort by Columns
    - Value Mapper
  - Single Merge Rule
  - Multiple Merge Rule
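
To give a rough idea of what these steps correspond to, here is a minimal sketch written in plain Pandas. It does not use the Panditas API: the function name `run_pipeline`, the sample data, and every column name are hypothetical and exist only for illustration. Each line is annotated with the step from the list above that it roughly corresponds to.

```python
import pandas as pd

# Hypothetical sample data sets; all column names are invented for this sketch.
policies = pd.DataFrame({
    "policy_id": [1, 2, 2, 3, 4],
    "state": ["tx", "ca", "ca", "tx", "ny"],
    "premium": [100.0, 250.0, 250.0, 80.0, 300.0],
    "fee": [10.0, 25.0, 25.0, 8.0, 30.0],
})
claims = pd.DataFrame({
    "policy_id": [1, 2, 4],
    "claim_amount": [40.0, 0.0, 500.0],
})

STATE_NAMES = {"tx": "Texas", "ca": "California", "ny": "New York"}


def run_pipeline(policies: pd.DataFrame, claims: pd.DataFrame) -> pd.DataFrame:
    """Plain-Pandas equivalents of several steps (not the Panditas API)."""
    result = (
        policies
        .drop_duplicates()                                      # Remove Duplicates
        .assign(total=lambda d: d["premium"] + d["fee"])        # Calculated Column
        .assign(source="sample")                                # Constant Column
        .assign(state=lambda d: d["state"].map(STATE_NAMES))    # Value Mapper
        .rename(columns={"premium": "base_premium"})            # Rename Column
        .loc[lambda d: d["total"] > 100]                        # Filter
        .merge(claims, on="policy_id", how="left")              # Single Merge Rule
        .sort_values("total", ascending=False)                  # Sort by Columns
        .reset_index(drop=True)
    )
    # Columns Subset: keep only the columns needed downstream.
    return result[["policy_id", "state", "total", "claim_amount"]]


if __name__ == "__main__":
    print(run_pipeline(policies, claims))
```

Chaining the steps keeps each transformation on its own line, which is close in spirit to an ordered list of data flow steps, although how Panditas represents and executes steps internally may differ.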

Example

This is a sample dataflow from the insurance industry, implemented here:

Credits

"gummy bear" icon by emilegraphics from the Noun Project.