Build data pipelines using pandas and S3. Initially, this will support the following executors:
- Data Flow
- Data Flow Steps
- Data Set
- Data Transformation, available transformations include:
  - Calculated Column
  - Columns Subset
  - Conditional Fill
  - Constant Column
  - Filter
  - Format Columns (Currency, Date, etc)
  - Pivot Table (for grouping by)
  - Remove Duplicates
  - Rename Column
  - Replace Text
  - Sort by Columns
  - Value Mapper
- Single Merge Rule
- Multiple Merge Rule
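
To make the list above more concrete, here is a minimal sketch of how a few of these transformations, plus a merge, might look in plain pandas. It is not this project's API: the function names `transform` and `merge_claims`, the sample data, and every column name are hypothetical. It also assumes that a "Single Merge Rule" means joining two data sets on one key, which is a guess about the intended meaning.

```python
import pandas as pd

# Hypothetical sample data; all column names are invented for illustration.
policies = pd.DataFrame(
    {
        "policy_id": [1, 2, 2, 3],
        "state": ["NY", "CA", "CA", "TX"],
        "premium": [1200.0, 800.0, 800.0, 450.0],
        "rate": [0.25, 0.5, 0.5, 0.25],
    }
)
claims = pd.DataFrame({"policy_id": [1, 3], "claim_amount": [300.0, 50.0]})


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply several of the listed transformations in sequence (a sketch)."""
    out = df.drop_duplicates()                                 # Remove Duplicates
    out = out.assign(commission=out["premium"] * out["rate"])  # Calculated Column
    out = out[out["premium"] > 500]                            # Filter
    out = out.rename(columns={"state": "region"})              # Rename Column
    out = out.assign(                                          # Value Mapper
        region=out["region"].map({"NY": "East", "CA": "West"})
    )
    out = out.sort_values("premium", ascending=False)          # Sort by Columns
    columns = ["policy_id", "region", "premium", "commission"]
    return out[columns].reset_index(drop=True)                 # Columns Subset


def merge_claims(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Assumed reading of a single merge rule: a left join on one key."""
    merged = left.merge(right, on="policy_id", how="left")
    # Policies without a matching claim get a claim amount of 0.
    merged["claim_amount"] = merged["claim_amount"].fillna(0.0)
    return merged


result = merge_claims(transform(policies), claims)
print(result)
```

Each step returns a new DataFrame rather than mutating its input, so steps can be reordered or dropped independently, which is roughly what a chain of data flow steps would need.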
This is a sample dataflow from the insurance industry, implemented here:
"gummy bear" icon by emilegraphics from the Noun Project.