Visual Flow is an ETL tool designed for effective data manipulation via a convenient, user-friendly interface. The tool has the following capabilities:
- Integrate data from heterogeneous sources:
  - AWS S3
  - Cassandra
  - ClickHouse
  - DB2
  - Dataframe (for reading)
  - Elasticsearch
  - IBM COS
  - Kafka
  - Local File
  - MS SQL
  - MongoDB
  - MySQL/MariaDB
  - Oracle
  - PostgreSQL
  - Redis
  - Redshift
- Leverage direct connectivity to enterprise applications as sources and targets
- Perform data processing and transformation
- Run custom code
- Leverage metadata for analysis and maintenance
The Visual Flow application is divided into the following repositories:
- Visual-Flow-frontend
- Visual-Flow-backend
- Visual-Flow-jobs (current)
- Visual-Flow-deploy
- Visual-Flow-backend-db-service
- Visual-Flow-backend-history-service
This repository contains two types of jobs:
- Slack job - represents the notification stage of a Visual Flow pipeline and is written in Python.
- Spark job - represents a Visual Flow job and is written in Scala.
Visual Flow jobs are composed of stages. The following stages are supported by spark-job:
- Read stage.
- Write stage.
- Group By stage.
- Remove duplicates stage.
- Filter stage.
- Transformer stage.
- Join stage.
- Change data capture stage.
- Union stage.
For more information on stages, see stage_fields.md.
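To give a rough feel for what several of these stages do to a dataset, here is a minimal sketch in plain Python. It is not the spark-job implementation (which is written in Scala and runs on Spark); the sample rows, the function names, and the choice of a sum aggregation and an inner join are all assumptions made for illustration. The actual stage parameters are described in stage_fields.md.

```python
from collections import defaultdict

# Hypothetical sample rows; real Visual Flow jobs operate on Spark data, not lists.
orders = [
    {"id": 1, "customer": "a", "amount": 10},
    {"id": 2, "customer": "b", "amount": 25},
    {"id": 2, "customer": "b", "amount": 25},  # duplicate of the previous row
    {"id": 3, "customer": "a", "amount": 40},
]
more_orders = [{"id": 4, "customer": "c", "amount": 5}]
customers = [
    {"customer": "a", "country": "DE"},
    {"customer": "b", "country": "US"},
]


def union(left, right):
    """Union stage (sketch): append the rows of both inputs."""
    return left + right


def remove_duplicates(rows, keys):
    """Remove duplicates stage (sketch): keep the first row per key tuple."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def filter_rows(rows, predicate):
    """Filter stage (sketch): keep rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def inner_join(left, right, key):
    """Join stage (sketch): inner join of two inputs on a single column."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_by_sum(rows, key, value):
    """Group By stage (sketch): sum one column per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Chain the stages the way a job chains them: each output feeds the next stage.
combined = union(orders, more_orders)
unique = remove_duplicates(combined, ["id"])
large = filter_rows(unique, lambda r: r["amount"] >= 10)
joined = inner_join(large, customers, "customer")
totals = group_by_sum(joined, "country", "amount")
print(totals)  # {'DE': 50, 'US': 25}
```

Each function takes rows in and returns rows out, which mirrors how stages are chained inside a job. Read and Write stages are omitted here because they depend on the connection settings of the chosen source or target.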
Visual Flow is open-source software licensed under the Apache-2.0 license.