/spring-cloud-dataflow

Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines.

Primary LanguageJavaApache License 2.0Apache-2.0

Spring Data Flow Dashboard

Latest Release Version Latest Snapshot Version
Build Status

Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines.

Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks.

This makes Spring Cloud Data Flow suitable for a range of data processing use cases, from import/export to event streaming and predictive analytics.


Components

The Core domain module includes the concept of a stream that is a composition of spring-cloud-stream modules in a linear pipeline from a source to a sink, optionally including processor application(s) in between. The domain also includes the concept of a task, which may be any process that does not run indefinitely, including Spring Batch jobs.

The App Registry maintains the set of available apps, and their mappings to URIs. For example, if relying on Maven coordinates, an app's URI would be of the format: maven://<groupId>:<artifactId>:<version>

The Data Flow Server is a Spring Boot application that provides a common REST API and UI. As of version 2.0 a single Data Flow Server supports deploying tasks to Local, Cloud Foundry, and Kubernetes. Also as of version 2.0, the Skipper Server is required for deploying streams to Local, Cloud Foundry and Kubernetes. The github locations for the Data Flow Server is:

There are also community maintained Spring Cloud Data Flow implementations that are currently based on the 1.7.x series of Data Flow but will eventually get updated to the 2.0 baseline.

The Apache YARN implementation has reached end-of-line status. Let us know at Gitter if youare interested in forking the project to continue developing and maintaining it.

The deployer SPI mentioned above is defined within the Spring Cloud Deployer project. That provides an abstraction layer for deploying the apps of a given stream or task and managing their lifecycle. The github locations for the corresponding Spring Cloud Deployer SPI implementations are:

The Shell connects to the Data Flow Server's REST API and supports a DSL that simplifies the process of defining a stream or task and managing its lifecycle.

Instructions for running the Data Flow Server for each runtime environment can be found in their respective github repositories.


Building

Clone the repo and type

$ ./mvnw clean install 

For more information on building, see this link.

Building on Windows

When using Git on Windows to check out the project, it is important to handle line-endings correctly during checkouts. By default Git will change the line-endings during checkout to CRLF. This is, however, not desired for Spring Cloud Data Flow as this may lead to test failures under Windows.

Therefore, please ensure that you set Git property core.autocrlf to false, e.g. using: $ git config core.autocrlf false. Fore more information please refer to the Git documentation, Formatting and Whitespace.


Contributing

We welcome contributions! Follow this link for more information on how to contribute.


Reporting Issues

When reporting problems, it'd be helpful if the bug report includes the details listed on this wiki-article.


Code formatting guidelines

  • The directory ./src/eclipse has two files for use with code formatting, eclipse-code-formatter.xml for the majority of the code formatting rules and eclipse.importorder to order the import statements.

  • In eclipse you import these files by navigating Windows -> Preferences and then the menu items Preferences > Java > Code Style > Formatter and Preferences > Java > Code Style > Organize Imports respectfully.

  • In IntelliJ, install the plugin Eclipse Code Formatter. You can find it by searching the "Browse Repositories" under the plugin option within IntelliJ (Once installed you will need to reboot Intellij for it to take effect). Then navigate to Intellij IDEA > Preferences and select the Eclipse Code Formatter. Select the eclipse-code-formatter.xml file for the field Eclipse Java Formatter config file and the file eclipse.importorder for the field Import order. Enable the Eclipse code formatter by clicking Use the Eclipse code formatter then click the OK button. ** NOTE: If you configure the Eclipse Code Formatter from File > Other Settings > Default Settings it will set this policy across all of your Intellij projects.