πFlow is an easy-to-use, powerful big data pipeline system.
- Features
- Architecture
- Requirements
- Getting Started
- PiFlow Docker
- User Interface
- Principled Stand
- Contact Us
- Easy to use
  - provides a WYSIWYG web interface for configuring data flows
  - monitors data flow status
  - shows the logs of data flows
  - provides checkpoints
- Strong scalability
  - supports custom development of data processing components
- Superior performance
  - built on the distributed computing engine Spark
- Powerful
  - 100+ data processing components available
  - including Spark, MLlib, Hadoop, Hive, HBase, TDengine, OceanBase, openLooKeng, TiDB, Solr, Redis, Memcache, Elasticsearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, etc.
- JDK 1.8
- Scala-2.12.18
- Apache Maven 3.1.0 or newer
- Spark-3.4.0
- Hadoop-3.3.0
Compatible with x86 and ARM architectures; supports deployment on CentOS and Kirin systems.
- Install the external packages:
  mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/spark-xml_2.11-0.4.2.jar -DgroupId=com.databricks -DartifactId=spark-xml_2.11 -Dversion=0.4.2 -Dpackaging=jar
  mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/java_memcached-release_2.6.6.jar -DgroupId=com.memcached -DartifactId=java_memcached-release -Dversion=2.6.6 -Dpackaging=jar
  mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/ojdbc6-11.2.0.3.jar -DgroupId=oracle -DartifactId=ojdbc6 -Dversion=11.2.0.3 -Dpackaging=jar
  mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/edtftpj.jar -DgroupId=ftpClient -DartifactId=edtftp -Dversion=1.0.0 -Dpackaging=jar
- Build PiFlow:
  mvn clean package -Dmaven.test.skip=true
  A successful build ends with output like:
  [INFO] Replacing original artifact with shaded artifact.
  [INFO] Reactor Summary:
  [INFO]
  [INFO] piflow-project ..................................... SUCCESS [ 4.369 s]
  [INFO] piflow-core ........................................ SUCCESS [01:23 min]
  [INFO] piflow-configure ................................... SUCCESS [ 12.418 s]
  [INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
  [INFO] piflow-server ...................................... SUCCESS [02:05 min]
  [INFO] ------------------------------------------------------------------------
  [INFO] BUILD SUCCESS
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: 06:01 min
  [INFO] Finished at: 2020-05-21T15:22:58+08:00
  [INFO] Final Memory: 118M/691M
  [INFO] ------------------------------------------------------------------------
- Run the PiFlow server in IntelliJ:
  - Download PiFlow: git clone https://github.com/cas-bigdatalab/piflow.git
  - Import PiFlow into IntelliJ.
  - Edit the config.properties file.
  - Build PiFlow to generate the PiFlow jar:
    - Edit Configurations --> Add New Configuration --> Maven
    - Name: package
    - Command line: clean package -Dmaven.test.skip=true -X
    - Run 'package' (the PiFlow jar is built at ../piflow/piflow-server/target/piflow-server-0.9.jar)
  - Run HttpService:
    - Edit Configurations --> Add New Configuration --> Application
    - Name: HttpService
    - Main class: cn.piflow.api.Main
    - Environment variable: SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.6 (change the path to your Spark home)
    - Run 'HttpService'
  - Test HttpService:
    - In /../piflow/piflow-server/src/main/scala/cn/piflow/api/HTTPClientStartMockDataFlow.scala, change the PiFlow server IP and port to match your configuration
    - Run HTTPClientStartMockDataFlow.scala
- Run the PiFlow server from a release version:
  - Download piflow.tar.gz: https://github.com/cas-bigdatalab/piflow/releases/download/v1.2/piflow-server-v1.5.tar.gz
  - Extract piflow.tar.gz: tar -zxvf piflow.tar.gz
  - Edit config.properties.
  - Start, stop, restart, or check the status of the server with start.sh, stop.sh, restart.sh, and status.sh.
  - Test the PiFlow server:
    - Set PIFLOW_HOME: open /etc/profile (vim /etc/profile), add the lines below, then reload it with source /etc/profile
      export PIFLOW_HOME=/yourPiflowPath/bin
      export PATH=$PATH:$PIFLOW_HOME/bin
    - Commands:
      piflow flow start example/mockDataFlow.json
      piflow flow stop appID
      piflow flow info appID
      piflow flow log appID
      piflow flowGroup start example/mockDataGroup.json
      piflow flowGroup stop groupId
      piflow flowGroup info groupId
- How to configure config.properties:
  #spark and yarn config
  spark.master=yarn
  spark.deploy.mode=cluster
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.86.191:9000
  #yarn resourcemanager.hostname
  yarn.resourcemanager.hostname=10.0.86.191
  #if you want to use Hive, set the Hive metastore URIs
  #hive.metastore.uris=thrift://10.0.88.71:9083
  #show data in logs; set 0 if you do not want to show data in logs
  data.show=10
  #server port
  server.port=8002
  #h2db port
  h2.port=50002
  #if you want to upload Python stops, please set the HDFS configs
  #example hdfs.cluster=hostname:hostIP
  #hdfs.cluster=master:127.0.0.1
  #hdfs.web.url=master:50070
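Because config.properties is a plain key=value file, a quick check before starting the server can catch missing keys or malformed ports. The following Python sketch parses the simple subset of the Java properties format used above and reports problems. The `EXPECTED_KEYS` list and the validation rules are assumptions drawn from the sample file, not checks PiFlow itself performs.

```python
# Keys that are uncommented in the sample config.properties. Treating all of
# them as required is an assumption made for this sketch, not a PiFlow rule.
EXPECTED_KEYS = (
    "spark.master",
    "spark.deploy.mode",
    "fs.defaultFS",
    "yarn.resourcemanager.hostname",
    "data.show",
    "server.port",
    "h2.port",
)


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' or '!' comment lines.

    Handles only the simple subset used in config.properties: no line
    continuations, escape sequences, or ':' separators.
    """
    props = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values like hdfs://host:9000 stay intact.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check_config(props):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [f"missing {key}" for key in EXPECTED_KEYS if key not in props]
    for key in ("server.port", "h2.port"):
        value = props.get(key)
        if value is not None and not (value.isdigit() and 0 < int(value) < 65536):
            problems.append(f"{key} is not a valid port: {value!r}")
    show = props.get("data.show")
    if show is not None and not show.isdigit():
        problems.append(f"data.show should be a non-negative integer: {show!r}")
    return problems
```

For example, `check_config(parse_properties(open("config.properties").read()))` should return an empty list for a file with the same uncommented entries as the sample above.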
- Visit the address, download the corresponding *.tar.gz file, and modify the corresponding configuration file (the version must be consistent with piflow-server).
- If you want to upload Python stops, modify docker.service:
  - Open the service file: vim /usr/lib/systemd/system/docker.service
  - Set the ExecStart line:
    ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
  - Reload systemd and restart Docker:
    systemctl daemon-reload
    systemctl restart docker
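After restarting Docker with the `-H tcp://0.0.0.0:2375` option, you can confirm that the daemon answers on TCP. This minimal Python sketch calls the Docker Engine API's `GET /version` endpoint over plain HTTP. It assumes the daemon is reachable without TLS on the host and port you pass in.

```python
import json
import urllib.request

# Assumption: the daemon was started with "-H tcp://0.0.0.0:2375" (plain HTTP, no TLS).
DOCKER_TCP_PORT = 2375


def docker_version(host, port=DOCKER_TCP_PORT, timeout=5.0):
    """Call GET /version on the Docker Engine API and return the decoded JSON.

    Raises an OSError subclass (for example urllib.error.URLError) if nothing
    answers on host:port within the timeout.
    """
    url = f"http://{host}:{port}/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `info = docker_version("127.0.0.1")` followed by `print(info["Version"], info["ApiVersion"])` prints the engine and API versions if the TCP listener is active.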
- Flow JSON
  - Flow example:
  {
    "flow": {
      "name": "MockData",
      "executorMemory": "1g",
      "executorNumber": "1",
      "uuid": "8a80d63f720cdd2301723b7461d92600",
      "paths": [
        { "inport": "", "from": "MockData", "to": "ShowData", "outport": "" }
      ],
      "executorCores": "1",
      "driverMemory": "1g",
      "stops": [
        {
          "name": "MockData",
          "bundle": "cn.piflow.bundle.common.MockData",
          "uuid": "8a80d63f720cdd2301723b7461d92604",
          "properties": { "schema": "title:String, author:String, age:Int", "count": "10" },
          "customizedProperties": {}
        },
        {
          "name": "ShowData",
          "bundle": "cn.piflow.bundle.external.ShowData",
          "uuid": "8a80d63f720cdd2301723b7461d92602",
          "properties": { "showNumber": "5" },
          "customizedProperties": {}
        }
      ]
    }
  }
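- CURL POST (replace 'this is your flow json' with your flow JSON, or use -d @yourFlow.json to read it from a file):
  curl -0 -X POST http://10.0.86.191:8002/flow/start -H "Content-type: application/json" -d 'this is your flow json'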
- Command line:
  - Set PIFLOW_HOME: open /etc/profile (vim /etc/profile), add the lines below, then reload it with source /etc/profile
    export PIFLOW_HOME=/yourPiflowPath/piflow-bin
    export PATH=$PATH:$PIFLOW_HOME/bin
  - Command examples:
    piflow flow start yourFlow.json
    piflow flow stop appID
    piflow flow info appID
    piflow flow log appID
    piflow flowGroup start yourFlowGroup.json
    piflow flowGroup stop groupId
    piflow flowGroup info groupId
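The flow JSON and curl POST above can also be produced from a script. This Python sketch uses only the standard library. It rebuilds the MockData → ShowData example flow with small helper functions and prepares the same POST to `/flow/start` that the curl command sends. The helper names (`make_stop`, `make_flow`, `start_request`) and the path-name check are hypothetical, written for this illustration, and are not part of PiFlow. The host and port are copied from the curl example.

```python
import json
import urllib.request

# Host and port copied from the curl example; replace them with your own
# server address and the server.port value from config.properties.
PIFLOW_HOST = "10.0.86.191"
PIFLOW_PORT = 8002


def make_stop(name, bundle, uuid, properties):
    """One entry of "stops", shaped like the stops in the flow example."""
    return {
        "name": name,
        "bundle": bundle,
        "uuid": uuid,
        "properties": properties,
        "customizedProperties": {},
    }


def make_flow(name, uuid, stops, paths, executor_memory="1g", executor_number="1",
              executor_cores="1", driver_memory="1g"):
    """Wrap stops and paths in the {"flow": {...}} envelope used by the example.

    The path check below is a client-side sanity check added for this sketch;
    the server may validate flows differently.
    """
    names = {stop["name"] for stop in stops}
    for path in paths:
        if path["from"] not in names or path["to"] not in names:
            raise ValueError(f"path {path['from']} -> {path['to']} uses an unknown stop")
    return {
        "flow": {
            "name": name,
            "uuid": uuid,
            "executorMemory": executor_memory,
            "executorNumber": executor_number,
            "executorCores": executor_cores,
            "driverMemory": driver_memory,
            "stops": stops,
            "paths": paths,
        }
    }


def start_request(flow, host=PIFLOW_HOST, port=PIFLOW_PORT):
    """Build, without sending, the same POST the curl example sends to /flow/start."""
    return urllib.request.Request(
        f"http://{host}:{port}/flow/start",
        data=json.dumps(flow).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The MockData -> ShowData example flow, rebuilt with the helpers above.
mock_flow = make_flow(
    "MockData",
    "8a80d63f720cdd2301723b7461d92600",
    stops=[
        make_stop("MockData", "cn.piflow.bundle.common.MockData",
                  "8a80d63f720cdd2301723b7461d92604",
                  {"schema": "title:String, author:String, age:Int", "count": "10"}),
        make_stop("ShowData", "cn.piflow.bundle.external.ShowData",
                  "8a80d63f720cdd2301723b7461d92602", {"showNumber": "5"}),
    ],
    paths=[{"inport": "", "from": "MockData", "to": "ShowData", "outport": ""}],
)
request = start_request(mock_flow)
```

To submit the flow, pass `request` to `urllib.request.urlopen` and print the response body. Unlike `curl -0`, urllib sends the request over HTTP/1.1.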
- Pull the PiFlow image:
  docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v1.5
- Show Docker images:
  docker images
- Run a container from the PiFlow image ID; all services start automatically. Set HOST_IP and the other Docker options first:
  docker run -h master -itd --env HOST_IP=*.*.*.* --name piflow-v1.5 -p 6001:6001 -v /usr/bin/docker:/usr/bin/docker -v /var/run/docker.sock:/var/run/docker.sock --add-host docker.host:*.*.*.* [imageID]
- Visit "HOST_IP:6001"; startup may take a while.
- If something goes wrong, all the applications are in the /opt folder.
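Filling in the `*.*.*.*` placeholders by hand is error-prone, and the web UI can take a while to come up. This Python sketch builds the `docker run` argument list above from a validated IP address and polls port 6001 until it accepts connections. The function names are hypothetical. It also assumes both placeholders refer to the host's IP unless you pass a separate `docker_host_ip`.

```python
import ipaddress
import socket
import time

WEB_PORT = 6001  # the port published with -p 6001:6001 in the docker run command


def piflow_run_args(host_ip, image_id, docker_host_ip=None, name="piflow-v1.5"):
    """Build the argv of the docker run command with the placeholders filled in.

    Assumption: the --add-host address defaults to HOST_IP; pass docker_host_ip
    if your Docker host is reachable at a different address.
    """
    ip = str(ipaddress.ip_address(host_ip))  # ValueError on a malformed address
    add_host_ip = str(ipaddress.ip_address(docker_host_ip or host_ip))
    return [
        "docker", "run", "-h", "master", "-itd",
        "--env", f"HOST_IP={ip}",
        "--name", name,
        "-p", f"{WEB_PORT}:{WEB_PORT}",
        "-v", "/usr/bin/docker:/usr/bin/docker",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "--add-host", f"docker.host:{add_host_ip}",
        image_id,
    ]


def wait_for_port(host, port=WEB_PORT, timeout=300.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    An open port only shows that something is listening; the web UI may still
    need more time before pages load.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, with a hypothetical host address of 192.168.1.10, run `subprocess.run(piflow_run_args("192.168.1.10", "<imageID>"), check=True)` and then call `wait_for_port("192.168.1.10")` before opening the page in a browser. Passing an argument list to `subprocess.run` avoids shell quoting issues.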
- Login
- Dashboard
- Flow List
- Create Flow
- Configure Flow
- Load Flow
- Monitor Flow
- Flow Logs
- Group List
- Configure Group
- Monitor Group
- Process List
- Template List
- DataSource List
- Schedule List
- StopHub List
- Name: Teacher Wu (吴老师)
- Mobile Phone: 18910263390
- WeChat: 18910263390
- Email: wzs@cnic.cn
- QQ Group: 1003489545
- Private vulnerability contact: ygang@cnic.cn