PyFlink-DS

Python Flink (DataStream)

Team members names

Rohitha Reddy Meda
Sowmya Thogiti
Vigneshwar Reddy Lenkala

Sowmya Thogiti

Demonstration skill:

To read a collection of numbers from 0 to 9, filter the even numbers and multiply them by 10 and finally store them in another collection using PyFlink DataStream.

Prerequisites:

Apache Flink Installed (2.12 or 2.11)
IntelliJ IDE installed from source
PIP installed
Python 3.7 version installed and set in path of environment variables
pyflink installed source

Introduction to DataStream API:

Apache Flink offers a DataStream API for building robust, stateful streaming applications.
It provides fine-grained control over state and time, which allows for the implementation of advanced event-driven systems
Flink is considered as the next-gen stream processing system.
Flink offers substantially higher processing speeds to spark and hadoop.
Flink provides low latency and high throughput

How to download Flink:

Check the versions of pip and python in terminal of IntelliJ IDE using:
- pip --version
- python --version
Once the python is of version 3.7.0, use below command to run in terminal opened in IntelliJ IDE using:
- pip install apache-flink
Install pyflink using below command in terminal: pip install pyflink

How to setup pyflink:

open IntelliJ IDE and open the installed path of pyflink from file select.
Build project from build menu.
Run whole project and wait for all the dependencies installed into your local machine.
Start working on pyflink projects.

Flink set-up using python:

Create a new file using File\New\Python file
Provide a new name for the file such as task1.py
In order to configure python interpreter, we need to select File\Project Structure
in Project structure, under project settings - project SDK list - select Add sdk list - select python SDK.
Under Add python SDk Interpreter - select Virtualenv Environment - Base interpreter--select the path of python 3.7(python.exe).
Click OK
Click apply and then OK again.
After applying SDK changes, venv folder is created by default which stores all respective libraries and scripts.

Source code of above topic:

Output:

References:

Demonstration video link:

Sowmya Demo video

Vigneshwar Reddy Lenkala

Demonstration skill:

To read a collection of words, calculate the length of each and return a file with each line containing the word and its length using PyFlink DataStream.

Prerequisites:

Apache Flink Installed (2.12 or 2.11)
PIP installed
IntelliJ IDE installed
Python 3.7 version installed

Introduction to DataStream API

Apache Flink is an Open source stream processing framework for distributed, high performance data streaming application.
Apache Flink offers a DataStream API for building robust, stateful streaming applications.
It provides fine-grained control over state and time, which allows for the implementation of advanced event-driven systems.
Flink also provides batch processing, graph processing, Itearative proccessing for Machine learning applications.
Flink is considered as the next-gen stream processing system.
Flink offers substantially higher processing speeds to spark and hadoop.
Flink provides low latency and high throughput

Installation steps of PyFlink

Python version (3.5, 3.6, 3.7 or 3.8) is required for PyFlink. Please run the following command to make sure that it meets the requirements:

$ python --version

Check the version of pip using command:

$ pip --version

Use the below command to install apache-flink 1.12.0

$ python -m pip install apache-flink 1.12.0

Code snippet:

Output snippet:

Demonstration video link:

Vignesh Demo Video

References:

Rohitha Reddy Meda

Demonstration skill:

To read a file containing sentences in each line, filter lines containing a certain word and write them to a new file

Prerequisites:

Apache Flink Installed (2.12 or 2.11)
PIP installed
IntelliJ IDE installed
Python 3.7 version installed and set in path of environment variables

Introduction to DataStream API:

To develop stable, stateful streaming applications, Apache Flink offers a DataStream API.
It provides fine-grained state and time control that enables advanced event-driven systems to be introduced.
Flink is known as the delivery system for next-gen streams.
For spark and hadoop, Flink provides significantly higher processing speeds.
Flink provides low latency and high throughput.

Flink set-up using python:

Create a new file using File\New\Python file
Provide a new name for the file such as project1.py
We must pick File\Project Structure to configure the python interpreter.
Under Project Structure, under Project Settings - Project SDK List - choice Add SDK List - choose Python SDK.
Under Add python SDk Interpreter - choose Virtualenv Environment - Base Interpreter—choose python 3.7 road (python.exe).
Click OK
Click apply and then OK again.
The env folder is generated by default after applying SDK changes, which stores all the respective libraries and scripts.

sowmyathogiti/PyFlink-DS

PyFlink-DS

Team members names

Sowmya Thogiti

Demonstration skill:

Prerequisites:

Introduction to DataStream API:

How to download Flink:

How to setup pyflink:

Flink set-up using python:

Source code of above topic:

Output:

References:

Demonstration video link:

Vigneshwar Reddy Lenkala

Demonstration skill:

Prerequisites:

Introduction to DataStream API

Installation steps of PyFlink

Code snippet:

Output snippet:

Demonstration video link:

References:

Rohitha Reddy Meda

Demonstration skill:

Prerequisites:

Introduction to DataStream API:

Flink set-up using python:

Code for my topic

Input and Output snip

Demonstration video link:

References: