This project collects large volumes of streaming data and analyses it. It streams live data from a Forex trading API and an electric-vehicle stocks API, fetching and processing the data with REST API calls, Kafka, and Spark Streaming. Throughout the project, Forex data and stock data from electric-vehicle parts manufacturers were analysed and business use cases were implemented. The analysed data is then visualised by plotting graphs with Python libraries.
- Python 3.9
- Kafka 2.0.0
- Spark 3.1.2
- Pyspark
- kafka-python
- matplotlib
- Analysed the ask price and bid price of the Forex data.
- Showing the variation in the volume of shares sold with respect to ‘Date’.
- Showing the variation in opening and closing share prices with respect to ‘Date’.
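The two date-based use cases above boil down to grouping the fetched records by ‘Date’ and aggregating fields within each group. A minimal pure-Python sketch, using illustrative records (the field names and values are assumptions for illustration, not the actual API payload):

```python
from collections import defaultdict

# Illustrative sample records -- the real field names come from the API payload.
records = [
    {"date": "2021-09-01", "open": 245.0, "close": 248.5, "volume": 1200},
    {"date": "2021-09-01", "open": 248.5, "close": 247.0, "volume": 800},
    {"date": "2021-09-02", "open": 247.0, "close": 251.2, "volume": 1500},
]

def volume_by_date(rows):
    """Total traded volume per date (use case: volume vs. 'Date')."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["date"]] += row["volume"]
    return dict(totals)

def open_close_by_date(rows):
    """First open and last close per date (use case: open/close vs. 'Date')."""
    out = {}
    for row in rows:
        day = row["date"]
        if day not in out:
            out[day] = {"open": row["open"], "close": row["close"]}
        else:
            out[day]["close"] = row["close"]
    return out
```

The same grouping is what the Spark use-case files do at scale; these helpers just show the shape of the computation.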
GitHub clone URL: https://github.com/rajib1007/Project_3.git
INSTALL THE REQUIRED TOOLS AND SET THE PATH
sudo apt-get install openjdk-8-jdk
wget https://dlcdn.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
tar xvzf spark-3.1.2-bin-hadoop3.2.tgz
wget https://archive.apache.org/dist/kafka/2.0.0/kafka_2.11-2.0.0.tgz
tar xvzf kafka_2.11-2.0.0.tgz
FOLLOW THIS LINK TO INSTALL JUPYTER NOTEBOOK ON UBUNTU
GO TO THE KAFKA DIRECTORY AND START ZOOKEEPER AND THE KAFKA SERVER
cd kafka_2.11-2.0.0/
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
CREATE A TOPIC project3 WITH REPLICATION FACTOR 1 AND PARTITION 1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic project3 --replication-factor 1 --partitions 1
CREATE A producer.py FILE TO FETCH DATA FROM THE API AND SEND IT TO THE TOPIC
- CREATE AN ACCOUNT ON api.tiingo.com
- RUN THE producer.py FILE
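A minimal sketch of what producer.py might look like using kafka-python: poll the REST API, then forward each JSON record to the project3 topic created above. The endpoint URL, token placeholder, ticker, and polling interval below are assumptions for illustration; substitute your own account's token and the endpoint your use case needs.

```python
import json
import time

def serialize(record):
    """Encode one API record as the UTF-8 JSON bytes Kafka expects."""
    return json.dumps(record).encode("utf-8")

def run():
    # Third-party imports are kept inside run() so serialize() can be
    # reused or tested without these packages installed.
    import requests                  # for the REST call
    from kafka import KafkaProducer  # from the kafka-python package

    api_url = "https://api.tiingo.com/tiingo/fx/top"        # illustrative endpoint
    headers = {"Authorization": "Token YOUR_TIINGO_TOKEN"}  # use your own token

    producer = KafkaProducer(bootstrap_servers="localhost:9092",
                             value_serializer=serialize)
    while True:
        resp = requests.get(api_url, params={"tickers": "eurusd"},
                            headers=headers)
        for record in resp.json():
            producer.send("project3", record)  # topic created earlier
        producer.flush()
        time.sleep(60)                         # poll the API once a minute

if __name__ == "__main__":
    run()
```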
START A CONSOLE CONSUMER IN A NEW TERMINAL
bin/kafka-console-consumer.sh --topic <topic_name> --bootstrap-server localhost:9092
CREATE ANOTHER .py FILE TO STRUCTURE THE JSON DATA INTO A SPARK DATAFRAME AND IMPLEMENT A USE CASE
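A sketch of what such a file (e.g. usecase2.py) might contain: read the project3 topic as a structured stream, parse the JSON payload into a DataFrame, and write an average-ask-price aggregate to the askPriceOutput topic. The schema fields and the aggregate are assumptions modelled on a Tiingo-style forex quote, not the confirmed payload:

```python
import json

# A sample record in the assumed payload shape, used for a quick sanity check.
SAMPLE = ('{"ticker": "eurusd", "quoteTimestamp": "2021-09-01T00:00:00Z", '
          '"askPrice": 1.1802, "bidPrice": 1.1800}')

def parse_quote(line):
    """stdlib equivalent of the from_json step below."""
    return json.loads(line)

def main():
    # pyspark imports are local so the sanity-check helper works without Spark.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (StructType, StructField,
                                   StringType, DoubleType)

    spark = SparkSession.builder.appName("usecase2").getOrCreate()

    schema = StructType([
        StructField("ticker", StringType()),
        StructField("quoteTimestamp", StringType()),
        StructField("askPrice", DoubleType()),
        StructField("bidPrice", DoubleType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "project3")
           .load())

    # Kafka delivers the payload as bytes in the 'value' column; parse as JSON.
    quotes = (raw.select(F.from_json(F.col("value").cast("string"), schema)
                          .alias("q"))
                 .select("q.*"))

    # Example use case: running average ask price per ticker.
    avg_ask = (quotes.groupBy("ticker")
                     .agg(F.avg("askPrice").alias("avgAskPrice")))

    # A Kafka sink needs a string 'value' column and a checkpoint directory;
    # streaming aggregations require output mode 'complete'.
    (avg_ask.select(F.to_json(F.struct("ticker", "avgAskPrice")).alias("value"))
            .writeStream.format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("topic", "askPriceOutput")
            .option("checkpointLocation", "/tmp/askPriceOutput-ckpt")
            .outputMode("complete")
            .start()
            .awaitTermination())

if __name__ == "__main__":
    main()
```

The file is submitted with spark-submit as shown below, so it never runs directly in the notebook.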
- CREATE TOPIC askPriceOutput
bin/kafka-console-consumer.sh --topic <output_topic_name> --bootstrap-server localhost:9092
- usecase2.py
- GO TO THE SPARK bin FOLDER
cd /spark-3.1.2-bin-hadoop3.2/bin/
- RUN A SPARK-SUBMIT JOB WITH THE usecase2.py FILE AND WRITE THE OUTPUT TO THE askPriceOutput TOPIC
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 <file_path.py>
THE OUTPUT WILL APPEAR ON THE askPriceOutput TOPIC’S CONSUMER
OPEN A JUPYTER NOTEBOOK AND INSTALL THE matplotlib AND kafka-python LIBRARIES IN IT
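In the notebook, the output topic can be read back with kafka-python and plotted with matplotlib. A sketch, assuming the use case wrote JSON records with an avgAskPrice field (a hypothetical field name) to askPriceOutput:

```python
import json

def extract_series(messages, field="avgAskPrice"):
    """Pull one numeric field out of the JSON strings read from the topic."""
    return [json.loads(m)[field] for m in messages]

def plot_from_topic(topic="askPriceOutput"):
    # Third-party imports are local; install them in the notebook first:
    #   pip install matplotlib kafka-python
    import matplotlib.pyplot as plt
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic,
                             bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000,  # stop after 5s idle
                             value_deserializer=lambda v: v.decode("utf-8"))
    values = extract_series([msg.value for msg in consumer])
    plt.plot(values)
    plt.xlabel("message index")
    plt.ylabel("avg ask price")
    plt.show()
```

Calling `plot_from_topic()` in a notebook cell drains the topic from the beginning and plots the field as a simple line graph; the same pattern covers the volume and open/close plots with different field names.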