This project collects real-time Twitter data on the COVID-19 topic, processes the tweets by mapping their attributes to the desired columns, and then stores the data for further analysis.
Python packages required:
- boto3 (pip install boto3)
- tweepy (pip install tweepy)
Cloud services used in this project:
- Kinesis Firehose
- AWS S3
- AWS Glue
- IAM (for role creation)
- CloudWatch
- Use tweetercred.py to store your Twitter developer account credentials.
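A minimal sketch of what tweetercred.py might contain; the variable names and placeholder values are assumptions, so substitute the keys from your own Twitter developer account:

```python
# tweetercred.py -- central place for the Twitter developer credentials,
# so the other scripts can import them instead of hard-coding keys.
# The names and placeholder values below are assumptions; replace them
# with the keys from your Twitter developer account.
API_KEY = "your-api-key"
API_SECRET = "your-api-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"
```

Keeping the credentials in one module also makes it easy to exclude them from version control.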
- Use buildFirehose_AWS.py, which uses the boto3 API to create a data delivery stream with Kinesis Firehose.
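A sketch of the kind of call buildFirehose_AWS.py might make, assuming an existing S3 bucket and an IAM role that Firehose can assume; the function and parameter names here are illustrative, not taken from the script:

```python
def s3_destination_config(role_arn, bucket_arn, size_mb=5, interval_s=60):
    # Buffering hints control how much data Firehose batches
    # before it writes an object to S3.
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "BufferingHints": {"SizeInMBs": size_mb, "IntervalInSeconds": interval_s},
    }


def create_delivery_stream(stream_name, role_arn, bucket_arn):
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("firehose")
    return client.create_delivery_stream(
        DeliveryStreamName=stream_name,
        DeliveryStreamType="DirectPut",  # records are pushed with PutRecord
        S3DestinationConfiguration=s3_destination_config(role_arn, bucket_arn),
    )
```

DirectPut means the producer script writes records straight to the stream, which fits the ingestion script below.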
- Use ingesttwitterdata.py to ingest data into the data delivery stream.
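One way ingesttwitterdata.py could map tweet attributes to columns and push each record to the stream; the chosen columns are an assumption about the desired schema, not taken from the script:

```python
import json


def tweet_to_record(tweet):
    # Map the raw tweet attributes to the columns we want to keep.
    row = {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "user": (tweet.get("user") or {}).get("screen_name"),
        "text": tweet.get("text"),
    }
    # Firehose concatenates records as-is, so add a newline separator.
    return json.dumps(row) + "\n"


def send_to_firehose(stream_name, tweet):
    import boto3  # imported here so tweet_to_record works without boto3

    client = boto3.client("firehose")
    client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": tweet_to_record(tweet).encode("utf-8")},
    )
```

The newline separator matters: without it, the JSON objects in each S3 object run together and are harder for Glue to parse.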
- Delete the data delivery stream after the data has been successfully delivered to the S3 bucket.
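The stream can be deleted with boto3 as well, which stops Firehose charges once ingestion is done. A minimal sketch; the helper name is illustrative:

```python
def delete_request(stream_name):
    # Parameters for DeleteDeliveryStream; AllowForceDelete removes the
    # stream even if a grant on its resources is still pending.
    return {"DeliveryStreamName": stream_name, "AllowForceDelete": True}


def delete_delivery_stream(stream_name):
    import boto3  # lazy import keeps delete_request usable on its own

    boto3.client("firehose").delete_delivery_stream(**delete_request(stream_name))
```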
- To set the credentials for your AWS account, run the AWS CLI configuration command on the command prompt after installing the AWS CLI.
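The AWS CLI's standard credential-setup command is aws configure; the values below are placeholders:

```shell
# Interactive: prompts for access key, secret key, default region, and
# output format, then writes them to ~/.aws/credentials and ~/.aws/config.
aws configure

# Non-interactive equivalents (placeholder values):
aws configure set aws_access_key_id YOUR_ACCESS_KEY_ID
aws configure set aws_secret_access_key YOUR_SECRET_ACCESS_KEY
aws configure set default.region us-east-1
```

boto3 reads the same credential files, so the scripts above need no extra configuration once this is done.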
The data pipeline has two parts:
- Collecting Data
- Processing Data