ETL With Twitter Data

Overview

This project demonstrates how to work with the Twitter API in Python. Using the Tweepy library, you can scrape data from Twitter. The project also shows how to extract, transform, and load (ETL) data into a CSV file and a MongoDB database.

Task

Part 1

Write a script that downloads tweet data on a specific search topic using the standard search API. The script should contain the following functions:

scrape_tweets() that has the following parameters:
- Search topic
- The number of tweets to download per request
- The number of requests
and returns a dataframe.
Save_results_as_csv() that has the following parameter:
- The dataframe from the above function
and returns a CSV file with the following naming format:
tweets_downloaded_yymmdd_hhmmss.csv (where ‘yymmdd_hhmmss’ is the current timestamp)
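
The two functions described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `search_fn` is a hypothetical callable standing in for the Tweepy search call (its exact name depends on the Tweepy version), `build_csv_name` is a hypothetical helper, and the function names are written in lowercase per Python convention. Paging by `max_id` is assumed, as the standard search API returns tweets newest first.

```python
from datetime import datetime


def scrape_tweets(topic, tweets_per_request, num_requests, search_fn):
    """Collect raw tweet dicts over several search requests.

    search_fn is a hypothetical callable (query, count, max_id) -> list of
    tweet dicts, newest first. In practice it would wrap the Tweepy search call.
    """
    rows, max_id = [], None
    for _ in range(num_requests):
        batch = search_fn(topic, tweets_per_request, max_id)
        if not batch:
            break  # no older tweets are available
        rows.extend(batch)
        # Ask the next request only for tweets older than the oldest one seen.
        max_id = min(tweet["id"] for tweet in batch) - 1
    # Wrap the result with pandas.DataFrame(rows) to return a dataframe.
    return rows


def build_csv_name(now):
    """Return tweets_downloaded_yymmdd_hhmmss.csv for the given datetime."""
    return now.strftime("tweets_downloaded_%y%m%d_%H%M%S.csv")


def save_results_as_csv(df, now=None):
    """Write the dataframe to a timestamped CSV file and return its path."""
    path = build_csv_name(now or datetime.now())
    df.to_csv(path, index=False)  # DataFrame.to_csv from pandas
    return path
```

Passing the search call in as an argument keeps the paging logic independent of the Tweepy version and makes it easy to try out with canned data.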

The following attributes of the tweets should be extracted:

Tweet text
Tweet id
Source
Coordinates
Retweet count
Likes count
User info
- Username
- Screenname
- Location
- Friends count
- Verification status
- Description
- Followers count
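
One way to pull these attributes out of a tweet is sketched below. It assumes each tweet is available as a plain dict in the v1.1 JSON layout returned by the standard search API (for example via a Tweepy status object's `_json` attribute). The function name `flatten_tweet` and the output column names are hypothetical, and mapping "Username" to `name` and "Screenname" to `screen_name` is an assumption.

```python
def flatten_tweet(tweet):
    """Map one raw tweet dict to a flat row with the required attributes."""
    user = tweet.get("user", {})
    return {
        # full_text is present only when tweets are requested in extended mode
        "text": tweet.get("full_text", tweet.get("text")),
        "tweet_id": tweet.get("id"),
        "source": tweet.get("source"),
        "coordinates": tweet.get("coordinates"),
        "retweet_count": tweet.get("retweet_count"),
        "likes_count": tweet.get("favorite_count"),  # likes are "favorites" in v1.1
        "user_name": user.get("name"),
        "user_screen_name": user.get("screen_name"),
        "user_location": user.get("location"),
        "user_friends_count": user.get("friends_count"),
        "user_verified": user.get("verified"),
        "user_description": user.get("description"),
        "user_followers_count": user.get("followers_count"),
    }
```

A list of such rows can be passed directly to `pandas.DataFrame` to obtain one column per attribute.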

Make sure not to include retweets.
Make sure the same tweet does not appear multiple times in your final CSV.
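
Both requirements can be met with a single pass over the downloaded tweets, as in the sketch below. It assumes raw tweet dicts in the v1.1 layout, where a retweet carries a `retweeted_status` object; the function name is hypothetical.

```python
def drop_retweets_and_duplicates(tweets):
    """Return the tweets without retweets and without repeated tweet ids."""
    seen_ids = set()
    kept = []
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # retweets embed the original tweet under this key
        if tweet["id"] in seen_ids:
            continue  # the same tweet was returned by an earlier request
        seen_ids.add(tweet["id"])
        kept.append(tweet)
    return kept
```

Retweets can also be excluded at query time by appending the `-filter:retweets` search operator to the topic, which avoids spending the request quota on them.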

Part 2

Create a MongoDB database called Tweets_db and store the extracted tweets into a collection named: raw_tweets.
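
A minimal sketch of this step with Pymongo is shown below. The connection URI is an assumption (a local MongoDB on the default port), and both function names are hypothetical. MongoDB creates the database and the collection on the first insert, so no separate creation step is needed.

```python
def get_raw_tweets_collection(uri="mongodb://localhost:27017"):
    """Return the raw_tweets collection of the Tweets_db database."""
    # Imported here so the rest of the sketch can be read and exercised
    # without a Pymongo installation; a module-level import works equally well.
    from pymongo import MongoClient

    client = MongoClient(uri)
    return client["Tweets_db"]["raw_tweets"]


def store_tweets(collection, rows):
    """Insert the extracted tweets and return how many were stored."""
    if not rows:
        return 0  # insert_many rejects an empty list
    result = collection.insert_many(rows)
    return len(result.inserted_ids)
```

With a dataframe, the rows can be produced with `df.to_dict("records")` before calling `store_tweets(get_raw_tweets_collection(), rows)`.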

Pre-requisites

Twitter Developer Account
Apply for a Twitter Developer account if you do not have one. You will need its credentials to work with the Twitter API.
Twitter API credentials

Getting Started

The project was developed using:

Python 3.7.9
Anaconda (conda)
Tweepy
Pymongo
Pandas

Follow the steps below to set up the project.

Create environment

Create a conda environment using the command:

conda create -n "env-name" python=3.7

Activate environment

Activate the environment using the command:

conda activate env-name

Install packages

Install project packages using the command:

pip install -r requirements.txt

Store env variables

To store your access credentials (for example, API keys and database access credentials), follow the steps below:

Duplicate the .env.example file and name the copy .env
Store your access credentials as needed
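
Once stored, the credentials might be read in the script as sketched below. This assumes the values from .env have already been loaded into the process environment (for example by a loader such as python-dotenv, which this README does not name). The variable names are hypothetical; use the ones listed in .env.example.

```python
import os

# Hypothetical variable names, for illustration only.
REQUIRED_VARS = ("TWITTER_API_KEY", "TWITTER_API_SECRET", "MONGODB_URI")


def load_credentials(env=os.environ, required=REQUIRED_VARS):
    """Return the required settings, failing early if any is missing or empty."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Failing at start-up with the list of missing names is usually easier to debug than an authentication error raised later by the Twitter API.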

kingsabru/twitter-etl-airflow-mongodb