This repository contains the following files:

Twitter_Files:

  • get_tweets_apache.py: Run by the tweets DAG. This Python script fetches the tweets (a rough fetch sketch follows this list).

  • Add Sentiment, Write to Ware House.py: Run by the sentiment DAG. This Python script performs the duplicate check and sentiment analysis and writes the non-duplicate tweets, with their sentiment scores, to the data warehouse (sketched after this list).

  • average.py: Run by the average DAG. This Python script builds a per-city aggregated table from the tweets table in the data warehouse (sketched after this list).

  • twitter_dag.py: Script that runs the "get_tweets_apache.py" script (the DAG wrapper pattern is sketched after this list).

  • sentiment_dagt.py: Script that runs the "Add Sentiment, Write to Ware House.py" script.

  • average_dag.py: Script that runs the "average.py" script.
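
None of the scripts are reproduced here, but as a rough illustration, the tweet fetch could look like the sketch below, assuming Tweepy's v2 client and the recent-search endpoint; the query, fields, and token handling are assumptions, not taken from this repo:

```python
# Minimal sketch of a tweet fetch with Tweepy's v2 client (assumed library).
import tweepy

client = tweepy.Client(bearer_token="...")  # token placeholder

def get_tweets(city, max_results=100):
    # Recent search for English, non-retweet tweets mentioning the city (query assumed).
    resp = client.search_recent_tweets(
        query=f'"{city}" lang:en -is:retweet',
        max_results=max_results,
        tweet_fields=["created_at"],
    )
    return [{"id": t.id, "city": city, "text": t.text} for t in resp.data or []]
```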
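
The duplicate check and sentiment step could be sketched as follows, assuming TextBlob for polarity scoring and a Postgres warehouse reached via psycopg2; the tweets table layout is an assumption:

```python
# Sketch: skip already-stored tweets, score the rest, insert them (schema assumed).
import psycopg2
from textblob import TextBlob

def add_sentiment_and_load(tweets, conn_params):
    conn = psycopg2.connect(**conn_params)
    with conn, conn.cursor() as cur:
        for tweet in tweets:
            # Duplicate check: skip tweets whose id is already in the warehouse.
            cur.execute("SELECT 1 FROM tweets WHERE tweet_id = %s", (tweet["id"],))
            if cur.fetchone():
                continue
            # TextBlob polarity is in [-1, 1]; below 0 reads as negative sentiment.
            polarity = TextBlob(tweet["text"]).sentiment.polarity
            cur.execute(
                "INSERT INTO tweets (tweet_id, city, text, sentiment) VALUES (%s, %s, %s, %s)",
                (tweet["id"], tweet["city"], tweet["text"], polarity),
            )
    conn.close()
```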
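
The per-city aggregation in average.py amounts to a GROUP BY over the tweets table; a sketch, with the output table name and columns assumed:

```python
# Sketch: rebuild an aggregated per-city table in the warehouse (names assumed).
import psycopg2

def build_city_averages(conn_params):
    conn = psycopg2.connect(**conn_params)
    with conn, conn.cursor() as cur:
        cur.execute("DROP TABLE IF EXISTS tweets_by_city")
        cur.execute(
            """
            CREATE TABLE tweets_by_city AS
            SELECT city,
                   COUNT(*)       AS tweet_count,
                   AVG(sentiment) AS avg_sentiment
            FROM tweets
            GROUP BY city
            """
        )
    conn.close()
```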
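
The three *_dag.py files all follow the same wrapper pattern; a minimal sketch in Airflow 2.4+ style, with the dag_id, schedule, and script path assumed:

```python
# Sketch of the DAG wrapper pattern (ids, schedule, and path are assumptions).
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="tweets",
    start_date=datetime(2023, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    get_tweets = BashOperator(
        task_id="get_tweets",
        bash_command="python /opt/airflow/scripts/get_tweets_apache.py",
    )
```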

Weather_Files:

  • weather_dag.py: DAG that combines the two Python scripts below and executes them sequentially (the sequencing is sketched after this list).

  • get_weather_apache.py: Contains the code to create a table in the data lake and fetch the weather data into it (see the fetch sketch after this list).

  • send_to_dwh_apache: Contains the code to create a table in the data warehouse and move the data from the data lake into it.
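
In Airflow terms, the sequential execution in weather_dag.py comes down to a `>>` dependency between two tasks; a sketch, assuming both scripts expose a main() entry point, with assumed ids and schedule:

```python
# Sketch of weather_dag.py's sequencing (entry points, ids, schedule assumed).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

from get_weather_apache import main as get_weather   # entry point assumed
from send_to_dwh_apache import main as send_to_dwh   # entry point assumed

with DAG(dag_id="weather", start_date=datetime(2023, 1, 1),
         schedule="@daily", catchup=False) as dag:
    fetch = PythonOperator(task_id="get_weather", python_callable=get_weather)
    load = PythonOperator(task_id="send_to_dwh", python_callable=send_to_dwh)
    fetch >> load  # fetch into the data lake first, then move to the warehouse
```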
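
The fetch-and-store step could look roughly like this, assuming the OpenWeatherMap current-weather endpoint and a Postgres data lake; the table layout and the fields kept are assumptions:

```python
# Sketch: fetch current weather and append it to a data-lake table (schema assumed).
import requests
import psycopg2

API_URL = "https://api.openweathermap.org/data/2.5/weather"

def fetch_and_store(city, api_key, conn_params):
    resp = requests.get(API_URL, params={"q": city, "appid": api_key, "units": "metric"})
    resp.raise_for_status()
    data = resp.json()
    conn = psycopg2.connect(**conn_params)
    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS weather_raw ("
            "city TEXT, temp REAL, humidity INT, fetched_at TIMESTAMP DEFAULT now())"
        )
        cur.execute(
            "INSERT INTO weather_raw (city, temp, humidity) VALUES (%s, %s, %s)",
            (city, data["main"]["temp"], data["main"]["humidity"]),
        )
    conn.close()
```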

Cost_Files:

  • scrape_Script_clp.py: Scrapes each city and country pair listed in european_data.csv and writes one JSON file per pair, e.g. Brussels_Belgium.json (the scrape loop is sketched after this list).

  • scrape_all_in_one.py: Scrapes the same city and country pairs from european_data.csv but saves all responses in a single JSON file, e.g. scrape_costlivingprice.json.

  • lambda_function_s3_to_rds.py: AWS Lambda function that loads "cost_living_prices_europe_cleanV2.csv" from the S3 bucket into the data warehouse (sketched after this list).

  • Connect_to_dw.ipynb: Connects to the data warehouse and prints the header of the "costlivingprices" table.
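
The scrape loop could be sketched as below; the cost-of-living endpoint here is a placeholder, and the column names in european_data.csv are assumptions:

```python
# Sketch of the per-pair scrape loop (endpoint and CSV columns are assumptions).
import csv
import json
import requests

with open("european_data.csv", newline="") as f:
    for row in csv.DictReader(f):  # expects 'city' and 'country' columns (assumed)
        city, country = row["city"], row["country"]
        resp = requests.get(
            "https://example.com/api/cost-of-living",  # placeholder endpoint
            params={"city": city, "country": country},
        )
        resp.raise_for_status()
        # One file per pair, e.g. Brussels_Belgium.json; scrape_all_in_one.py
        # would instead collect everything into one dict and dump it once.
        with open(f"{city}_{country}.json", "w") as out:
            json.dump(resp.json(), out)
```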
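
The S3-to-RDS load could look roughly like this, assuming a Postgres RDS warehouse, connection details in environment variables, and an assumed bucket name; the costlivingprices table name comes from the notebook above:

```python
# Sketch of an S3-to-RDS Lambda (bucket name and connection env vars assumed).
import csv
import io
import os
import boto3
import psycopg2

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Read the cleaned CSV from S3 into memory.
    obj = s3.get_object(Bucket="my-bucket",  # bucket name assumed
                        Key="cost_living_prices_europe_cleanV2.csv")
    rows = csv.reader(io.StringIO(obj["Body"].read().decode("utf-8")))
    next(rows)  # skip the header row

    conn = psycopg2.connect(
        host=os.environ["DB_HOST"], dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"], password=os.environ["DB_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        for row in rows:
            placeholders = ", ".join(["%s"] * len(row))
            cur.execute(f"INSERT INTO costlivingprices VALUES ({placeholders})", row)
    conn.close()
    return {"statusCode": 200}
```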