Description:
Visualization project including previous EDA & ETL steps.
- Think of some questions you want to find a response to: the average rating for a given genre of movies on IMDB and their description, the average sentiment of a twitter username across time, the reviews for given categories of products are greater than other categories....whatever comes to mind. Make some questions before starting the process
- Find data. This can be: hotel reviews, tweets, lines per character in a TV show, dialogues from a book, whatsapp/telegram/tinder conversations, etc. Whatever, really!
- The data can be scrapped or obtained through an API, but it doesn't need to be. You can also look for csv files.
- Clean and transform your data.
- Load the data you obtain into SQL. Do this through Python if possible. Do you want to make it relational? Do you have things like: users, chats & messages? characters, episodes & seasons? Does it make sense to make it relational? Just one table? Define a DB diagram that makes sense. You may encounter some errors while inserting, as text can be tricky since it takes different formats. You will either have to: not do anything (data will be formatted well enough) or transform it using Python or outside of Python.
- Now you have a clean database. Do queries and subqueries if you can.
- Use another source of data so you can have more detailed analysis. You can just relate it semantically. This will help you generate Tableau visualizations.
- Generate a Tableau dashboard with the insights you found.
# STEPS
-
Extract data
-
Transform it through Python
-
Add another source of info if needed
-
Load that data into SQL through Python & SQL
-
Try to answer as many questions as you can by running SQL queries through Python: aggregations, averages, comparisons, filtered data, etc
-
Export the result of those queries as csv files
-
On Tableau, create stories/dashboards.
-
Include your csv files from the queries & the original csv without it being queried too so you can have more visualizations
- ETL
- EDA
- Visualization using Tableau
-
Python files
- Modularization
- Documented & organized code: docstrings, comments, etc
-
SQL file
- In this file you should gather all the SQL questions that describe your data.
- Description of central tendency of your variables per groups
- Frequencies
- Percentages of a given category
- Group things with conditions
- Same things from another table?
- Be creative! What questions would you have about this data if you were the owner of a company? Write as many meaningful queries as you can!
- In this file you should gather all the SQL questions that describe your data.
-
Link to Tableau dashboard