This project analyses movie data from multiple sources (IMDb, MovieLens, The Numbers, and Box Office Mojo), covering movies, cast, box office revenues, movie brands, and franchises, and performs the ETL processes using Talend.
- ER/Studio
- SQL Server Developer Edition
- Microsoft SQL Server Management Studio
- Talend Real-Time Data Platform 7.1
- Tableau Desktop
- Microsoft Power BI
- https://datasets.imdbws.com/
- https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab
- https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab
- https://grouplens.org/datasets/movielens/25m/
- https://www.the-numbers.com/movies/franchises
- https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary
- https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office
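The IMDb files at https://datasets.imdbws.com/ are gzip-compressed, tab-separated text with a header row, and they mark missing values with a literal `\N`. The project itself loads the data with Talend; the Python below is only a minimal sketch for inspecting that format before staging. The function names `parse_imdb_tsv` and `read_imdb_file` are hypothetical, and the sample row uses a trimmed subset of the `title.basics` columns.

```python
import csv
import gzip
from typing import Iterable, Iterator

IMDB_NULL = r"\N"  # IMDb marks missing values with a literal backslash-N


def parse_imdb_tsv(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per row, mapping IMDb's \\N marker to None."""
    # QUOTE_NONE: treat quote characters in titles as ordinary text
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == IMDB_NULL else value)
               for key, value in row.items()}


def read_imdb_file(path: str) -> Iterator[dict]:
    """Stream rows from a downloaded .tsv.gz file (path is an assumption)."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from parse_imdb_tsv(handle)


# Small in-memory sample shaped like title.basics.tsv (columns trimmed)
sample = [
    "tconst\ttitleType\tprimaryTitle\tstartYear\tendYear\n",
    "tt0000001\tshort\tCarmencita\t1894\t\\N\n",
]
rows = list(parse_imdb_tsv(sample))
```

For a real file, something like `read_imdb_file("title.basics.tsv.gz")` would stream the rows without loading the whole file into memory. Mapping `\N` to `None` early makes it easier to load the values as SQL `NULL` in the staging tables.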
The Number - stage tables.sql
stg imdb tables - core tables.sql
stg imdb tables expanded part 2.sql
stg_ml_tables.sql
Once the connections are successful, run the Talend jobs.
Refer to the Tableau workbook for the visualizations; new use cases will be added soon. A Microsoft Power BI file will also be added soon.
- https://elearning.tableau.com/
- https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/8RlpZdAdKhP0IaMHXRV7yw
- https://www.talend.com/
- https://grouplens.org/datasets/movielens/
Please feel free to reach out to ashmitan20@gmail.com with any questions or proposed changes.