- Kaggle: danielgrijalvas/movies
- GitHub: danielgrijalvas/movie-stats
This dataset about the movie industry was scraped from IMDb and published on Kaggle and GitHub by danielgrijalvas. It contains 7,668 movies released from 1980 to 2020 (7,668 observations and 15 variables).
Click here for more details.
Since the beginning of the film industry, studios have released many movies every year, and producing and shooting a single movie requires a huge budget. As with any business investment, the expected return is profit. What sets the film industry apart is the range of factors that affect a movie's revenue, such as audience feedback, ratings, reviews, directors, screenwriters, and many more. We selected this dataset to analyze the factors affecting the profits of individual films, as well as changes in movie budgets, runtimes, and revenues over the past four decades. The dataset was obtained from Kaggle, explored in Microsoft Excel, and cleaned and analyzed with R in RStudio. We also examined the correlations between the variables.
- Define questions
- Explore data from the original dataset
- Data Cleaning and Data Transformation
- Exploratory Data Analysis
- Analytical Inferential Statistics
- Data Visualization
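
The project itself carried out these steps in R. As a rough illustration of the cleaning, transformation, and correlation steps only, here is a minimal Python sketch. It assumes the dataset has `budget` and `gross` columns. The sample rows are invented for the example and are not taken from the real data, and the helper names (`load_rows`, `add_profit`, `pearson`) are hypothetical.

```python
import csv
import io
import math

# Invented sample rows for illustration only; not values from the real dataset.
SAMPLE_CSV = """name,year,budget,gross
Film A,1980,10,30
Film B,1990,20,50
Film C,2000,,40
Film D,2010,40,80
"""


def load_rows(text):
    """Parse CSV text, keeping only rows where budget and gross are both present."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["budget"] and row["gross"]:
            rows.append({
                "name": row["name"],
                "year": int(row["year"]),
                "budget": float(row["budget"]),
                "gross": float(row["gross"]),
            })
    return rows


def add_profit(rows):
    """Add a derived profit column, defined here as gross minus budget."""
    for row in rows:
        row["profit"] = row["gross"] - row["budget"]
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


movies = add_profit(load_rows(SAMPLE_CSV))
r = pearson([m["budget"] for m in movies], [m["gross"] for m in movies])
print(len(movies), "rows kept; budget vs. gross correlation:", round(r, 3))
```

Dropping rows with a missing budget or gross is only one possible cleaning choice. The actual cleaning rules used in the project are documented in the Cleaning Dataset folder.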
- Define questions
- Exploratory Data
- Cleaning Data and Transformation
- Data Analysis
- Analytical Inferential Statistics
- Data Visualization (BI Dashboard)
- Original Dataset: movies_original.csv
- Cleaned Dataset: movies_cleaned.csv
Folder | Link |
---|---|
Exploratory Dataset | Click here |
Cleaning Dataset | Click here |
Data Analysis | Click here |
Hypothesis testing | Click here |
Data Visualization | Click here |
This work is part of INT214 (Statistics for Information Technology), semester 1/2021, School of Information Technology, KMUTT.
No. | Name | Student ID |
---|---|---|
1 | Denphum Nakglam | 63130500039 |
2 | Songglod Petchamras | 63130500042 |
3 | Thanakrit Paithun | 63130500046 |
4 | Thanaphon Sukkasem | 63130500048 |
5 | Thanatorn Roswan | 63130500053 |
- ATCHARA TRAN-U-RAIKUL
- JATAWAT XIE (Git: safesit23)