Movie Analysis
This repository includes the Scala project of data preprocessing.
DataSet
Kaggle “Movies Dataset” (use these files only: movies_metadata.csv, credits.csv, ratings.csv).
Implementation Details
- Construct map from csv file using Spark.
- Intepret json fields.
- Build 4 csv files:
- casts.csv (represents a cast, with label Talent)
- crews.csv (represents a crew, with label Talent)
- movies.csv (represent a movie, with labal Movie)
- talent_movie_rel.csv (represent the relationship between talent and movie).