jes2ica/movie_analysis

Movie Analysis

This repository contains the Scala project that preprocesses the data for the movie analysis.

Dataset

Kaggle “Movies Dataset” (use these files only: movies_metadata.csv, credits.csv, ratings.csv).

Implementation Details

Construct a map from each CSV file using Spark.
Interpret the JSON fields.
Build four CSV files:
casts.csv (represents cast members, with label Talent)
crews.csv (represents crew members, with label Talent)
movies.csv (represents movies, with label Movie)
talent_movie_rel.csv (represents the relationships between talents and movies)
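
The output step above can be sketched roughly as follows. This is a minimal illustration in plain Scala, not the repository's actual code: the case classes, the column layout (id, name or title, label), the `role` column, and the helper names `quote`, `movieRow`, `talentRow`, `relRow`, and `render` are all assumptions made for the example. It leaves out Spark and the JSON parsing and only shows how already-parsed records might be turned into labeled CSV rows.

```scala
// Hypothetical record types; field names are assumptions, not the repository's schema.
final case class Movie(id: Long, title: String)
final case class Talent(id: Long, name: String)
final case class TalentMovieRel(talentId: Long, movieId: Long, role: String)

object CsvSketch {
  // Wrap a field in double quotes and double any embedded quotes,
  // so that commas or quotes inside titles and names do not break the row.
  def quote(field: String): String =
    "\"" + field.replace("\"", "\"\"") + "\""

  // Movie rows carry the label Movie; cast and crew rows both carry the label Talent.
  def movieRow(m: Movie): String = s"${m.id},${quote(m.title)},Movie"
  def talentRow(t: Talent): String = s"${t.id},${quote(t.name)},Talent"

  // One row per (talent, movie) pair; the role column is an assumed extra.
  def relRow(r: TalentMovieRel): String =
    s"${r.talentId},${r.movieId},${quote(r.role)}"

  // Prepend a header line and join the rows into the text of one CSV file.
  def render(header: String, rows: Seq[String]): String =
    (header +: rows).mkString("\n")
}

object CsvSketchDemo extends App {
  import CsvSketch._

  // Illustrative sample values only.
  val movie = Movie(862L, "Toy Story")
  val actor = Talent(31L, "Tom Hanks")
  val rel   = TalentMovieRel(actor.id, movie.id, "cast")

  println(render("id,title,label", Seq(movieRow(movie))))
  println(render("id,name,label", Seq(talentRow(actor))))
  println(render("talent_id,movie_id,role", Seq(relRow(rel))))
}
```

Keeping the row formatting in small pure functions means the same helpers could be applied inside a Spark `map` over parsed records, and they can be checked without starting a Spark session.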