Movie-Genre-Prediction-Multi-Label

CSE587 Data Intensive Computing

Primary language: Jupyter Notebook

The objective of the project is to implement a movie genre prediction model using Apache Spark. Predicting the genre is essentially a multi-label classification problem: a movie can have multiple genres associated with it, so the model should be able to predict all the genres associated with a movie.

Part 1: Created a term-document matrix from the movie plots and used these as feature vectors for the machine learning model. The macro-F1 score was 0.97.

Part 2: To improve the performance of the model, implemented the Term Frequency-Inverse Document Frequency (TF-IDF) based feature engineering technique. The macro-F1 score was 0.98.

Part 3: To improve the performance of the model again, implemented the Word2vec feature engineering technique. The macro-F1 score was 1.0.

Overall, the macro-F1 score improved by 0.03, from 0.97 to 1.0.
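For readers unfamiliar with the metric, the following is a minimal sketch of how a macro-F1 score can be computed for multi-label predictions: F1 is computed separately for each genre and the per-genre scores are averaged with equal weight. It is not taken from the project notebook. The function name `macro_f1`, the toy data, and the convention of scoring a genre with no true or predicted positives as 0 are all assumptions made for illustration.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Macro-averaged F1 over the label columns of two 0/1 indicator matrices."""
    n_labels = len(y_true[0])
    per_label_f1 = []
    for j in range(n_labels):
        # Count true positives, false positives and false negatives for label j.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no true and no predicted positives scores 0.
        per_label_f1.append(2 * tp / denom if denom else 0.0)
    # Every label contributes equally, regardless of how often it occurs.
    return sum(per_label_f1) / n_labels


# Hypothetical toy data: 4 movies, 3 genres (one column per genre).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because each genre is weighted equally, a rare genre that the model predicts poorly lowers the score as much as a common one, which is why macro-F1 is a common choice when genres are unevenly distributed.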

Training file (train.csv): https://drive.google.com/drive/folders/1HWkya3iwCcz7lxx9sgPWrQaBvkaqsMUy?usp=sharing