Movie Recommendation System

Overview

This repository contains the code and data for a movie recommendation system. I designed the system to recommend movies based on user preferences and movie attributes. This README describes the datasets, the data preprocessing steps, and the structure of the code.

Data

This project uses two datasets, credits.csv and movies.csv:

  • credits.csv: Contains information about the cast and crew of each movie.

    • Shape: (4803, 4)
    • Columns: 'movie_id', 'title', 'cast', 'crew'
  • movies.csv: Contains information about movies, including titles, overviews, genres, keywords, and original language.

    • Shape: (4803, 20)
    • Columns: 'movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew', and more.
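
As a quick sanity check, the shapes and columns listed above can be verified before any preprocessing. The following is a minimal sketch that uses only Python's standard csv module. The helper names csv_shape and check_dataset are hypothetical, and the sketch assumes both files have a single header row and are UTF-8 encoded.

```python
import csv

# Expected shapes and columns, copied from the dataset description above.
EXPECTED = {
    "credits.csv": ((4803, 4), ["movie_id", "title", "cast", "crew"]),
    "movies.csv": (
        (4803, 20),
        ["movie_id", "title", "overview", "genres", "keywords", "cast", "crew"],
    ),
}


def csv_shape(path):
    """Return ((n_rows, n_columns), header) for a CSV file with a header row."""
    # Assumption: cast/crew cells can be long, so raise the per-field limit.
    csv.field_size_limit(10_000_000)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Count data rows, ignoring completely blank lines.
        n_rows = sum(1 for row in reader if row)
    return (n_rows, len(header)), header


def check_dataset(path, expected_shape, expected_columns):
    """Return a list of problems; an empty list means the file matches."""
    shape, header = csv_shape(path)
    problems = []
    if shape != expected_shape:
        problems.append(f"{path}: shape {shape}, expected {expected_shape}")
    missing = [name for name in expected_columns if name not in header]
    if missing:
        problems.append(f"{path}: missing columns {missing}")
    return problems
```

Calling check_dataset(path, *EXPECTED[path]) for each entry in EXPECTED returns an empty list when a file matches the description, and a list of mismatch messages otherwise.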

Data Preprocessing

I applied the following preprocessing steps to prepare the data for the recommendation system:

  1. Handling Missing Values: I removed rows with missing values in the 'overview' column.

  2. Data Cleaning: I cleaned the text data in the 'overview' column by removing punctuation and converting text to lowercase.

  3. Feature Engineering: I extracted relevant features from the data, such as genres, keywords, cast, and crew, and transformed them into tags.

  4. Tag Generation: I generated the tags for each movie by combining its overview, genres, keywords, cast, and crew.

  5. Tag Normalization: I converted all tags to lowercase for consistency.
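
The five steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's actual implementation: the function names are hypothetical, each movie is assumed to be a dict, and the genres, keywords, cast, and crew values are assumed to be already parsed into lists of plain strings (the raw format of those columns is not described here).

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_overview(text):
    """Step 2: strip punctuation from the overview and lowercase it."""
    return text.translate(_PUNCT_TABLE).lower()


def build_tags(row):
    """Steps 3-5: merge the overview and feature lists into one lowercase tag string."""
    words = clean_overview(row["overview"]).split()
    for column in ("genres", "keywords", "cast", "crew"):
        # Assumption: each feature column is a list of plain strings (or missing).
        words.extend(row.get(column) or [])
    # Step 5: lowercase everything, including the feature values.
    return " ".join(words).lower()


def preprocess(rows):
    """Apply steps 1-5 to an iterable of dict rows; return rows with a 'tags' field."""
    processed = []
    for row in rows:
        overview = row.get("overview")
        # Step 1: drop rows whose overview is missing or blank.
        if not overview or not overview.strip():
            continue
        processed.append(
            {
                "movie_id": row["movie_id"],
                "title": row["title"],
                "tags": build_tags(row),
            }
        )
    return processed
```

For a made-up row with the overview "A retired pilot, haunted by loss, returns!", genres ["Drama"], keywords ["aviation"], cast ["Jane Doe"], and crew ["John Smith"], preprocess produces the tags "a retired pilot haunted by loss returns drama aviation jane doe john smith". A row whose overview is None is dropped.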