
Film Script Analyzer

Primary language: Jupyter Notebook

Film Script Analysis

The repository contains the following files:

  • Project Proposal: Project proposal with a detailed, step-by-step explanation of the process (old file).

  • Presentation: Presentation of the project and its proposed steps (old file).

  • 00_webscrapping: Scrapes web pages and parses the HTML into plain text.

  • 01_imdb: Queries the OMDb API to retrieve film metadata.

  • 02_nlp: Uses the NLTK library for tokenization, lemmatization and text statistics.

  • 03_watson: Interface to the Watson API that runs Personality Insights on the scripts.

  • 04_cleaning: Removes NAs and corrupted data, then cleans and orders the dataset.

  • 05_preprocessing: Scales the data to mean 0 and standard deviation 1 for use in the learning models.

  • 06_visualization: Visualizes relationships among the dataset features, as well as the clusterings.

  • 07_classification: Classifies each script's genre from the predictor features.

  • 08_regression: Predicts each script's imdbRating by regression on the predictor features.

  • 09_recomendation: Recommends scripts based on their content and rating.

  • 10_summarization: Returns a script summary by scoring each sentence on the frequencies of its words.

  • 11_word2vec: Trains word embeddings (word vectors) on the whole script corpus and checks word relationships.

  • data: The dataset in CSV format, the recommendation dictionary and the word2vec files.

  • scripts: All the scraped scripts.
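The HTML-to-text part of the 00_webscrapping step can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not the notebook's code: the `TextExtractor` class and `html_to_text` helper are hypothetical names, and the sample page is made up. The sketch skips `<script>` and `<style>` content and keeps every other text chunk; downloading the page (for example with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects page text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text chunks outside <script>/<style>, trimming their ends.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page, for illustration only.
page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><pre>INT. HOUSE - NIGHT\nJohn enters.</pre></body></html>"
)
print(html_to_text(page))
```

Only the ends of each chunk are trimmed, so line breaks and inner indentation inside a block such as `<pre>` survive.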
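For the 01_imdb step, a lookup against the OMDb API could look roughly like the sketch below. It assumes OMDb's title search (`t`), optional year (`y`) and `apikey` query parameters, and the `Response`, `Title`, `Genre` and `imdbRating` fields of its JSON reply. The helper names are hypothetical, and the sample payloads in the usage are invented. A real API key is needed for `fetch_metadata`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "http://www.omdbapi.com/"


def omdb_url(title, api_key, year=None):
    """Build an OMDb query URL that looks a film up by title (and optionally year)."""
    params = {"t": title, "apikey": api_key}
    if year is not None:
        params["y"] = year
    return OMDB_URL + "?" + urlencode(params)


def parse_omdb(payload):
    """Keep a few fields from an OMDb JSON reply; return None if the film was not found."""
    data = json.loads(payload)
    if data.get("Response") != "True":
        return None
    rating = data.get("imdbRating", "N/A")
    return {
        "title": data.get("Title"),
        "genre": data.get("Genre"),
        # OMDb reports ratings as strings; "N/A" marks a missing rating.
        "imdbRating": None if rating == "N/A" else float(rating),
    }


def fetch_metadata(title, api_key, year=None):
    """Query OMDb over the network (hypothetical helper; needs a valid API key)."""
    with urlopen(omdb_url(title, api_key, year)) as resp:
        return parse_omdb(resp.read().decode("utf-8"))


# Offline usage with an invented payload:
print(omdb_url("The Thing", "YOUR_KEY", 1982))
print(parse_omdb('{"Response": "True", "Title": "X", "Genre": "Drama", "imdbRating": "7.5"}'))
```

The network call and the parsing are kept separate so the parsing can be checked without a key or a connection.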
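The text statistics computed in 02_nlp can be illustrated without NLTK. In the sketch below, a simple regular expression stands in for NLTK's `word_tokenize`, and no lemmatization is done (NLTK's `WordNetLemmatizer().lemmatize` would fill that role). The statistic names are assumptions, not necessarily the ones the notebook reports.

```python
import re
from collections import Counter

# Stand-in tokenizer: lowercase words, optionally with one apostrophe part ("don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and split it into word tokens (stand-in for nltk.word_tokenize)."""
    return TOKEN_RE.findall(text.lower())


def text_stats(text):
    """Basic counts for one script: tokens, distinct words, type-token ratio, etc."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens)
    return {
        "n_tokens": n,
        "n_types": len(counts),
        "type_token_ratio": len(counts) / n if n else 0.0,
        "avg_word_len": sum(map(len, tokens)) / n if n else 0.0,
        "top_words": counts.most_common(3),
    }


print(text_stats("The dog saw the cat. The cat ran."))
```

The type-token ratio (distinct words divided by total words) is one simple measure of vocabulary richness. Note that it tends to fall as texts get longer, so it is most comparable between scripts of similar length.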
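The scaling in 05_preprocessing is standardization: each feature value becomes z = (x − mean) / std, computed per column. The plain-Python sketch below shows the idea and uses the population standard deviation, as scikit-learn's `StandardScaler` does. Whether the notebook uses that class is not stated here. The function names and sample rows are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Compute (mean, population std) for every column of a list of numeric rows."""
    return [(fmean(col), pstdev(col)) for col in zip(*rows)]


def transform(rows, params):
    """Apply z = (x - mean) / std per column; constant columns (std 0) map to 0."""
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, params)]
        for row in rows
    ]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
params = fit_standardizer(train)
train_z = transform(train, params)
print(train_z)
```

The means and standard deviations are fitted once and then reused. This lets held-out data for the classification and regression models be scaled with the training set's parameters instead of its own, which avoids leaking test information into training.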
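One way to recommend scripts "based on their content and their rating", as 09_recomendation describes, is to rank the other scripts by cosine similarity between bag-of-words vectors and break ties by rating. This is an assumed design for illustration, not the repository's actual method. The helper names, toy scripts and ratings below are all made up.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector of a text as a Counter of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(query_title, scripts, ratings, k=2):
    """Rank other scripts by content similarity to query_title; ties go to the higher rating."""
    vectors = {title: bag_of_words(text) for title, text in scripts.items()}
    query = vectors[query_title]
    candidates = [t for t in scripts if t != query_title]
    candidates.sort(key=lambda t: (cosine(query, vectors[t]), ratings.get(t, 0.0)), reverse=True)
    return candidates[:k]


# Toy data, invented for illustration.
scripts = {
    "A": "space ship alien crew",
    "B": "alien crew on a space station",
    "C": "a love story in paris",
    "D": "ship crew at sea",
}
ratings = {"A": 8.5, "B": 7.0, "C": 6.5, "D": 7.8}
print(recommend("A", scripts, ratings))
```

For full scripts, TF-IDF weights and stop-word removal would usually give better similarities than raw counts. The rating could also be blended into the score instead of serving only as a tie-breaker.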
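The frequency-based summarization in 10_summarization can be sketched as extractive scoring. Word counts are computed without stop words and divided by the count of the most frequent word. Each sentence then scores the sum of its words' weights, and the top sentences are returned in their original order. The stop-word list and regex tokenizer here are small illustrative assumptions, not the notebook's.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one (e.g. NLTK's) is much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "he", "she", "they", "his", "her", "on", "at", "with"}
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS)
    if not freq:
        return ""
    top = max(freq.values())
    weights = {w: c / top for w, c in freq.items()}  # normalise to [0, 1]

    def score(sentence):
        # Stop words are absent from weights, so they contribute 0.
        return sum(weights.get(w, 0.0) for w in WORD_RE.findall(sentence.lower()))

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(best[:n_sentences]))


text = ("The ship drifts in space. The crew wakes up. "
        "The crew finds the ship damaged. Nobody answers the radio.")
print(summarize(text))
```

Because scores are plain sums, longer sentences tend to win. Dividing each score by the sentence's word count is a common variant when that bias is unwanted.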