film_script_analysis: A Jupyter Notebook repository from aredwing

Film Script Analysis

The repository contains:

Project Proposal: A detailed step-by-step explanation of the process (old file).
Presentation: Presentation of the project and its proposed steps (old file).
00_webscrapping: Scraping pages and parsing HTML to text.
01_imdb: Querying the OMDb API to retrieve film metadata.
02_nlp: Use of the NLTK library for tokenization, lemmatization and text statistics.
03_watson: Interface to the Watson API to run Personality Insights on the scripts.
04_cleaning: Remove NAs and corrupted data; clean and order the dataset.
05_preprocessing: Scale data to have mean = 0 and std = 1, for use in learning models.
06_visualization: Visualize relationships among the features of the datasets and clusterings.
07_classification: Perform classification of script Genre using predictor features.
08_regression: Perform regression of script imdbRating using predictor features.
09_recomendation: Recommend scripts based on their content and their rating.
10_summarization: Return a script summary based on the frequencies of the words in each sentence.
11_word2vec: Create word embeddings (word vectors) from the whole script corpus and check word relationships.
data: Dataset in CSV, recommendation dictionary and word2vec files.
scripts: All the scraped scripts.
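
As a rough illustration of two of the steps listed above, the sketch below shows what the scaling in 05_preprocessing and the frequency-based summary in 10_summarization amount to. It is not taken from the notebooks: the function names standardize and summarize are hypothetical, only the Python standard library is used, and the notebooks themselves may rely on other tools (for example NLTK for tokenization) and on different scoring details such as stop-word removal.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def standardize(values):
    """Scale a numeric column to mean 0 and standard deviation 1 (hypothetical helper)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences in their original order (hypothetical helper).

    A sentence's score is the average corpus frequency of its words.
    Assumption: no stop-word filtering, and a naive regex sentence splitter.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the top n, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    print(standardize([1, 2, 3]))
    print(summarize("The ship sails. The ship sinks fast. Birds fly.", n_sentences=1))
```

Averaging the word frequencies, rather than summing them, keeps long sentences from winning on length alone; that choice is an assumption of this sketch.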