Predicting Section Names of Articles from the New York Times

Team NYT PYT

Nina Hua, Paul Kim, Jacques Sham, Philip Trinh

The goal of this project is to predict the section names of New York Times articles given the comments, news desk, and type of material. The data comes from Kaggle (https://www.kaggle.com/aashita/nyt-comments).

We trained Logistic Regression, Multinomial Naive Bayes, SVM, and Perceptron models. We also attempted LDA and GMM, but neither worked. Finally, we applied boosting to the working models.
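To illustrate the kind of classifier listed above, here is a minimal pure-Python sketch of Multinomial Naive Bayes with Laplace smoothing. It is not the code from our notebooks: the class name `TinyMultinomialNB`, the whitespace tokenizer, and the four toy comments are all made up for illustration, and the real project works on the Kaggle dataset with the news desk and type of material as additional features.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class TinyMultinomialNB:
    """Hypothetical, minimal Multinomial Naive Bayes for text labels."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per section
        self.word_counts = defaultdict(Counter)  # token counts per section
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log P(section) + sum of log P(token | section)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy comments invented for this example; they are NOT from the Kaggle data.
toy_comments = [
    "the senate vote on the budget bill",
    "congress debates the election results",
    "the team won the game in overtime",
    "the coach praised the players after the match",
]
toy_sections = ["Politics", "Politics", "Sports", "Sports"]

model = TinyMultinomialNB().fit(toy_comments, toy_sections)
sports_guess = model.predict("the players and the coach won")
politics_guess = model.predict("the senate budget vote")
```

The sketch scores each section by its log prior plus the smoothed log likelihood of each known token, then returns the highest-scoring section.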

In this repository:

  1. Data/ - Datasets used.*
  2. Models/ - Directory holding pickle files saved for each model.*
  3. Notebook_Saves/ - Saved environment dumps of working notebooks for easy retrieval of variables.*
  4. Exploration_Notebooks/ - Exploration notebooks for each model.
  5. final_project_checkin-Team_NYT_PYT.ipynb - Mid-module check-in for the project.
  6. NYT_Presentation.ipynb - Final presentation of the project.

Note - The contents of Data/, Models/, and Notebook_Saves/ can be downloaded from here and should be extracted into the respective directories listed above.

* - Not stored on GitHub due to large size
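
Once the archives are extracted, the pickled models can be read back with the standard library. The following is a minimal sketch, not the exact code from our notebooks: the helper names `save_model` and `load_models` and the `.pkl` extension are assumptions, so adjust them to match the files you actually find in Models/.

```python
import pickle
from pathlib import Path


def save_model(model, name, models_dir="Models"):
    """Pickle `model` to <models_dir>/<name>.pkl and return the path."""
    path = Path(models_dir) / f"{name}.pkl"  # ".pkl" is an assumed extension
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_models(models_dir="Models"):
    """Load every *.pkl file in `models_dir` into a {file stem: object} dict."""
    models = {}
    for path in sorted(Path(models_dir).glob("*.pkl")):
        with path.open("rb") as f:
            models[path.stem] = pickle.load(f)
    return models
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.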