/nytcrossword

An exploration of New York Times crossword answers from 1994-2017, i.e. the Will Shortz era.

Primary LanguageHTMLMIT LicenseMIT

24 Years of NYTimes Crossword answers

September 2, 2017

View the notebook here

Description

Exploratory data analysis of 24 years of New York Times Crossword answers. I use data visualization and computational linguistics concepts to discover trends in the Shortz-era puzzles (1994 - present).

Questions include:

  • What are the most common answers?
  • Are words getting longer? Shorter?
  • How does puzzle letter density vary by day?
  • What words have emerged in the crossword only in the past few years?
  • How lexically diverse are the puzzles?

Dependencies

  • tidyverse for everything
  • plyr for data wrangling
  • here for OS-agnostic file paths
  • tidytext for text analysis methods
  • stringr for string-manipulation operations
  • viridis for a simple, colorblind-friendly palette

Data Sources

The original dataset for this project was scraped from XWordInfo.com. Upon their request, however, I have taken down my scraper code and removed the dataset from this repository. Read the notebook for more details.