the-seinfeld-chronicles

A dataset for textual analysis of arguably the best-written comedy television show ever.


Context

A dataset for people who love data science and Seinfeld.


Content

  • Details about all the episodes.
  • Attributes include Director, Episode Name, Air Date, etc.
  • Complete scripts of all the episodes.

An upcoming update will include:

  • Stage locations and cast

Data Source

The data is scraped from the fan website http://www.seinology.com/.


Possible Explorations

  • Train language models on the corpus.
  • Compare the vocabulary with other works on television, film or literature.
  • Find correlations between language complexity and popularity.
  • Train models to generate scripts based on the data.
  • Analyze obscure words used in the vocabulary of the series.

These are just basic examples; the sky is the limit.
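
As a starting point for the vocabulary ideas above, here is a minimal sketch in Python that counts word frequencies and lists the words that appear only rarely. The function names `tokenize` and `rare_words`, the `max_count` threshold, and the sample lines are all assumptions made for illustration; they are not part of the dataset, and the real script text would need to be loaded from the dataset files first.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rare_words(lines, max_count=1):
    """Return, in sorted order, the words that occur at most max_count times."""
    counts = Counter(word for line in lines for word in tokenize(line))
    return sorted(word for word, n in counts.items() if n <= max_count)


# Placeholder lines written for this sketch; they are not taken from the dataset.
sample_lines = [
    "The soup was good.",
    "The soup was cold, the bread was stale.",
]

print(rare_words(sample_lines))  # ['bread', 'cold', 'good', 'stale']
```

On the full corpus, a higher `max_count` and a comparison against a general-purpose word list would probably give a better picture of which words are truly obscure rather than merely infrequent in the scripts.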


Acknowledgements

The data was crawled from the fan website http://www.seinology.com/.


Contributing

Changes and suggestions for improvement are welcome. Feel free to comment with additions that you think would be useful, or open a PR on the GitHub project.

Want to buy me a coffee? paypal.me/AShrivastava961