KeynessToolsTalk

Talk: Scatterchron: visualizing diachronic or multi-class corpora in whole and parts

The slides are available here.

Code for the Feb. 26, 2023 talk "Scatterchron: visualizing diachronic or multi-class corpora in whole and parts" is available at BBC One Year News.ipynb.

An interactive version is available on nbviewer.

Ensure you are using Scattertext version >= 0.2.0, and Python 3.11 or higher.
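The version requirements above can be checked before opening the notebooks. The following is a minimal sketch, not part of the talk's code: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes the package was installed under the PyPI distribution name `scattertext`.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 11)
MIN_SCATTERTEXT = (0, 2, 0)


def parse_version(text):
    """Turn a version string such as '0.2.1' into a 3-tuple of ints.

    Only the leading digits of each dot-separated piece are used, so a
    suffix such as 'rc1' is ignored. Short versions are padded with zeros.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts + [0] * (3 - len(parts)))


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.11 or higher is required")
    try:
        installed = metadata.version("scattertext")
    except metadata.PackageNotFoundError:
        problems.append("scattertext is not installed")
    else:
        if parse_version(installed) < MIN_SCATTERTEXT:
            problems.append(f"scattertext {installed} is older than 0.2.0")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the file prints one line per unmet requirement and nothing when the environment satisfies both.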

Tutorial: It was obvious in retrospect: interactive language visualization with Scattertext

Installation

Caveats

  • The documentation is vignette-based. Many features are undocumented. The code is still in beta. Breaking changes can be made at any time!
  • With this in mind, don't be afraid to look through the code, make changes, and get your hands dirty.
  • Test case coverage could be a lot higher. Breaking changes may have been made that didn't trigger test case failures.
  • The visualization framework is written in JavaScript and D3 v4. Browsers do not implement the JavaScript standard consistently, and their implementations can change from version to version. In other words, you may have to modify the JavaScript code to fix your visualization.

Tutorial Agenda

  • Introducing Scattertext

Part 1. Visualizing two categories (this tutorial)

Part 1 of the tutorial is available at Keyness Workshop Tutorial Part 1 - It's good to be flawed.ipynb

An interactive version is available on nbviewer.

  • The Rotten Tomatoes Corpus
  • Creating text-based corpora
  • Counting terms
  • Visualizing term counts
  • How the visualization works
  • Customizing the visualization; text colors
  • Scoring terms
  • Visualizing term scores
  • Using Scattertext to train Gensim word embeddings
  • Visualizing projections of word embeddings
  • Visualizing how similar words are used across categories
  • Dispersion metrics
  • Residual Dispersion
  • Du's Eta for term scoring
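To make the "Counting terms" and "Scoring terms" items concrete, here is a small standard-library sketch. It is an illustration only and is not how Scattertext computes its scores: the functions `count_terms` and `log_odds`, the whitespace tokenization, the additive smoothing, and the two toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def count_terms(docs):
    """Count lowercase whitespace tokens per category.

    `docs` is an iterable of (category, text) pairs.
    """
    counts = {}
    for category, text in docs:
        counts.setdefault(category, Counter()).update(text.lower().split())
    return counts


def log_odds(counts, category, other, smoothing=1.0):
    """Smoothed log-odds ratio of each term in `category` versus `other`.

    Positive scores mark terms associated with `category`, negative scores
    mark terms associated with `other`. Assumes the combined vocabulary
    holds at least two distinct terms.
    """
    a, b = counts[category], counts[other]
    vocab = set(a) | set(b)
    total_a = sum(a.values()) + smoothing * len(vocab)
    total_b = sum(b.values()) + smoothing * len(vocab)
    scores = {}
    for term in vocab:
        p_a = (a[term] + smoothing) / total_a
        p_b = (b[term] + smoothing) / total_b
        scores[term] = math.log(p_a / (1 - p_a)) - math.log(p_b / (1 - p_b))
    return scores


# Toy data standing in for a two-category review corpus.
docs = [
    ("fresh", "a great great film"),
    ("rotten", "a dull dull film"),
]
counts = count_terms(docs)
scores = log_odds(counts, "fresh", "rotten")
for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term:>6} {scores[term]:+.3f}")
```

Terms used equally often in both categories score zero, which is why a scatterplot of per-category frequencies puts them on the diagonal.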

Part 2. Visualizing lexica and topics

Part 2 of the tutorial is available at Keyness Tutorial Part 2 - Integrating External Lexicons, Feature Sets and Topics

An interactive version is available on nbviewer.

  • Visualizing Empath lexicons
  • Making use of a topic model's output
  • Making use of the Biber Feature Set via MFTE (Le Foll et al. 2023)
  • Making use of the USAS Feature Set
  • Making use of Roget's thesaurus

Part 3. Visualizing multiple categories and change over time

Part 3 of the tutorial is available at Keyness Workshop Tutorial Part 3 - Reading Doyle over Time and Pages.ipynb

An interactive version is available on nbviewer.

  • Segmenting long documents into evenly sized chunks while respecting sentence boundaries (SentenceSequenceSegmenter)
  • Offset-based feature identification for non-textual features, such as part-of-speech tag sequences
  • Timeline-based visualizations
  • One time-step per page in a novel
  • Clustering time-steps together
  • Looking at the evolution of Doyle's style through part-of-speech tag sequences
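The first item in the list above, segmenting a long document into evenly sized chunks without cutting sentences, can be sketched in plain Python. This is not the implementation of `SentenceSequenceSegmenter`: the functions `split_sentences` and `segment_evenly`, the regex-based sentence splitter, and the midpoint assignment rule are assumptions chosen to illustrate the idea.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segment_evenly(text, n_segments):
    """Split `text` into `n_segments` chunks of roughly equal word count.

    Each sentence goes, whole, to the segment that contains its midpoint
    (measured in words), so no sentence is ever cut in two. A segment may
    come out empty when one sentence is much longer than the target size.
    """
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]
    total = sum(lengths)
    if total == 0:
        return []
    segments = [[] for _ in range(n_segments)]
    seen = 0
    for sentence, length in zip(sentences, lengths):
        midpoint = seen + length / 2
        index = min(int(midpoint * n_segments / total), n_segments - 1)
        segments[index].append(sentence)
        seen += length
    return [" ".join(segment) for segment in segments]


text = (
    "One two three. Four five six. "
    "Seven eight nine. Ten eleven twelve."
)
chunks = segment_evenly(text, 2)
for number, chunk in enumerate(chunks, start=1):
    print(number, chunk)
```

Treating each chunk as one time-step gives a page-like timeline over a novel, which is the kind of input the timeline-based visualizations in this part work with.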