NARST2024_Taking_a_look_under_the_hoodII-beyond_automation

Natural Language Toolkit NLP-Workshop

Overview

The data corpus we will use is a CSV file containing a Whisper AI transcription of a high school math class. The file is stored in Google Drive; we will use Python, NLTK, and Google Colab to copy and process it so that you can analyze it during the workshop. From there, we will do some basic processing and analysis to extract specific features that tell us about student discussions during this math class.
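As a rough preview of that first step, the sketch below reads transcript rows from CSV text and splits each utterance into lowercase word tokens. It uses only the Python standard library so the mechanics stay visible. The sample rows, the column names (`start`, `end`, `text`), and the helper names `load_utterances` and `tokenize` are assumptions made for illustration, not the layout of the actual workshop file. In the workshop itself, NLTK's tokenizers would likely take the place of the simple regular expression used here.

```python
import csv
import io
import re

# Hypothetical stand-in for the workshop transcript.
# The real file's column names and contents may differ.
SAMPLE_CSV = """start,end,text
0.0,4.2,"Okay, so what do we think the slope is?"
4.2,7.9,"I'm not sure, maybe it's two?"
7.9,11.5,"Maybe. How could we check?"
"""


def load_utterances(csv_file, text_column="text"):
    """Return the text of each transcript row as a list of strings."""
    reader = csv.DictReader(csv_file)
    return [row[text_column] for row in reader]


def tokenize(utterance):
    """Lowercase an utterance and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", utterance.lower())


# io.StringIO lets the sample string behave like an open file;
# with a real file, pass open(path, newline="") instead.
utterances = load_utterances(io.StringIO(SAMPLE_CSV))
tokens = [tok for utt in utterances for tok in tokenize(utt)]

print(len(utterances), "utterances,", len(tokens), "tokens")
print(tokens[:9])
```

Lowercasing and dropping punctuation are normalization choices; they change what later counts mean, which is one reason normalization comes up again in the Conclusion.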

Section 1 ( minutes)

  1. What do we know?
  2. What do we want to know?
  3. Some NLP Basics
  4. What is feature extraction?
  5. Using Google Colab

Section 2 ( minutes)

  1. Installing dependencies and libraries
  2. Connecting to Google Drive
  3. Importing and initial processing of the Uncertainty Transcript
  4. Some quick analysis

Section 3 ( minutes)

  1. Word counts and sorting
  2. Concordance
  3. N-grams and collocations
  4. Visualizations
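
The first three items in Section 3 can be sketched in plain Python to show what happens under the hood. The token list and the helper names `ngrams` and `concordance` below are made up for illustration. NLTK provides its own versions (`nltk.FreqDist`, `nltk.ngrams`, and `nltk.Text.concordance`), which the workshop would presumably use instead. Counting frequent bigrams is only a crude stand-in for collocation finding, which normally relies on association measures rather than raw counts.

```python
from collections import Counter

# Hypothetical, already-normalized tokens standing in for the transcript.
tokens = (
    "maybe it is two i think it is two but maybe we should check "
    "i think we should check the graph"
).split()

# 1. Word counts and sorting: most_common() sorts by descending count.
word_counts = Counter(tokens)
top_words = word_counts.most_common(3)


# 2. N-grams: slide a window of n tokens across the sequence.
def ngrams(seq, n):
    """Return every run of n consecutive items in seq as a tuple."""
    return list(zip(*(seq[i:] for i in range(n))))


bigram_counts = Counter(ngrams(tokens, 2))


# 3. Concordance: show each occurrence of a keyword with its context.
def concordance(seq, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(seq):
        if tok == keyword:
            left = " ".join(seq[max(0, i - width):i])
            right = " ".join(seq[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


print(top_words)
print(bigram_counts.most_common(3))
for line in concordance(tokens, "maybe"):
    print(line)
```

A concordance for a hedging word such as "maybe" is one simple way to see where students express uncertainty and what they say around it.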

Conclusion ( minutes)

  1. Issues to keep in mind when normalizing your data corpus
  2. Potential pitfalls and ethical considerations
  3. What did we learn?