NARST2024_Taking_a_look_under_the_hoodII-beyond_automation

Natural Language Toolkit NLP-Workshop

Overview

The data corpus we will be using is a CSV file containing a Whisper AI transcription of a high school math class. The file is located in Google Drive. We will use NLTK, Python, and Google Colab to copy and process it so that you can analyze it during the workshop. From there we will do some basic processing and analysis to extract specific features that tell us about student discussions during this math class.

Section 1 ( minutes)

What do we know?
What do we want to know?
Some NLP Basics
What is feature extraction?
Using Google Colab

Section 2 ( minutes)

Installing dependencies and libraries
Connecting to Google Drive
Importing and initial processing of the Uncertainty Transcript
Some quick analysis
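
The steps in this section might look roughly like the sketch below. It is only an illustration: the sample rows, the column names "speaker" and "text", and the helper load_transcript are assumptions standing in for the real transcript file, whose layout is not described here. The Colab-only commands are shown as comments because they run only inside a Colab notebook.

```python
import csv
import io

from nltk.tokenize import wordpunct_tokenize

# Inside Google Colab, dependencies and Drive access are typically set up with:
#   !pip install nltk
#   from google.colab import drive
#   drive.mount('/content/drive')

# Hypothetical sample standing in for the workshop transcript CSV.
# The column names "speaker" and "text" are assumptions.
SAMPLE_CSV = """speaker,text
Teacher,What do you think the slope is?
Student 1,I think it might be two but I'm not sure.
Student 2,Maybe it is three?
"""


def load_transcript(csv_file):
    """Read transcript rows from an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


# With a mounted Drive you would pass open(<path to the CSV>) instead.
rows = load_transcript(io.StringIO(SAMPLE_CSV))

# Initial processing: lowercase, split into tokens, keep alphabetic tokens only.
tokens = [
    tok.lower()
    for row in rows
    for tok in wordpunct_tokenize(row["text"])
    if tok.isalpha()
]

# Some quick analysis: size of the corpus and how varied the vocabulary is.
n_utterances = len(rows)
n_tokens = len(tokens)
lexical_diversity = len(set(tokens)) / n_tokens

print(n_utterances, "utterances,", n_tokens, "tokens")
print("lexical diversity:", round(lexical_diversity, 2))
```

wordpunct_tokenize is used here because it needs no extra NLTK data downloads; other tokenizers such as word_tokenize may handle contractions better but require additional resources.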

Section 3 ( minutes)

Word counts and sorting
Concordance
N-grams and collocations
Visualizations
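
As a rough preview of the topics in this section, the sketch below runs each technique on a short, made-up token list (an assumption for illustration, not data from the workshop transcript). It uses only NLTK calls that need no extra data downloads.

```python
from nltk import FreqDist, Text
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.util import ngrams

# Made-up tokens standing in for a processed transcript.
tokens = (
    "i think it might be two but i am not sure "
    "maybe it is three i think so"
).split()

# Word counts and sorting: FreqDist counts tokens; most_common sorts by count.
freq = FreqDist(tokens)
top_words = freq.most_common(3)

# Concordance: every occurrence of a word with its surrounding context.
text = Text(tokens)
hits = text.concordance_list("think", width=40, lines=5)
for hit in hits:
    print(hit.line)

# N-grams: here, every pair of adjacent tokens (bigrams).
bigrams = list(ngrams(tokens, 2))

# Collocations: bigrams ranked by an association measure (raw frequency here).
finder = BigramCollocationFinder.from_words(tokens)
top_pairs = finder.nbest(BigramAssocMeasures.raw_freq, 3)

print(top_words)
print(top_pairs)

# Visualizations: with matplotlib installed, freq.plot(10) draws a
# frequency plot of the ten most common tokens.
```

Raw frequency is the simplest ranking for collocations; measures such as BigramAssocMeasures.pmi or likelihood_ratio are common alternatives once the corpus is larger.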

Conclusion ( minutes)

Issues to keep in mind when normalizing your data corpus
Potential pitfalls and ethical considerations
What did we learn?