NLP Insights into Comedian Transcripts

Overview

This project was developed for the UCS672 course, "Data Science Applications: NLP, Computer Vision, and IoT." It applies Natural Language Processing (NLP) techniques to transcripts from various stand-up comedians, extracting key themes, sentiment, and linguistic patterns to understand what makes certain jokes resonate with audiences. By processing and visualizing this data, the project aims to uncover trends in stand-up comedy.

Transcripts for this project were sourced from Scraps from the Loft.

The analysis compares the comedians' styles to identify similarities and differences, and explores ways to generate new text from the analyzed transcripts.

Notebooks

  1. NLP in Python 1.ipynb: This notebook covers cleaning and preprocessing the data to prepare it for further analysis.

  2. NLP in Python 2.ipynb: This notebook performs exploratory data analysis to find patterns and insights in the data.

  3. NLP in Python 3.ipynb: This notebook generates word clouds and other visualizations to better understand the content of the transcripts.

  4. NLP in Python 4 (Sentiment Analysis).ipynb: This notebook uses the TextBlob module to perform sentiment analysis on the transcripts.

  5. NLP in Python 5.ipynb: This notebook uses the Latent Dirichlet Allocation (LDA) algorithm to identify topics present in the transcripts.

  6. NLP in Python 6.ipynb: This notebook uses Markov chains to generate new text based on the analyzed data.
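As a rough illustration of the kind of cleaning the first notebook refers to, here is a minimal sketch that uses only the standard library. The function name `clean_transcript` and the specific steps (lowercasing, dropping bracketed notes, dropping tokens with digits, stripping punctuation) are assumptions made for this example; the notebook itself may clean the data differently.

```python
import re
import string


def clean_transcript(text):
    """Apply a simple first pass of text cleaning to a transcript (hypothetical helper)."""
    text = text.lower()                    # normalize case
    text = re.sub(r"\[.*?\]", " ", text)   # drop bracketed notes such as [laughter]
    text = re.sub(r"\w*\d\w*", " ", text)  # drop tokens that contain digits
    # Strip ASCII punctuation; curly quotes and other symbols would need extra handling.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())          # collapse runs of whitespace


print(clean_transcript("Hello, World! [Laughter] It's 2019."))
```

Cleaning choices like these affect every later step, so it is worth checking a few cleaned transcripts by eye before moving on to the analysis notebooks.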

How to Run

Prerequisites

Ensure you have the following installed:

  • Python 3.x
  • Jupyter Notebook
  • Required Python libraries: pandas, numpy, matplotlib, seaborn, nltk, wordcloud, scikit-learn (imported as sklearn), gensim, textblob, markovify
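
To confirm the libraries above are available before opening the notebooks, a small check like the following may help. It is a sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list uses import names (the scikit-learn package is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn", "nltk",
    "wordcloud", "sklearn", "gensim", "textblob", "markovify",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are importable.")
```

This only checks that each module can be located; it does not verify versions or download any NLTK or TextBlob data the notebooks might need.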

Steps

  1. Clone the Repository:

    git clone https://github.com/yourusername/NLP-Insights-into-Comedian-Transcripts.git

  2. Navigate to the Project Directory:

    cd NLP-Insights-into-Comedian-Transcripts

  3. Install the Required Libraries:

    pip install -r requirements.txt

  4. Run Jupyter Notebook:

    jupyter notebook

  5. Open and Run Notebooks:

    • Start with NLP in Python 1.ipynb to clean and preprocess the data.
    • Proceed to NLP in Python 2.ipynb for exploratory data analysis.
    • Continue with NLP in Python 3.ipynb for word clouds and visualizations.
    • Follow up with NLP in Python 4 (Sentiment Analysis).ipynb to perform sentiment analysis.
    • Next, run NLP in Python 5.ipynb for topic modeling.
    • Finally, run NLP in Python 6.ipynb to generate new text based on the analyzed data.
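
To give a feel for what the final text-generation step does, here is a minimal word-level Markov chain written with the standard library only. It is a stand-in sketch, not the notebook's code and not the markovify API: the names `build_chain` and `generate` are hypothetical, and the model simply picks each next word from the words that followed the current word in the input.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return dict(chain)


def generate(chain, start, max_words=15, seed=None):
    """Walk the chain from `start`, choosing each next word at random."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    word = start
    output = [word]
    # Stop at the length limit or when the current word has no known successor.
    while len(output) < max_words and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


sample = "i love comedy and i love jokes and i tell jokes"
print(generate(build_chain(sample), "i", max_words=8, seed=1))
```

Because repeated successors appear multiple times in each list, common word pairs are chosen more often, which is what lets the generated text loosely echo the source transcripts.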

Conclusion

This project analyzes comedian transcripts with several NLP techniques, exploring patterns, sentiment, and topics to reveal differences in comedic style and content, and ends by generating new text based on the transcripts. It sits at the intersection of data science and the art of comedy.

Happy analyzing!

Feel free to reach out if you have any questions or need further assistance.