# Supreme Court Language Analysis

This project analyzes the **language used in Supreme Court oral arguments and opinions** to study how it has changed over time. The main focus is on measuring the complexity of language using:

- **Sentence length**
- **Word length**
- **Vocabulary sophistication**

The project uses different NLP tools to clean and preprocess the text data, removing irrelevant information, and then computes metrics like the number of sentences and word length.

## Installation and Setup

1. Clone the repository:

    ```bash
    git clone https://github.com/Cyebukayire/supreme_court_language_complexity.git
    cd supreme_court_language_complexity/src
    ```

2. Create a Conda environment and install dependencies:

    ```bash
    conda create --name sc_complexity python=3.12
    conda activate sc_complexity
    pip install -r requirements.txt
    ```

3. Run the text processing pipeline:

    ```bash
    python src/data_processing.py
    ```

This will generate a cleaned dataset with added sentence counts and cleaned text.