This project extracts transcriptions from YouTube videos using the YouTube Transcript API and generates concise summaries of those transcriptions with the GPT-4 API. The application includes both a terminal-based script and a Django-based web API for generating chapter titles from video transcriptions. It also sets up a data pipeline that synchronizes data between PostgreSQL and AWS DynamoDB, and deploys the application with Docker on AWS Elastic Beanstalk.
- Technologies Used
- Setup and Installation
- Terminal Application
- API Version
- Containerizing with Docker
- Data Pipeline
- Deploying to AWS Elastic Beanstalk
- Conclusion
- Python: Core programming language.
- Django: Web framework for the API version.
- YouTube Transcript API: For extracting video transcriptions.
- OpenAI GPT-4: For generating summaries from transcriptions.
- PostgreSQL: Primary database for storing video transcriptions and summaries.
- AWS DynamoDB: NoSQL database for storing processed data.
- Docker: For containerizing the application.
- AWS Elastic Beanstalk: For deploying the Docker container.
- Docker Compose: For managing multi-container applications.
- AWS Amplify: For real-time messaging, authentication, notifications, and deployment.
- Clone the Repository:

  ```bash
  git clone https://github.com/your-repo/youtube-video-summarizer.git
  cd youtube-video-summarizer
  ```

- Create and Activate a Virtual Environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set Up Environment Variables: Create a `.env` file and add your OpenAI API key, AWS credentials, and other necessary environment variables:

  ```
  OPENAI_API_KEY=your_openai_api_key
  AWS_ACCESS_KEY_ID=your_aws_access_key_id
  AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
  AWS_DEFAULT_REGION=your_aws_default_region
  ```
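One common way to read these values in Python is with the python-dotenv package; the sketch below assumes that package is installed, which is an assumption rather than something this project confirms:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load key-value pairs from the .env file into the process environment.
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
AWS_DEFAULT_REGION = os.getenv("AWS_DEFAULT_REGION")
```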
The terminal application extracts transcriptions from a YouTube video and generates chapter titles using the GPT-4 API.
- Script Explanation: `main.py` contains functions to fetch video transcriptions, group sentences, split the transcript into chapters by topic, and generate chapter titles using GPT-4 (a sketch of these steps follows below).

- Run the Script:

  ```bash
  python main.py <youtube_video_id>
  ```
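Below is a minimal sketch of the two core steps, assuming the youtube-transcript-api package (older `get_transcript`-style interface) and openai >= 1.x; the function names are illustrative and not necessarily those used in `main.py`:

```python
import sys

from openai import OpenAI  # assumes openai >= 1.x
from youtube_transcript_api import YouTubeTranscriptApi  # older get_transcript-style interface

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def fetch_transcript(video_id: str) -> str:
    """Fetch the transcript segments for a video and join them into plain text."""
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(segment["text"] for segment in segments)


def generate_chapter_titles(transcript: str) -> str:
    """Ask GPT-4 to propose chapter titles for the transcript."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You create concise chapter titles for video transcripts."},
            {"role": "user", "content": f"Suggest chapter titles for this transcript:\n\n{transcript}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(generate_chapter_titles(fetch_transcript(sys.argv[1])))
```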
Transitioning from a terminal application to a web API using Django.
- API Endpoint: `/generate-titles/<video_id>/` generates and returns chapter titles for the given YouTube video ID (a sketch of the view and URL wiring follows below).

- Run the Django Server:

  ```bash
  python manage.py runserver
  ```
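Below is a minimal sketch of how such an endpoint could be wired up in Django; the module layout and helper imports are illustrative assumptions, not the project's actual code:

```python
# views.py (illustrative)
from django.http import JsonResponse

# Reuses the helper functions sketched in the terminal section; the actual module layout may differ.
from main import fetch_transcript, generate_chapter_titles


def generate_titles(request, video_id: str):
    """Return GPT-4 generated chapter titles for the given YouTube video ID as JSON."""
    transcript = fetch_transcript(video_id)
    titles = generate_chapter_titles(transcript)
    return JsonResponse({"video_id": video_id, "titles": titles})


# urls.py (illustrative)
from django.urls import path

urlpatterns = [
    path("generate-titles/<str:video_id>/", generate_titles, name="generate-titles"),
]
```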
Containerizing the application to ensure consistent environments across development, testing, and production.
- Dockerfile:

  ```dockerfile
  FROM python:3.11.4-slim-buster

  WORKDIR /app

  ENV PYTHONDONTWRITEBYTECODE 1
  ENV PYTHONUNBUFFERED 1

  COPY requirements.txt .
  RUN pip install --upgrade pip
  RUN pip install -r requirements.txt

  COPY . .
  ```

- Docker Compose:

  ```yaml
  version: '3.8'

  services:
    web:
      build: .
      command: ["sh", "-c", "python manage.py migrate && python manage.py runserver 0.0.0.0:8000"]
      volumes:
        - .:/app
      ports:
        - "8000:8000"
      env_file:
        - .env
      depends_on:
        - db

    db:
      image: postgres:15
      volumes:
        - postgres_data:/var/lib/postgresql/data/
      environment:
        - POSTGRES_USER=postgres
        - POSTGRES_PASSWORD=secret
        - POSTGRES_DB=transcribed_data

    etl:
      build: .
      command: ["sh", "-c", "sleep 10 && python etl_script.py"]
      volumes:
        - .:/app
      env_file:
        - .env
      depends_on:
        - web

  volumes:
    postgres_data:
  ```

- Build and Run Containers:

  ```bash
  docker-compose up --build
  ```
Setting up an ETL pipeline to synchronize data between PostgreSQL and AWS DynamoDB.
- ETL Script Explanation: `etl_script.py` connects to PostgreSQL, fetches the stored data, and inserts it into DynamoDB (a sketch follows below).
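Below is a minimal sketch of that flow, assuming psycopg2 and boto3 are installed, a DynamoDB table named `transcribed_data` exists, and a hypothetical `videos` table schema in PostgreSQL; all names are illustrative:

```python
import os

import boto3     # AWS SDK; reads credentials and region from the environment
import psycopg2  # PostgreSQL driver


def run_etl():
    """Copy rows from a hypothetical PostgreSQL `videos` table into DynamoDB."""
    conn = psycopg2.connect(
        host="db",  # the Compose service name for PostgreSQL
        dbname="transcribed_data",
        user="postgres",
        password=os.environ.get("POSTGRES_PASSWORD", "secret"),
    )
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("transcribed_data")  # assumed DynamoDB table name

    with conn, conn.cursor() as cur:
        cur.execute("SELECT video_id, transcript, summary FROM videos;")  # hypothetical schema
        for video_id, transcript, summary in cur.fetchall():
            table.put_item(Item={
                "video_id": video_id,
                "transcript": transcript,
                "summary": summary,
            })


if __name__ == "__main__":
    run_etl()
```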
Deploying the Dockerized application to AWS Elastic Beanstalk for scalability and ease of management.
- Initialize Elastic Beanstalk:

  ```bash
  eb init -p docker time-stamp
  ```

- Create and Deploy Environment:

  ```bash
  eb create time-stamp-env
  eb deploy
  ```
- Developed a robust application to summarize YouTube videos using advanced APIs.
- Containerized the application for consistency and ease of deployment.
- Established a reliable data pipeline between PostgreSQL and AWS DynamoDB.
- Successfully deployed the application on AWS Elastic Beanstalk.
- Ensured compatibility between different APIs and services.
- Overcame data consistency issues by implementing a robust ETL pipeline.
- Enhance the summarization algorithm for better accuracy.
- Implement user authentication and access control for the API.
- Integrate additional data sources and processing features.
Try out similar projects and explore the use of APIs and cloud services in your applications. For more details, visit the project repository and check out the documentation.