A web application that helps you interact with Coursera video transcripts using AI. Features include summarization, question-answering, question generation, and exam preparation.
- Upload Coursera Transcripts: Upload PDF transcripts from Coursera courses.
- Generate Summaries: Get concise summaries of course content by week.
- Ask Questions: Chat with an AI that answers questions based on course content.
- Generate Quiz Questions: Create customized quiz questions from course material.
- Exam Preparation: Generate practice exams tailored to course content.
- Frontend: Streamlit
- Backend: Python
- AI/ML: OpenAI API, LlamaIndex for RAG (Retrieval-Augmented Generation)
- Storage:
  - Supabase (metadata & file storage)
  - Pinecone (vector database for embeddings)
- Clone this repository:
    git clone <repository-url>
    cd coursera-study-buddy
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables. Create a `.env` file in the project root with the following:
    OPENAI_API_KEY=your-openai-api-key
    SUPABASE_URL=your-supabase-url
    SUPABASE_KEY=your-supabase-key
    PINECONE_API_KEY=your-pinecone-api-key
    PINECONE_ENVIRONMENT=gcp-starter
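A quick way to catch a missing credential before the app starts is to check the variables listed above. The sketch below is illustrative only: `missing_env_vars` is a hypothetical helper, not part of this repository, and it assumes the `.env` file has already been loaded into the process environment (for example by python-dotenv).

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper: `env` defaults to os.environ, but any mapping
    can be passed in, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running the check at startup turns a confusing authentication failure deep inside an API call into a clear message naming the variable to fix.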
- Create a new project on Supabase
- Create a storage bucket called `transcripts`
- Create a table called `transcripts` with the following schema:
CREATE TABLE transcripts (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
course_name TEXT NOT NULL,
week_number INTEGER,
transcript_name TEXT NOT NULL,
file_path TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
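To show how a record could line up with this schema, here is a minimal sketch. Both function names are hypothetical and not taken from the repository. `save_transcript_row` assumes a supabase-py style client is created elsewhere and passed in; rejecting empty strings for the `NOT NULL` columns is an extra application-level assumption, since the database itself only rejects NULL.

```python
def build_transcript_row(course_name, transcript_name, file_path, week_number=None):
    """Build a dict for the transcripts table.

    id and created_at are omitted so the column defaults apply.
    """
    required = {
        "course_name": course_name,
        "transcript_name": transcript_name,
        "file_path": file_path,
    }
    for column, value in required.items():
        # Assumption: treat blank strings like missing values for NOT NULL columns.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{column} must be a non-empty string")
    if week_number is not None and not isinstance(week_number, int):
        raise ValueError("week_number must be an integer or None")
    return {**required, "week_number": week_number}


def save_transcript_row(client, row):
    """Insert the row through a supabase-py style client supplied by the caller."""
    return client.table("transcripts").insert(row).execute()
```

Validating before the insert keeps schema violations out of the network round trip and gives the upload form a specific error to display.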
- Create a new Pinecone index named `coursera-transcripts`
- Use `cosine` as the similarity metric
- Use a dimension of 1536 (the output size of OpenAI's text-embedding-ada-002 and text-embedding-3-small models)
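For readers unfamiliar with the metric, the following plain-Python sketch shows what cosine similarity computes and why the vector length has to match the index dimension. It illustrates the math only; Pinecone performs this comparison server-side, and the names `cosine_similarity` and `has_index_dimension` are hypothetical.

```python
import math

EMBEDDING_DIM = 1536  # must equal the dimension configured on the index


def cosine_similarity(a, b):
    """Return dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def has_index_dimension(vector, expected=EMBEDDING_DIM):
    """Check that an embedding has the length the index was created with."""
    return len(vector) == expected
```

Because the score depends only on direction, two vectors that point the same way score close to 1 regardless of their magnitudes, and orthogonal vectors score 0.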
Start the Streamlit app:
streamlit run app.py
The application will be available at http://localhost:8501
- Upload Tab: Upload Coursera PDF transcripts with course name, week, and lecture name.
- Summarize Tab: Generate summaries of uploaded transcripts by course and week.
- Ask Questions Tab: Ask specific questions about course content.
- Generate Quiz Tab: Create custom quiz questions based on course material.
- Exam Prep Tab: Generate practice exams simulating Coursera quizzes.
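Summaries are organised by course and week, so the uploaded records need to be grouped on those two fields at some point. The sketch below shows one way to do that with rows shaped like the `transcripts` table. It is an assumption about how such grouping could work, not the app's actual code, and `group_by_course_and_week` is a hypothetical name.

```python
from collections import defaultdict


def group_by_course_and_week(rows):
    """Map (course_name, week_number) to the list of transcript names.

    `rows` is an iterable of dicts using the transcripts table's column
    names; week_number may be missing or None because the column is nullable.
    """
    grouped = defaultdict(list)
    for row in rows:
        key = (row["course_name"], row.get("week_number"))
        grouped[key].append(row["transcript_name"])
    return dict(grouped)


rows = [
    {"course_name": "ML", "week_number": 1, "transcript_name": "Intro"},
    {"course_name": "ML", "week_number": 1, "transcript_name": "Linear Regression"},
    {"course_name": "ML", "week_number": 2, "transcript_name": "Logistic Regression"},
]
groups = group_by_course_and_week(rows)
```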
- Push your code to GitHub
- Sign up for Streamlit Cloud
- Create a new app pointing to your GitHub repository
- Add your environment variables in the Streamlit Cloud dashboard
- Free tier limits:
  - Supabase: 500MB storage (sufficient for ~50 PDF transcripts)
  - Pinecone: Limited to one index in the free tier
  - OpenAI API: Requires a paid API key (~$1-2/month for typical usage)
If you're experiencing issues with file uploads to Supabase:
- Check Supabase credentials:
  - Verify your SUPABASE_URL and SUPABASE_KEY are correct
  - Ensure you're using the "anon" (public) key for SUPABASE_KEY
- Bucket permissions:
  - In the Supabase dashboard, go to Storage → Buckets → transcripts
  - Check that the bucket's RLS policies allow uploads
  - You may need to temporarily set the bucket to public during testing
- File path issues:
  - If complex paths fail, try uploading directly to the root of the bucket
  - Use the "upsert" option for replacing existing files
- Check environment setup:
  - For local development: make sure the `.env` file exists with correct credentials
  - For Streamlit Cloud: ensure secrets are properly configured
- Debugging:
  - Add debug print statements to view Supabase responses
  - Check the app logs for specific error messages
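Path problems often come from spaces or special characters in course and file names. As a rough sketch, the helper below builds a conservative object path and can fall back to the bucket root, as suggested in the file path tips above. `storage_path` and the `week-N` folder layout are hypothetical choices, not the layout this app necessarily uses.

```python
import re


def storage_path(course_name, week_number, filename, flat=False):
    """Build a bucket object path such as 'machine-learning/week-1/intro.pdf'.

    With flat=True only the cleaned file name is returned, which places
    the file at the root of the bucket.
    """

    def slug(text):
        # Lowercase, then collapse anything outside a-z, 0-9, '.', '_', '-' into '-'.
        cleaned = re.sub(r"[^a-z0-9._-]+", "-", text.lower()).strip("-")
        return cleaned or "untitled"

    name = slug(filename)
    if flat:
        return name
    return f"{slug(course_name)}/week-{int(week_number)}/{name}"
```

If a nested path fails to upload, retrying the same file with `flat=True` helps tell a path problem apart from a permissions problem.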
If you experience other problems:
- Enable debug mode:
    streamlit run app.py --logger.level=debug
- Check API rate limits:
  - OpenAI and Pinecone have rate limits on free/starter tiers
  - Space requests out if hitting limits
- Large PDF files:
  - PDF files over 10MB may cause memory issues
  - Consider splitting large transcripts
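One common way to space requests out when hitting rate limits is to retry with exponential backoff. The sketch below is a generic illustration, not code from this project: `call_with_backoff` is a hypothetical helper, and it catches every exception for simplicity, whereas real code would catch only the rate-limit error raised by the client library in use.

```python
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on failure, doubling the wait each time.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception once max_attempts is reached.
    `sleep` is injectable so the delays can be observed in tests.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # assumption: narrow this to the client's rate-limit error
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Wrapping an embedding or completion call in a zero-argument function (for example a `lambda`) is enough to pass it to this helper.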
Contributions are welcome! Please feel free to submit a Pull Request.

# Coursera Study Buddy