All of the scripts and notebooks I used for Davis COurse Search including a webscraper for getting UC DAvis courses , creating embeddings and uploading them to a pinecone vector db
python 3.10
- OPEN AI Account
- Pinecone Index`
Clone repository
git clone https://github.com/puravparab/DavisScripts.git
cd DavisScripts
Get virtual environment running
pip install --user pipenv
pipenv shell
pipenv sync
Create .env file and setup environment variables
OPENAI=<Your OpenAI secret key>
PINECONE=<Add your Pinecone Index Key>
PINECONE_ENV=<Pinecone environment>
- Get all courses of specified subject and store it in data.json
python course_download.py <subject code> > data.json
- Get all subjects and store their codes in subject_codes.json
python get_subject_codes.py > subject_codes.json
- Get all courses of all subjects and store them in course_data.json
python get_all_courses.py
-
Get embeddings for course_data.json and prompt them
-
Converting embeddings into json that can be inserted into a pinecone index