# ClassGPT

ChatGPT for my lecture slides.

Built with Streamlit, powered by LlamaIndex and LangChain. Uses OpenAI's ChatGPT API (`gpt-3.5-turbo`).

Inspired by AthensGPT.

Demo: demo.mp4
## How it works

- Parses PDFs with `pypdf`
- Builds the index with LlamaIndex's `GPTSimpleVectorIndex`
  - the `text-embedding-ada-002` model is used to create embeddings (see the vector store index page to learn more)
  - here's a sample index
  - the indexes and files are stored on S3
- Queries the index
  - uses the latest ChatGPT model, `gpt-3.5-turbo` (a sketch of the whole flow follows this list)
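A minimal sketch of that flow. LlamaIndex's API changed quickly around this time, so treat the exact names and kwargs as illustrative rather than exact; the `lectures/` folder is a placeholder:

```python
# Illustrative only: llama-index ~0.4-era API; names/kwargs vary by version.
from langchain.chat_models import ChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, SimpleDirectoryReader

# 1. Parse the PDFs (SimpleDirectoryReader uses pypdf for .pdf files)
documents = SimpleDirectoryReader("lectures/").load_data()

# 2. Build the vector index; chunks are embedded with text-embedding-ada-002
chatgpt = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
index = GPTSimpleVectorIndex(documents, llm_predictor=chatgpt)

# 3. Persist the index as JSON (this is the file that gets pushed to S3)
index.save_to_disk("lecture01.json")

# 4. Query: the top-k most similar chunks are stuffed into a gpt-3.5-turbo prompt
print(index.query("What are the key ideas of lecture 1?"))
```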
## Installation

- Configure AWS (see the quickstart)

  ```bash
  aws configure
  ```
- Create an S3 bucket with a unique name
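  For example, with the AWS CLI (the bucket name below is a placeholder; yours must be globally unique):

  ```bash
  # "mb" = make bucket; pick your own globally unique name
  aws s3 mb s3://classgpt-yourname
  ```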
- Change the bucket name in the codebase (look for `bucket_name = "classgpt"`) to the one you created
- Rename `.env.local.example` to `.env` and add your OpenAI credentials
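  The resulting `.env` should look something like this (the exact variable names come from `.env.local.example`; `OPENAI_API_KEY` is the conventional one and is an assumption here):

  ```bash
  # .env — key name assumed; check .env.local.example for the exact variables
  OPENAI_API_KEY=sk-...
  ```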
- Create a Python environment

  ```bash
  conda create -n classgpt python=3.9
  conda activate classgpt
  ```
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Run the Streamlit app

  ```bash
  cd app/
  streamlit run 01_❓_Ask.py
  ```
Alternatively, you can use Docker:

```bash
docker compose up
```

Then open a new browser tab and navigate to http://localhost:8501/.
## TODO

- Local mode for the app (no S3)
  - a global variable `use_s3` toggles between local and S3 mode (see the first sketch after this list)
- Deploy the app to Streamlit Cloud
  - add an input box for the OpenAI key (second sketch below)
  - use the pyarrow local FS to store files
- Update the code for the new LangChain release
- Custom prompts and tweakable settings
  - create a settings page for tweaking model parameters and providing a custom-prompt example
- Add the ability to query multiple files
  - compose indices of multiple lectures and query across all of them (third sketch below)
  - loop through all existing indexes, create the ones that haven't been created, and compose them together
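A sketch of the local/S3 toggle. The `use_s3` flag mirrors the repo's global variable, but the `USE_S3` env var, `save_index` helper, and bucket layout here are hypothetical:

```python
import os

import boto3

# Hypothetical wiring: the repo uses a global use_s3 variable to toggle modes
use_s3 = os.getenv("USE_S3", "false").lower() == "true"

def save_index(index, name: str) -> None:
    path = f"indices/{name}.json"
    index.save_to_disk(path)  # always keep a local copy (pyarrow local FS)
    if use_s3:
        s3 = boto3.client("s3")
        s3.upload_file(path, "classgpt", f"indices/{name}.json")
```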
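The key input box could look like this (`st.text_input` is real Streamlit API; assigning it straight to `openai.api_key` is an assumption about the wiring):

```python
import openai
import streamlit as st

# Collect the user's key instead of shipping one with the deployed app
key = st.text_input("OpenAI API key", type="password")
if key:
    openai.api_key = key
```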
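And a sketch of the compose-and-query loop, following the early LlamaIndex composability pattern (an index over indexes). The exact API moved around between releases, so this is illustrative only:

```python
import os

from llama_index import GPTListIndex, GPTSimpleVectorIndex, SimpleDirectoryReader

indices = []
for pdf in sorted(os.listdir("lectures")):
    index_path = f"indices/{pdf}.json"
    if os.path.exists(index_path):
        # reuse an index that was already built
        index = GPTSimpleVectorIndex.load_from_disk(index_path)
    else:
        # create the ones that haven't been created yet
        docs = SimpleDirectoryReader(input_files=[f"lectures/{pdf}"]).load_data()
        index = GPTSimpleVectorIndex(docs)
        index.save_to_disk(index_path)
    index.set_text(f"Lecture index for {pdf}")  # summary used when composing
    indices.append(index)

# Compose the per-lecture indexes and query across all of them
composed = GPTListIndex(indices)
print(composed.query("Which lectures cover dynamic programming?"))
```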
## References
Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:
- 1 token ~= 4 chars in English
- 1 token ~= ¾ words
- 100 tokens ~= 75 words
- 1-2 sentences ~= 30 tokens
- 1 paragraph ~= 100 tokens
- 1,500 words ~= 2048 tokens
Try the OpenAI Tokenizer tool
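To count tokens programmatically, OpenAI's `tiktoken` library does the same splitting the API does (the sample string is arbitrary):

```python
import tiktoken

# gpt-3.5-turbo uses the cl100k_base encoding
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "ChatGPT for my lecture slides"
print(len(enc.encode(text)))  # roughly len(text) / 4 for English text
```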
An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
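A small sketch of that idea with the openai-python 0.x API that was current at the time (the two sample strings are arbitrary):

```python
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    # openai-python 0.x call; returns a 1536-dim vector for ada-002
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

a, b = embed("binary search trees"), embed("balanced BSTs")
# cosine similarity: closer to 1.0 = more related
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```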
- For the `text-embedding-ada-002` model, cost is $0.0004 / 1K tokens, or about 3,000 pages per dollar
- For the `gpt-3.5-turbo` model (the ChatGPT API), cost is $0.002 / 1K tokens
- For the `text-davinci-003` model, cost is $0.02 / 1K tokens
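As a rough worked example (the page and token counts are assumptions): a 100-page slide deck at ~800 tokens per page is ~80K tokens, so embedding it once costs about 80 × $0.0004 ≈ $0.03, and a single `gpt-3.5-turbo` query that stuffs ~4K tokens of context into the prompt costs about 4 × $0.002 ≈ $0.01.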
- Increase upload limit of `st.file_uploader`
- `st.cache_resource` - Streamlit Docs
- Session State
- hayabhay/whisper-ui: Streamlit UI for OpenAI's Whisper
- Streamlit Deployment Guide (wiki)
- How to Deploy a streamlit application to AWS? Part-3
- Loading data
- multimodal
- ChatGPT
- boto3 file_upload does it check if file exists
- Boto 3: Resource vs Client
- Writing json to file in s3 bucket
- amazon web services - What is the best way to pass AWS credentials to a Docker container?
- docker-compose up failing due to: error: can't find Rust compiler · Issue #572 · acheong08/ChatGPT
- linux - When installing Rust toolchain in Docker, Bash `source` command doesn't work
- software installation - How to install a package with apt without the "Do you want to continue [Y/n]?" prompt? - Ask Ubuntu
- How to use sudo inside a docker container?