Oracle Demo

Welcome to our demo :)

1. Conda

This requires that you have installed and configured Miniconda on Windows. It also requires an environment-win.yaml file that is already set up with channels and dependencies.

Step 1.1 Activate Environment

I recommend using PowerShell for these commands. If conda works, you will see (base) in front of your path. Don't forget to activate your environment.

Run Start.bat:
```
.\Start.bat
```
Run install_pnpm.ps1:
```
.\install_pnpm.ps1
```
Restart your shell.
Activate your environment:
```
conda activate Oracle-Demo-1
```

2. Development

Clone the repo

git clone https://github.com/Chugarah/gpt4-pdf-chatbot-langchain.git
cd gpt4-pdf-chatbot-langchain
pnpm install
pnpm add sharp

Set up your .env file
- Copy .env.example to .env. Your .env file should look like this:
```
OPENAI_API_KEY=

PINECONE_API_KEY=
PINECONE_ENVIRONMENT=
PINECONE_INDEX_NAME=

ANSWER_LANGUAGE=
```
- Visit OpenAI to retrieve your API key and insert it into your .env file.
- Visit Pinecone to create and retrieve your API key, and also retrieve your environment and index name from the dashboard.
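
Before running anything else, it can help to confirm that every variable from the template above is actually filled in. The following is a minimal sketch of such a check and is not part of the repo: `findMissingEnv` and `REQUIRED_KEYS` are hypothetical names, and treating ANSWER_LANGUAGE as mandatory is an assumption.

```typescript
// Hypothetical helper, not part of the repo: reports which required
// variables are absent or blank in a given environment object.
const REQUIRED_KEYS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
  'ANSWER_LANGUAGE', // assumption: treated as required here
] as const;

type Env = Record<string, string | undefined>;

function findMissingEnv(env: Env): string[] {
  // A key counts as missing when it is absent or contains only whitespace.
  return REQUIRED_KEYS.filter((key) => (env[key] ?? '').trim() === '');
}

// Example with a literal object; in the app you would pass process.env.
const missingKeys = findMissingEnv({
  OPENAI_API_KEY: 'sk-example',
  PINECONE_API_KEY: '',
});
console.log(missingKeys);
```

An empty result means all five entries are set; it does not prove that the keys themselves are valid.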

We need to update two things: the Pinecone index name and namespaces. Namespaces are the folders you have in your docs folder. Example:

# This is a namespace. ==> space-sci 
docs/space-sci
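
To make the folder-to-namespace rule concrete, here is a small sketch that derives the namespace from a file path. `namespaceFromPath` is a hypothetical helper and not the repo's ingest logic; it assumes the first folder below docs is the namespace and that files placed directly in docs have none.

```typescript
// Hypothetical helper: derives the namespace from a file path inside docs/.
// Assumption: the first folder below docs/ is the namespace, as in the
// docs/space-sci example; files placed directly in docs/ have no namespace.
function namespaceFromPath(filePath: string): string | null {
  // Normalize Windows backslashes so both path styles are handled.
  const parts = filePath.replace(/\\/g, '/').split('/').filter(Boolean);
  const docsIndex = parts.indexOf('docs');
  // Need docs/<folder>/<file>: at least two segments after "docs".
  if (docsIndex === -1 || parts.length - docsIndex < 3) return null;
  return parts[docsIndex + 1] ?? null;
}

console.log(namespaceFromPath('docs/space-sci/paper.pdf')); // space-sci
console.log(namespaceFromPath('docs\\space-sci\\paper.pdf')); // space-sci
console.log(namespaceFromPath('docs/loose-file.pdf')); // null
```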

We need to edit two files: config/pinecone.ts and utils/makechain.ts.

In the config/pinecone.ts file, change PINECONE_INDEX_NAME to the name of an index you created in Pinecone. Example:
```
export const PINECONE_INDEX_NAME = 'demo-data';
```

Now, we need to add or remove namespaces based on your docs folder. Remember that the namespaces need to exactly match the folder names. For example:

 export const TOPICS = [
 // Namespace: venus-atmosphere-life
 {
     TOPIC: 'Life in the Atmosphere of Venus',
     NAMESPACE: 'venus-atmosphere-life', // MUST ONLY CONTAIN LOWER CASE LETTERS A-Z AND HYPHENS
     PROMPT:
     'What evidence is there that life exists in the atmosphere of Venus?',
 },
 // Namespace: supreme-court-cases
 {
     TOPIC: 'Supreme Court Cases',
     NAMESPACE: 'supreme-court-cases', // MUST ONLY CONTAIN LOWER CASE LETTERS A-Z AND HYPHENS
     PROMPT: 'What precedent was set by Morse v. Frederick?',
 },
 ];
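
Because a mismatched namespace is easy to miss, a quick check of the list against your folder names can save a failed ingest. The sketch below is one way to do that, assuming the entry shape shown above; `checkTopics` and `NAMESPACE_PATTERN` are hypothetical names that do not exist in the repo, and the folder list is passed in by hand.

```typescript
// Assumed shape, copied from the TOPICS example above.
interface Topic {
  TOPIC: string;
  NAMESPACE: string;
  PROMPT: string;
}

// Lowercase letters a-z and hyphens only, per the comment in the example.
const NAMESPACE_PATTERN = /^[a-z-]+$/;

// Hypothetical helper: returns one message per problem found.
function checkTopics(topics: Topic[], docsFolders: string[]): string[] {
  const problems: string[] = [];
  const folders = new Set(docsFolders);
  for (const { NAMESPACE } of topics) {
    if (!NAMESPACE_PATTERN.test(NAMESPACE)) {
      problems.push(`invalid characters in namespace: ${NAMESPACE}`);
    } else if (!folders.has(NAMESPACE)) {
      problems.push(`no docs folder named: ${NAMESPACE}`);
    }
  }
  return problems;
}

const report = checkTopics(
  [
    { TOPIC: 'Supreme Court Cases', NAMESPACE: 'supreme-court-cases', PROMPT: '...' },
    { TOPIC: 'Space Science', NAMESPACE: 'Space_Sci', PROMPT: '...' },
  ],
  ['supreme-court-cases', 'space-sci'],
);
console.log(report); // [ 'invalid characters in namespace: Space_Sci' ]
```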

In utils/makechain.ts, change the QA_PROMPT to fit your own use case. If you don't have access to gpt-4, change modelName in new OpenAIChat to gpt-3.5-turbo. Please verify outside this repo that you have access to gpt-4; otherwise the application will not work with it.

Convert your PDF files to embeddings

This repo can load multiple PDF files :)

Inside the docs folder, add your PDF files or folders that contain PDF files.
Run the script npm run ingest to 'ingest' and embed your docs. If you run into errors, see Troubleshooting below.
Check the Pinecone dashboard to verify that your namespace and vectors have been added.

Run the app

Once you've verified that the embeddings and content have been successfully added to your Pinecone index, you can run the app with pnpm run dev to launch the local dev environment, and then type a question in the chat interface.

Docker

You can also run the app in a Docker container. Start your favorite terminal and run these commands.

cd gpt4-pdf-chatbot-langchain/docker
# Build the image
docker compose build
# Run the container
docker compose up

Native Node

# To build app
pnpm run build
# To start app
pnpm run start
# For development
pnpm run dev

Native Node under Windows

Running the app natively on Windows works a little differently. There are two tasks: the first generates vector data to feed into Pinecone, and the second runs the web server and the chatbot.

Generate Vector Data

Start PowerShell or your favorite terminal.
Activate your environment:
```
conda activate Oracle-Demo-1
```
Navigate to your project folder
```
cd gpt4-pdf-chatbot-langchain
```
Run the vector generator. Do this whenever you want to upload your documents to Pinecone.
```
npm run ingest
```
To run the web server and the chatbot, build the app and then start it:
```
pnpm run build
pnpm run start
```

Troubleshooting

In general, keep an eye on the issues and discussions sections of this repo for solutions.

General errors

Make sure you're running the latest Node version. Run node -v to check.
Try a different PDF or convert your PDF to text first. It's possible your PDF is corrupted, scanned, or requires OCR to convert to text.
Console.log the env variables and make sure they are exposed.
Make sure you're using the same versions of LangChain and Pinecone as this repo.
Check that you've created an .env file that contains your valid (and working) API keys, environment and index name.
If you change modelName in OpenAIChat, note that the correct name of the alternative model is gpt-3.5-turbo.
Make sure you have access to gpt-4 if you decide to use it. Test your OpenAI key outside the repo and make sure it works and that you have enough API credits.
Check that you don't have another OpenAI API key set in your global environment. If you do, the value in the project's local .env file will be overridden by the system environment variable.
Try hard-coding your API keys into the process.env variables.

Pinecone errors

Make sure your Pinecone dashboard environment and index match the ones in the pinecone.ts and .env files.
Check that you've set the vector dimensions to 1536.
Make sure your Pinecone namespace is in lowercase.
Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter before the 7 days are up.
Retry from scratch with a new Pinecone project, index, and cloned repo.
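
Two of the checks above are easy to script: logging env variables without leaking them, and confirming the vector dimension. The following is a rough sketch only; `maskSecret` and `hasExpectedDimension` are hypothetical helpers that are not part of the repo, and the sample key is made up.

```typescript
// Hypothetical helpers for the checks above; not part of the repo.

// Masks a secret so it can be logged: only the last 4 characters stay visible.
function maskSecret(value: string | undefined): string {
  if (!value) return '(not set)';
  if (value.length <= 4) return '****';
  return '*'.repeat(value.length - 4) + value.slice(-4);
}

// The dimension the checklist above says the index must use.
const EXPECTED_DIMENSION = 1536;

// Returns true when an embedding has the length the index expects.
function hasExpectedDimension(
  vector: number[],
  expected: number = EXPECTED_DIMENSION,
): boolean {
  return vector.length === expected;
}

// In the app you would pass process.env.OPENAI_API_KEY and a real embedding.
console.log(maskSecret('sk-abcdef123456'));
console.log(maskSecret(undefined));
console.log(hasExpectedDimension(new Array<number>(1536).fill(0)));
```

Logging the masked value is enough to see whether the variable is exposed and whether it is the key you expect, without printing the full secret to the console.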

Credit

The frontend of this repo is inspired by langchain-chat-nextjs.