PDF-Chat-Ollama

Chat with PDFs or other documents using Ollama


Setup

Step 1: Install the Requirements

pip install -r requirements.txt

Step 2: Pull the model (not required if you already have models loaded in Ollama)

Make sure Ollama is installed and running on your system; get it from https://ollama.ai

ollama pull mistral
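
To confirm that Ollama is reachable and the model was pulled, you can query its local REST API from Python (port 11434 and the /api/tags endpoint are Ollama's defaults; adjust if you changed them):

import requests

# Ollama serves a REST API on localhost:11434 by default;
# /api/tags lists every model pulled so far.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print([m["name"] for m in resp.json()["models"]])  # e.g. ['mistral:latest']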

Step 3: Create a source_documents directory and put your files in it

mkdir source_documents

Step 4: Ingest the files (use python3 on macOS)

python ingest.py

Output should look like this:

Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00,  1.99s/it]
Loaded 235 new documents from source_documents
Split into 1268 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Ingestion complete! You can now run pdf-Ollama.py to query your documents
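
For orientation, here is a minimal sketch of the kind of pipeline that produces the output above: load everything in source_documents, split it into chunks of at most 500 tokens, embed the chunks, and persist a vectorstore. The loader choice, embedding model, and persist directory below are assumptions based on typical LangChain usage, not the project's actual code.

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load every file in source_documents (DirectoryLoader picks a loader per file).
documents = DirectoryLoader("source_documents").load()

# Split into chunks; 500 matches the chunk size reported in the output above.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed the chunks and persist them to a local Chroma vectorstore.
embeddings = HuggingFaceEmbeddings()  # default sentence-transformers model; an assumption
Chroma.from_documents(chunks, embeddings, persist_directory="db")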

Step 5: Run this command (use python3 on macOS)

python pdf-Ollama.py

Play with your docs

Enter a query: How many locations does WeWork have?
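
Under the hood, the prompt loop is a standard retrieval-augmented chain: embed the question, fetch the most similar chunks from the vectorstore, and let the Ollama model answer from them. A minimal sketch, assuming typical LangChain wiring and the same persist directory as in the ingestion sketch above:

from langchain.llms import Ollama
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Reopen the vectorstore written during ingestion.
db = Chroma(persist_directory="db", embedding_function=HuggingFaceEmbeddings())

# Wire the retriever to the model pulled in Step 2.
qa = RetrievalQA.from_chain_type(llm=Ollama(model="mistral"), retriever=db.as_retriever())

while True:
    query = input("Enter a query: ")
    if query.strip().lower() == "exit":
        break
    print(qa.run(query))  # the answer is grounded in the retrieved chunks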

Try with a different model:

ollama pull llama2:13b
MODEL=llama2:13b python pdf-Ollama.py
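
The MODEL variable suggests the script reads its model name from the environment and falls back to a default; a plausible sketch (the default value is an assumption):

import os

# MODEL=llama2:13b on the command line overrides the default model name.
model_name = os.environ.get("MODEL", "mistral")

Any model you have pulled with ollama pull can be substituted the same way.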

Add more files

Put any and all of your files into the source_documents directory

The supported extensions are listed below (a loader-mapping sketch follows the list):

  • .csv: CSV
  • .docx: Word document
  • .doc: Word document
  • .enex: EverNote
  • .eml: Email
  • .epub: EPUB
  • .html: HTML file
  • .md: Markdown
  • .msg: Outlook message
  • .odt: OpenDocument Text
  • .pdf: Portable Document Format (PDF)
  • .pptx: PowerPoint document
  • .ppt: PowerPoint document
  • .txt: Text file (UTF-8)
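
Internally, ingestion scripts of this kind usually route each extension to a matching LangChain document loader. The mapping below is a sketch of that common pattern, not this project's exact table; the .msg entry is omitted because loader choices for Outlook messages vary.

from langchain.document_loaders import (
    CSVLoader,
    EverNoteLoader,
    PyMuPDFLoader,
    TextLoader,
    UnstructuredEmailLoader,
    UnstructuredEPubLoader,
    UnstructuredHTMLLoader,
    UnstructuredMarkdownLoader,
    UnstructuredODTLoader,
    UnstructuredPowerPointLoader,
    UnstructuredWordDocumentLoader,
)

# One loader class (plus constructor kwargs) per supported extension.
LOADER_MAPPING = {
    ".csv": (CSVLoader, {}),
    ".doc": (UnstructuredWordDocumentLoader, {}),
    ".docx": (UnstructuredWordDocumentLoader, {}),
    ".enex": (EverNoteLoader, {}),
    ".eml": (UnstructuredEmailLoader, {}),
    ".epub": (UnstructuredEPubLoader, {}),
    ".html": (UnstructuredHTMLLoader, {}),
    ".md": (UnstructuredMarkdownLoader, {}),
    ".odt": (UnstructuredODTLoader, {}),
    ".pdf": (PyMuPDFLoader, {}),
    ".ppt": (UnstructuredPowerPointLoader, {}),
    ".pptx": (UnstructuredPowerPointLoader, {}),
    ".txt": (TextLoader, {"encoding": "utf8"}),
}

# A dispatcher then looks the loader up by file suffix:
def load_single_document(path):
    import os
    ext = os.path.splitext(path)[1].lower()
    loader_class, kwargs = LOADER_MAPPING[ext]
    return loader_class(path, **kwargs).load()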

Credits

This code is based on the Pdf-Chat project by PromptEngineer48