su77ungr/CASALIOY

Progress stuck in window 10 for "python casalioy/ingest.py source_documents/"

madeepakkumar1 opened this issue · 3 comments

.env

Generic

TEXT_EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2
TEXT_EMBEDDINGS_MODEL_TYPE=HF # LlamaCpp or HF
USE_MLOCK=false

Ingestion

PERSIST_DIRECTORY=db
DOCUMENTS_DIRECTORY=source_documents
INGEST_CHUNK_SIZE=500
INGEST_CHUNK_OVERLAP=50
INGEST_N_THREADS=3

Generation

MODEL_TYPE=LlamaCpp # GPT4All or LlamaCpp
MODEL_PATH=eachadea/ggml-vicuna-7b-1.1/ggml-vic7b-q5_1.bin
MODEL_TEMP=0.8
MODEL_N_CTX=1024 # Max total size of prompt+answer
MODEL_MAX_TOKENS=256 # Max size of answer
MODEL_STOP=[STOP]
CHAIN_TYPE=betterstuff
N_RETRIEVE_DOCUMENTS=100 # How many documents to retrieve from the db
N_FORWARD_DOCUMENTS=100 # How many documents to forward to the LLM, chosen among those retrieved
N_GPU_LAYERS=4

Python version

python3.10.11

System

Windows 10

CASALIOY version

main

Information

  • The official example scripts
  • My own modified scripts

Related Components

  • Document ingestion
  • GUI
  • Prompt answering

Reproduction

$python casalioy/ingest.py source_documents/

Expected behavior

It should create vector embedding db

That's a new one. Did you try rerunning it. Maybe try it again with a cleaned source_documents directory and a test.pdf of something.

Thanks @su77ungr , It worked post deleting everything and just putting one .pdf file, My guess is .docx creating problem in windows