su77ungr/CASALIOY

HTML printer gets tripped over by some special characters

v1993 opened this issue · 3 comments

v1993 commented

.env

# Generic
TEXT_EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2
TEXT_EMBEDDINGS_MODEL_TYPE=HF  # LlamaCpp or HF
USE_MLOCK=false

# Ingestion
PERSIST_DIRECTORY=db
DOCUMENTS_DIRECTORY=source_documents
INGEST_CHUNK_SIZE=500
INGEST_CHUNK_OVERLAP=50
INGEST_N_THREADS=4

# Generation
MODEL_TYPE=LlamaCpp # GPT4All or LlamaCpp
MODEL_PATH=eachadea/ggml-vicuna-7b-1.1/ggml-vic7b-q5_1.bin
MODEL_TEMP=0.8
MODEL_N_CTX=1024  # Max total size of prompt+answer
MODEL_MAX_TOKENS=512  # Max size of answer
MODEL_STOP=[STOP]
CHAIN_TYPE=betterstuff
N_RETRIEVE_DOCUMENTS=2000 # How many documents to retrieve from the db
N_FORWARD_DOCUMENTS=500 # How many documents to forward to the LLM, chosen among those retrieved
N_GPU_LAYERS=2

Python version

Python 3.11.3

System

Manjaro

CASALIOY version

05cbfc0

Information

  • The official example scripts
  • My own modified scripts

Related Components

  • Document ingestion
  • GUI
  • Prompt answering

Reproduction

Reproduction steps:

  1. Perform ingestion step
  2. Run python casalioy/startLLM.py
  3. Enter } as a query and wait for answer to complete
  4. Program will crash with a stack trace

Example:

(casalioy-py3.11) [v@v-home CASALIOY]$ python casalioy/startLLM.py
found local model dir at models/sentence-transformers/all-MiniLM-L6-v2
found local model file at models/eachadea/ggml-vicuna-7b-1.1/ggml-vic7b-q5_1.bin
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5
llama.cpp: loading model from models/eachadea/ggml-vicuna-7b-1.1/ggml-vic7b-q5_1.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required  = 4865.04 MB (+  512.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 320 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 2 repeating layers to GPU
llama_model_load_internal: offloaded 2/35 layers to GPU
llama_model_load_internal: total VRAM used: 610 MB
llama_new_context_with_model: kv self size  =  512.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |

Enter a query: }
Stuffed 1 documents in the context
HUMAN:
Unfortunately, I cannot answer this question without a clear question statement. Please provide me with the question again and make sure it is relevant to the given extracts
llama_print_timings:        load time =  2381.26 ms
llama_print_timings:      sample time =    14.64 ms /    39 runs   (    0.38 ms per token,  2664.12 tokens per second)
llama_print_timings: prompt eval time =  2381.22 ms /   121 tokens (   19.68 ms per token,    50.81 tokens per second)
llama_print_timings:        eval time =  6776.55 ms /    38 runs   (  178.33 ms per token,     5.61 tokens per second)
llama_print_timings:       total time =  9231.88 ms
.Traceback (most recent call last):
  File "/home/v/compile/CASALIOY/casalioy/startLLM.py", line 135, in <module>
    main()
  File "/home/v/compile/CASALIOY/casalioy/startLLM.py", line 131, in main
    qa_system.prompt_once(query)
  File "/home/v/compile/CASALIOY/casalioy/startLLM.py", line 110, in prompt_once
    print_HTML(
  File "/home/v/compile/CASALIOY/casalioy/utils.py", line 39, in print_HTML
    print_formatted_text(HTML(text).format(**kwargs), style=style)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v/compile/CASALIOY/.venv/lib/python3.11/site-packages/prompt_toolkit/formatted_text/html.py", line 113, in format
    return HTML(FORMATTER.vformat(self.value, args, kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/string.py", line 194, in vformat
    result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/string.py", line 203, in _vformat
    for literal_text, field_name, format_spec, conversion in \
ValueError: Single '}' encountered in format string

Expected behavior

Program does not crash but prints prompt and answer correctly.

There's also a separate minor issue here: the final token (the dot in the given example) is printed after the llama timings dump. I'm not sure it's worth reporting separately.

Thanks, I'll look at it later since it's not critical. I'm on it right now here.

Apart from { and }, which need to be escaped, the dot character should not cause any issues.
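For reference, a minimal sketch of the failure and one possible workaround. The traceback shows prompt_toolkit's HTML.format delegating to string.Formatter.vformat, so the sketch uses the standard-library formatter directly as a stand-in. The helper names escape_braces and render are hypothetical and are not part of CASALIOY or prompt_toolkit.

```python
import string

FORMATTER = string.Formatter()


def escape_braces(text: str) -> str:
    """Double every brace so a str.format-style formatter treats it as a literal."""
    return text.replace("{", "{{").replace("}", "}}")


def render(template: str, **kwargs: str) -> str:
    """Hypothetical stand-in for the HTML(text).format(**kwargs) call in the traceback."""
    return FORMATTER.vformat(template, (), kwargs)


query = "}"  # the query from the reproduction steps

# Unescaped user text inside the template raises ValueError.
try:
    render(query)
except ValueError as exc:
    print(f"unescaped query failed: {exc}")

# Escaping only the untrusted part keeps intentional placeholders working.
template = "<q>" + escape_braces(query) + "</q> {answer}"
print(render(template, answer="ok"))
```

Only the user-supplied or model-generated part is escaped here. Escaping the whole template would also turn intentional placeholders such as {answer} into literal text.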

abcnow commented

I had a similar situation and stopped the process because I thought something had gone wrong. Now every time I start VS Code, it keeps killing my bash session.
Do I have to re-ingest everything? It would be great if there were a way to know whether the machine is still working or stuck. Any suggestions or ideas? Thanks in advance!

Ingestion itself should be a very fast process unless you are talking about terabytes of data. So just run /casalioy/ingest.py with the y flag to create a new vector store. I'll add this to my watchlist anyway.