Cinnamon/kotaemon

[BUG] NanoGraphRAG TypeError: 'NoneType' object is not subscriptable

vap0rtranz opened this issue · 4 comments

Description

I'm running NanoGraphRAG via Docker against a local Ollama instance. I pulled the main-full Docker image this morning; it was tagged as 10 days old.

I ran this image with USE_NANO_GRAPHRAG=true but saw an error, so I installed nano-graphrag following the README.

Simple reasoning chats with a File Collection backed by Ollama work, and the Information Panel is populated with the indexed docs.

Chatting with the NanoGraphRAG Collection errors out in the UI. I've pasted the log below. The error ends with:

TypeError: 'NoneType' object is not subscriptable

What else can I check to find the root cause?

Reproduction steps

1. Go to Chat->NanoGraphRAG
2. Click Search in Files, then select a file
3. Switch to the Chat window and enter the prompt "Please summarize"
4. See error

Logs

INFO:httpx:HTTP Request: POST http://localhost:11434/v1/embeddings "HTTP/1.1 200 OK"
GraphRAG embedding dim 768
INFO:nano-graphrag:Load KV full_docs with 0 data
INFO:nano-graphrag:Load KV text_chunks with 0 data
INFO:nano-graphrag:Load KV llm_response_cache with 8 data
INFO:nano-graphrag:Load KV community_reports with 0 data
INFO:nano-graphrag:Loaded graph from /app/ktem_app_data/user_data/files/nano_graphrag/580ad20d-9321-4e5c-9c93-707181e1976c/input/graph_chunk_entity_relation.graphml with 1 nodes, 0 edges
INFO:nano-vectordb:Load (1, 768) data
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': '/app/ktem_app_data/user_data/files/nano_graphrag/580ad20d-9321-4e5c-9c93-707181e1976c/input/vdb_entities.json'} 1 data
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/embeddings "HTTP/1.1 200 OK"
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/app/libs/ktem/ktem/pages/chat/__init__.py", line 812, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "/app/libs/ktem/ktem/reasoning/simple.py", line 741, in stream
    docs, infos = self.retrieve(message, history)
  File "/app/libs/ktem/ktem/reasoning/simple.py", line 517, in retrieve
    retriever_docs = retriever_node(text=query)
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/app/libs/ktem/ktem/index/file/graph/nano_pipelines.py", line 385, in run
    entities, relationships, reports, sources = asyncio.run(
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/app/libs/ktem/ktem/index/file/graph/nano_pipelines.py", line 158, in nano_graph_rag_build_local_query_context
    use_text_units = await _find_most_related_text_unit_from_entities(
  File "/usr/local/lib/python3.10/site-packages/nano_graphrag/_op.py", line 772, in _find_most_related_text_unit_from_entities
    all_text_units = truncate_list_by_token_size(
  File "/usr/local/lib/python3.10/site-packages/nano_graphrag/_utils.py", line 74, in truncate_list_by_token_size
    tokens += len(encode_string_by_tiktoken(key(data)))
  File "/usr/local/lib/python3.10/site-packages/nano_graphrag/_op.py", line 774, in <lambda>
    key=lambda x: x["data"]["content"],
TypeError: 'NoneType' object is not subscriptable
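
The last two frames of the traceback show a key function subscripting `x["data"]`, so the error means that at least one text unit reached it with `data` set to `None`. That would be consistent with the earlier log line `Load KV text_chunks with 0 data`, although the thread does not confirm this as the root cause. A minimal sketch of the failure mode and of a defensive filter follows; `text_chunks_db`, `lookup_chunks`, `content_key`, and `drop_missing` are hypothetical names used for illustration and are not part of nano-graphrag or Kotaemon.

```python
# Hypothetical stand-in for a chunk store that, like the log above, holds 0 entries.
text_chunks_db = {}


def lookup_chunks(chunk_ids):
    """Look up each chunk id; dict.get returns None for ids that are not stored."""
    return [{"id": cid, "data": text_chunks_db.get(cid)} for cid in chunk_ids]


def content_key(unit):
    """Same shape as the lambda in the traceback: x["data"]["content"]."""
    return unit["data"]["content"]


def drop_missing(units):
    """Keep only the units whose chunk was actually found in the store."""
    return [u for u in units if u["data"] is not None]


# The entity references two chunks, but the store is empty.
units = lookup_chunks(["chunk-1", "chunk-2"])

try:
    [content_key(u) for u in units]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(drop_missing(units))
```

If this is what is happening, filtering only hides the symptom: the entity index would still reference chunks that were never stored, so re-indexing the collection would be the more likely fix.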

Browsers

Firefox

OS

Linux

Additional information

I've tried with 2 different PDF files.

I also see another error in the UI. In the Files->Upload Info window, after I upload new files for Nano, this error appears at the bottom of the window:

[GraphRAG] Creating index... This can take a long time.
[GraphRAG] Indexed 0 / 648 documents.
Error: EmptyNetworkError

The Docker terminal output of the Kotaemon app does not show this error. Here is the end of its output after uploading files for Nano:

Would you like me to extract more entities? 
 --------------------------------------------------
⠼ Processed 4 chunks, 12 entities(duplicated), 0 relations(duplicated)
INFO:nano-graphrag:Inserting 10 vectors to entities
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/embeddings "HTTP/1.1 200 OK"
INFO:nano-graphrag:[Community Report]...
INFO:nano-graphrag:Writing graph with 10 nodes, 0 edges
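
The log above reports `0 relations(duplicated)` and a graph written with 10 nodes and 0 edges, which may be related to the `EmptyNetworkError` shown in the UI. One way to check this after indexing is to count the nodes and edges in the stored GraphML file directly. The sketch below uses only the Python standard library; the helper names `count_graphml_text` and `count_graphml_file` are illustrative and not part of any library.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Strip an XML namespace such as '{http://...}node' down to 'node'."""
    return tag.rsplit("}", 1)[-1]


def count_graphml_text(xml_text):
    """Return (node_count, edge_count) for a GraphML document given as a string."""
    root = ET.fromstring(xml_text)
    nodes = sum(1 for el in root.iter() if _local_name(el.tag) == "node")
    edges = sum(1 for el in root.iter() if _local_name(el.tag) == "edge")
    return nodes, edges


def count_graphml_file(path):
    """Return (node_count, edge_count) for a GraphML file on disk."""
    with open(path, "r", encoding="utf-8") as fh:
        return count_graphml_text(fh.read())


# Small inline sample: two entities and no relationships between them.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="A"/>
    <node id="B"/>
  </graph>
</graphml>"""

sample_counts = count_graphml_text(SAMPLE)
print(sample_counts)
```

To inspect a real index, call `count_graphml_file` on the `graph_chunk_entity_relation.graphml` file inside the collection's `nano_graphrag/.../input/` directory, as named in the earlier log.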

Hi, we have seen that kind of error too. There is a FAQ entry about it: https://github.com/gusye1234/nano-graphrag/blob/main/docs/FAQ.md

Interesting. I was using the default llama3.1:8b. I'll retest with qwen2.5:14b.

OK, I re-tested with Qwen and the EmptyNetworkError went away.

A different error appears in the UI after indexing, although the file does appear to be indexed. The snippet is below.

What is your setup?

I'm curious whether you have set up the NanoGraphCollection differently than I have.

Indexing [1/1]: IPCC_AR6_SYR_LongerReport.pdf
 => Converting IPCC_AR6_SYR_LongerReport.pdf to text
 => Converted IPCC_AR6_SYR_LongerReport.pdf to text
 => [IPCC_AR6_SYR_LongerReport.pdf] Processed 136 chunks
 => Finished indexing IPCC_AR6_SYR_LongerReport.pdf
Error: name 'EmbeddingFunc' is not defined
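
In Python, a message of the form `name '...' is not defined` is a `NameError`. One possible cause, stated here as an assumption and not as a confirmed diagnosis, is that the package providing the name could not be imported in the running process and the import failure was swallowed earlier. A quick check to run inside the container, using only the standard library (the helper name `module_available` is illustrative):

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


# Check the package that the error message most plausibly depends on.
print("nano_graphrag importable:", module_available("nano_graphrag"))
```

If this prints `False` for the Python interpreter that runs the app, the package was likely installed into a different environment than the one the app uses.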