😭 GraphRAG is good and powerful, but the official implementation is difficult/painful to read or hack.
😊 This project provides a smaller, faster, cleaner GraphRAG, while remaining the core functionality(see benchmark and issues ).
🎁 Excluding tests
and prompts, nano-graphrag
is about 1100 lines of code.
👌 Small yet portable(faiss, neo4j, ollama...), asynchronous and fully typed.
Install from source (recommend)
# clone this repo first
cd nano-graphrag
pip install -e .
Install from PyPi
pip install nano-graphrag
Tip
Please set OpenAI API key in environment: export OPENAI_API_KEY="sk-..."
.
Tip
If you're using Azure OpenAI API, refer to the .env.example to set your azure openai. Then pass GraphRAG(...,using_azure_openai=True,...)
to enable.
Tip
If you don't have any key, check out this example that using transformers
and ollama
. If you like to use another LLM or Embedding Model, check Advances.
download a copy of A Christmas Carol by Charles Dickens:
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
Use the below python snippet:
from nano_graphrag import GraphRAG, QueryParam
graph_func = GraphRAG(working_dir="./dickens")
with open("./book.txt") as f:
graph_func.insert(f.read())
# Perform global graphrag search
print(graph_func.query("What are the top themes in this story?"))
# Perform local graphrag search (I think is better and more scalable one)
print(graph_func.query("What are the top themes in this story?", param=QueryParam(mode="local")))
Next time you initialize a GraphRAG
from the same working_dir
, it will reload all the contexts automatically.
graph_func.insert(["TEXT1", "TEXT2",...])
Incremental Insert
nano-graphrag
supports incremental insert, no duplicated computation or data will be added:
with open("./book.txt") as f:
book = f.read()
half_len = len(book) // 2
graph_func.insert(book[:half_len])
graph_func.insert(book[half_len:])
nano-graphrag
use md5-hash of the content as the key, so there is no duplicated chunk.However, each time you insert, the communities of graph will be re-computed and the community reports will be re-generated
Naive RAG
nano-graphrag
supports naive RAG insert and query as well:
graph_func = GraphRAG(working_dir="./dickens", enable_naive_rag=True)
...
# Query
print(rag.query(
"What are the top themes in this story?",
param=QueryParam(mode="naive")
)
For each method NAME(...)
, there is a corresponding async method aNAME(...)
await graph_func.ainsert(...)
await graph_func.aquery(...)
...
GraphRAG
and QueryParam
are dataclass
in Python. Use help(GraphRAG)
and help(QueryParam)
to see all available parameters!
Below are the components you can use:
Type | What | Where |
---|---|---|
LLM | OpenAI | Built-in |
DeepSeek | examples | |
ollama |
examples | |
Embedding | OpenAI | Built-in |
Sentence-transformers | examples | |
Vector DataBase | nano-vectordb |
Built-in |
hnswlib |
Built-in, examples | |
milvus-lite |
examples | |
faiss | examples | |
Graph Storage | networkx |
Built-in |
neo4j |
Built-in(doc) | |
Visualization | graphml | examples |
Chunking | by token size | Built-in |
by text splitter | Built-in |
-
Built-in
means we have that implementation insidenano-graphrag
.examples
means we have that implementation inside an tutorial under examples folder. -
Check examples/benchmarks to see few comparisons between components.
-
Always welcome to contribute more components.
Only query the related context
graph_func.query
return the final answer without streaming.
If you like to interagte nano-graphrag
in your project, you can use param=QueryParam(..., only_need_context=True,...)
, which will only return the retrieved context from graph, something like:
# Local mode
-----Reports-----
```csv
id, content
0, # FOX News and Key Figures in Media and Politics...
1, ...
```
...
# Global mode
----Analyst 3----
Importance Score: 100
Donald J. Trump: Frequently discussed in relation to his political activities...
...
You can integrate that context into your customized prompt.
Prompt
nano-graphrag
use prompts from nano_graphrag.prompt.PROMPTS
dict object. You can play with it and replace any prompt inside.
Some important prompts:
PROMPTS["entity_extraction"]
is used to extract the entities and relations from a text chunk.PROMPTS["community_report"]
is used to organize and summary the graph cluster's description.PROMPTS["local_rag_response"]
is the system prompt template of the local search generation.PROMPTS["global_reduce_rag_response"]
is the system prompt template of the global search generation.PROMPTS["fail_response"]
is the fallback response when nothing is related to the user query.
Customize Chunking
nano-graphrag
allow you to customize your own chunking method, check out the example.
Switch to the built-in text splitter chunking method:
from nano_graphrag._op import chunking_by_seperators
GraphRAG(...,chunk_func=chunking_by_seperators,...)
LLM Function
In nano-graphrag
, we requires two types of LLM, a great one and a cheap one. The former is used to plan and respond, the latter is used to summary. By default, the great one is gpt-4o
and the cheap one is gpt-4o-mini
You can implement your own LLM function (refer to _llm.gpt_4o_complete
):
async def my_llm_complete(
prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
# pop cache KV database if any
hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
# the rest kwargs are for calling LLM, for example, `max_tokens=xxx`
...
# YOUR LLM calling
response = await call_your_LLM(messages, **kwargs)
return response
Replace the default one with:
# Adjust the max token size or the max async requests if needed
GraphRAG(best_model_func=my_llm_complete, best_model_max_token_size=..., best_model_max_async=...)
GraphRAG(cheap_model_func=my_llm_complete, cheap_model_max_token_size=..., cheap_model_max_async=...)
You can refer to this example that use deepseek-chat
as the LLM model
You can refer to this example that use ollama
as the LLM model
nano-graphrag
will use best_model_func
to output JSON with params "response_format": {"type": "json_object"}
. However there are some open-source model maybe produce unstable JSON.
nano-graphrag
introduces a post-process interface for you to convert the response to JSON. This func's signature is below:
def YOUR_STRING_TO_JSON_FUNC(response: str) -> dict:
"Convert the string response to JSON"
...
And pass your own func by GraphRAG(...convert_response_to_json_func=YOUR_STRING_TO_JSON_FUNC,...)
.
For example, you can refer to json_repair to repair the JSON string returned by LLM.
Embedding Function
You can replace the default embedding functions with any _utils.EmbedddingFunc
instance.
For example, the default one is using OpenAI embedding API:
@wrap_embedding_func_with_attrs(embedding_dim=1536, max_token_size=8192)
async def openai_embedding(texts: list[str]) -> np.ndarray:
openai_async_client = AsyncOpenAI()
response = await openai_async_client.embeddings.create(
model="text-embedding-3-small", input=texts, encoding_format="float"
)
return np.array([dp.embedding for dp in response.data])
Replace default embedding function with:
GraphRAG(embedding_func=your_embed_func, embedding_batch_num=..., embedding_func_max_async=...)
You can refer to an example that use sentence-transformer
to locally compute embeddings.
Storage Component
You can replace all storage-related components to your own implementation, nano-graphrag
mainly uses three kinds of storage:
base.BaseKVStorage
for storing key-json pairs of data
- By default we use disk file storage as the backend.
GraphRAG(.., key_string_value_json_storage_cls=YOURS,...)
base.BaseVectorStorage
for indexing embeddings
- By default we use
nano-vectordb
as the backend. - We have a built-in
hnswlib
storage also, check out this example. - Check out this example that implements
milvus-lite
as the backend (not available in Windows). GraphRAG(.., vector_db_storage_cls=YOURS,...)
base.BaseGraphStorage
for storing knowledge graph
- By default we use
networkx
as the backend. GraphRAG(.., graph_storage_cls=YOURS,...)
You can refer to nano_graphrag.base
to see detailed interfaces for each components.
Check FQA.
See ROADMAP.md
nano-graphrag
is open to any kind of contribution. Read this before you contribute.
nano-graphrag
didn't implement thecovariates
feature ofGraphRAG
nano-graphrag
implements the global search different from the original. The original use a map-reduce-like style to fill all the communities into context, whilenano-graphrag
only use the top-K important and central communites (useQueryParam.global_max_consider_community
to control, default to 512 communities).