Graph RAG

Introduction

Graph RAG (Retrieval Augmented Generation) is a technology that uses graph databases to store relationships between data points, replacing traditional text embedding methods like Chuck and enhancing the performance of RAG. This project is base on Langchain and Neo4j.

Showcase: Question is "how about the performance of llava" Answer is "The performance of LLaVA, as mentioned in the structured data, has an overall score of 66.7 + 0.3, with a complex reasoning score of 81.4 + 0.3, a performance score of 58.8 + 0.6, and an improvement score of 49.2 + 0.8." The reference is a knowledge graph shown below

Features

Create a RAG from PDFs
Connect to Zotero to retrival PDFs
Graph visulation through Neo4j Browser
All data are stored locally, supporting llama.cpp and Ollama local LLM

Installation

Step1, clone this project and install dependencies

git clone https://github.com/zjkhurry/Graph-RAG.git
cd Graph-RAG
pip install -r requitements.txt

Copy and midofy config.ini

cp config.ini.bak config.ini

Step2, you need to install Neo4j. Download Neo4j Desktop or Mac can install with homebrew

brew install --cask neo4j

Launch Neo4j Desktop, create a New Project and add a new Graph DBMS.
Enter the password into the config.ini - Neo4j - password. Then click on the Graph DBMS created in the last step and install APOC plugin.
Start the Graph DBMS.

Step3 (optional) Set up Ollama or llama.cpp to use local LLM.

For Ollama, download the app here, run Ollama and then

ollama pull interstellarninja/hermes-2-pro-llama-3-8b
ollama pull mxbai-embed-large

P.S. hermers-2-pro has a better sepport to function calling than original llama3.

For llama.cpp, follow the instruction here to build and run llama.cpp server.

Step4, config the zotero. Pyzotero is used to connect to Zotero library. You'll need the ID of the personal or group library you want to access:

Your personal library ID is available here, in the section Your userID for use in API calls (you may need to login). Enter you ID into config.ini
For group libraries, the ID can be found by opening the group's page: https://www.zotero.org/groups/groupname, and hovering over the group settings link. The ID is the integer after /groups/
You'll also need to get an API key here, enter you API key into config.ini
Are you accessing your own Zotero library? library_type is 'user'
Are you accessing a shared group library? library_type is 'group'.

P.S. I can't make Zotero.file() work properly, maybe because I use WebDAV instead of zotero to store the pdf files, so Zotero_dir is needed to find the PDFs in the file system.

Step5, modify the config.ini, choose to use ollama or openai (llama.cpp), LLM model, embedding model and so on. You can choose different models for convert PDF, embedding and chat.

Usage

To convert PDF files into graph, just use the dpf2graph.py

python pdf2graph.py

Enter the path to the PDF file or just enter the title of the paper stored in Zotero library. Each PDF may take several minutes.

After convert, you can run graphQA.py to query this graph, enter exit to stop.

python graphQA.py

References

openai cookbook Langchain Neo4j Pyzotero blogs