/GraphRAG-SDK

Facilitate the creation of graph-based Retrieval-Augmented Generation (RAG), seamless integration with OpenAI to enable advanced data querying and knowledge graph construction.

Primary LanguagePythonMIT LicenseMIT

Dockerhub pypi Discord Contributor Covenant

GraphRAG-SDK

Try Free

GraphRAG-SDK is a comprehensive solution for building Graph Retrieval-Augmented Generation (GraphRAG) applications, leveraging FalkorDB for optimal performance.

Features

  • Ontology Management: Manage ontologies either manually or automatically from unstructured data.
  • Knowledge Graph (KG): Construct and query knowledge graphs for efficient data retrieval.
  • LLMs Integration: Support for OpenAI and Google Gemini models.
  • Multi-Agent System: Multi-agent orchestrators using KG-based agents.

Get Started

Install

pip install graphrag_sdk

Prerequisites

Graph Database

GraphRAG-SDK relies on FalkorDB as its graph engine and works with OpenAI/Gemini.

Use FalkorDB Cloud to get credentials or start FalkorDB locally:

docker run -p 6379:6379 -p 3000:3000 -it --rm  -v ./data:/data falkordb/falkordb:latest

LLM Models

Currently, this SDK supports the following LLMs API:

  • OpenAI Recommended model:gpt-4o
  • google Recommended model:gemini-1.5-flash-001

Make sure that a .env file is present with all required credentials.

.env
OPENAI_API_KEY="OPENAI_API_KEY"
GOOGLE_API_KEY="GOOGLE_API_KEY"

Basic Usage

Open In Colab

The following example demonstrates the basic usage of this SDK to create a GraphRAG using URLs with auto-detected ontology.

from dotenv import load_dotenv

from graphrag_sdk.source import URL
from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.models.openai import OpenAiGenerativeModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig
load_dotenv()

# Import Data
urls = ["https://www.rottentomatoes.com/m/side_by_side_2012",
"https://www.rottentomatoes.com/m/matrix",
"https://www.rottentomatoes.com/m/matrix_revolutions",
"https://www.rottentomatoes.com/m/matrix_reloaded",
"https://www.rottentomatoes.com/m/speed_1994",
"https://www.rottentomatoes.com/m/john_wick_chapter_4"]

sources = [URL(url) for url in urls]

# Model
model = OpenAiGenerativeModel(model_name="gpt-4o")

# Ontology Auto-Detection
ontology = Ontology.from_sources(
    sources=sources,
    model=model,
)

# Knowledge Graph
kg = KnowledgeGraph(
    name="movies",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
)

# GraphRAG System and Questioning
kg.process_sources(sources)

chat = kg.chat_session()

print(chat.send_message("Who is the director of the movie The Matrix?"))
print(chat.send_message("How this director connected to Keanu Reeves?"))

Tools

Open In Colab

Import source data

The SDK supports the following file formats:

  • PDF
  • TEXT
  • JSONL
  • URL
  • HTML
  • CSV
import os
from graphrag_sdk.source import Source

src_files = "data_folder"
sources = []

# Create a Source object.
for file in os.listdir(src_files):
    sources.append(Source(os.path.join(src_files, file)))

Ontology

You can either auto-detect the ontology from your data or define it manually. Additionally, you can set Boundaries for ontology auto-detection.

Once the ontology is created, you can review, modify, and update it as needed before using it to build the Knowledge Graph (KG).

import random
from falkordb import FalkorDB
from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.models.openai import OpenAiGenerativeModel

# Define the percentage of files that will be used to auto-create the ontology.
percent = 0.1  # This represents 10%. You can adjust this value (e.g., 0.2 for 20%).

boundaries = """
    Extract only the most relevant information about UFC fighters, fights, and events.
    Avoid creating entities for details that can be expressed as attributes.
"""

# Define the model to be used for the ontology
model = OpenAiGenerativeModel(model_name="gpt-4o")

# Randomly select a percentage of files from sources.
sampled_sources = random.sample(sources, round(len(sources) * percent))

ontology = Ontology.from_sources(
    sources=sampled_sources,
    boundaries=boundaries,
    model=model,
)

# Save the ontology to the disk as a json file.
with open("ontology.json", "w", encoding="utf-8") as file:
    file.write(json.dumps(ontology.to_json(), indent=2))

After generating the initial ontology, you can review it and make any necessary modifications to better fit your data and requirements. This might include refining entity types or adjusting relationships.

Once you are satisfied with the ontology, you can proceed to use it for creating and managing your Knowledge Graph (KG).

Knowledge Graph

Now, you can use the SDK to create a Knowledge Graph (KG) from your sources and ontology.

# After approving the ontology, load it from disk.
ontology_file = "ontology.json"
with open(ontology_file, "r", encoding="utf-8") as file:
    ontology = Ontology.from_json(json.loads(file.read()))

kg = KnowledgeGraph(
    name="kg_name",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
)

kg.process_sources(sources)

You can update the KG at any time by processing more sources with the process_sources method.

Graph RAG

At this point, you have a Knowledge Graph that can be queried using this SDK. You can use the ask method for single questions or chat_session for conversations.

# Single question.
response = kg.ask("What were the last five fights? When were they? How many rounds did they have?")
print(response)

# Conversation.
chat = kg.chat_session()
response = chat.send_message("Who is Salsa Boy?")
print(response)
response = chat.send_message("Tell me about one of his fights?")
print(response)

Multi Agent - Orchestrator

Open In Colab

The GraphRAG-SDK supports KG agents. Each agent is an expert in the data it has learned, and the orchestrator orchestrates the agents.

Agents

See the Basic Usage section to understand how to create KG objects for the agents.

# Define the model
model = OpenAiGenerativeModel("gpt-4o")

# Create the KG from the predefined ontology.
# In this example, we will use the restaurants agent and the attractions agent.
restaurants_kg = KnowledgeGraph(
    name="restaurants",
    ontology=restaurants_ontology,
    model_config=KnowledgeGraphModelConfig.with_model(model),
)
attractions_kg = KnowledgeGraph(
    name="attractions",
    ontology=attractions_ontology,
    model_config=KnowledgeGraphModelConfig.with_model(model),
)


# The following agent is specialized in finding restaurants.
restaurants_agent = KGAgent(
    agent_id="restaurants_agent",
    kg=restaurants_kg,
    introduction="I'm a restaurant agent, specialized in finding the best restaurants for you.",
)

# The following agent is specialized in finding tourist attractions.
attractions_agent = KGAgent(
    agent_id="attractions_agent",
    kg=attractions_kg,
    introduction="I'm an attractions agent, specialized in finding the best tourist attractions for you.",
)

Orchestrator - Multi-Agent System

The orchestrator manages the usage of agents and handles questioning.

# Initialize the orchestrator while giving it the backstory.
orchestrator = Orchestrator(
    model,
    backstory="You are a trip planner, and you want to provide the best possible itinerary for your clients.",
)

# Register the agents that we created above.
orchestrator.register_agent(restaurants_agent)
orchestrator.register_agent(attractions_agent)

# Query the orchestrator.
runner = orchestrator.ask("Create a two-day itinerary for a trip to Rome. Please don't ask me any questions; just provide the best itinerary you can.")
print(runner.output)

Support

Connect with our community for support and discussions. If you have any questions, don’t hesitate to contact us through one of the methods below: