langflow-ai/langflow

Knowledge Graph Integration for LangFlow


Feature Request

Feature Description:

The integration of Neo4j into LangFlow will add support for knowledge graph database retrieval, extending LangFlow's versatility in handling diverse data storage solutions. Neo4j, Nebula Graph, and FalkorDB are leading knowledge graph databases that can serve both as a vector database and as a knowledge graph queried with Cypher. This feature would mirror existing integrations such as Redis and Pinecone, letting users leverage graph structures efficiently in retrieval tasks.

https://python.langchain.com/v0.2/docs/integrations/graphs/neo4j_cypher/
https://python.langchain.com/docs/integrations/providers/neo4j
https://python.langchain.com/docs/integrations/vectorstores/neo4jvector

Motivation

Why This is Essential:

  1. Graph Databases as Data Retrievers: The ability to retrieve data from knowledge graph databases opens new avenues for complex, interconnected data queries. Because a knowledge graph database (KGDB) stores relationships between data points explicitly, it can model highly sophisticated connections and offer more flexible and insightful retrieval than traditional databases.

  2. Unlocking Knowledge Graph Potential: Knowledge Graph Databases such as Neo4j, FalkorDB, and Nebula Graph are becoming crucial in industries where understanding relationships between data points is key—such as recommendation systems, fraud detection, and semantic search. Integrating these with LangFlow will allow users to query graph data with ease and precision, using graph-based algorithms.

  3. Dual Functionality with Vector Databases: Neo4j, Nebula Graph, and FalkorDB can also act as vector databases, extending LangFlow's capacity to handle vectorized data alongside traditional knowledge graph structures. This flexibility lets users move seamlessly between vector searches (e.g., for similarity) and more intricate relationship-based queries via Cypher, a powerful graph query language.

  4. Aligning with LangChain: LangChain, a recognized leader in the space, has already integrated Neo4j for both vector store and graph database functionalities. Aligning LangFlow with LangChain ensures that LangFlow remains competitive and compatible with the broader AI and machine learning ecosystem. Users will appreciate the continuity of experience and the shared ecosystem between these tools.

  5. Extending LangFlow’s Reach: The integration will expand LangFlow's use cases, making it more appealing to sectors that rely on graph-based data representation and advanced knowledge modeling. Neo4j’s integration would create a path for future inclusion of other graph databases like FalkorDB and Nebula Graph, thereby creating a wider, more flexible infrastructure for LangFlow users.
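To make the "vector search plus relationship query" idea in point 3 concrete, here is a minimal in-memory sketch in plain Python, with no Neo4j involved. The node ids, the made-up three-dimensional embeddings, and the `vector_seeds`/`expand`/`hybrid_retrieve` helpers are all hypothetical. In a graph database the first step would roughly correspond to a vector-index lookup and the second to a variable-length Cypher pattern such as `MATCH (seed)-[*1..2]-(related)`.

```python
import math

# Hypothetical toy data: node id -> made-up 3-d embedding.
EMBEDDINGS = {
    "acct_1": [0.9, 0.1, 0.0],
    "acct_2": [0.1, 0.2, 0.9],
    "device_x": [0.3, 0.9, 0.1],
    "txn_7": [0.0, 0.3, 0.9],
}
# Hypothetical typed relationships: (source, relationship type, target).
EDGES = [
    ("acct_1", "USED", "device_x"),
    ("acct_2", "USED", "device_x"),
    ("acct_2", "SENT", "txn_7"),
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_seeds(query_vec, k):
    """Vector step: ids of the k nodes whose embeddings are most similar to the query."""
    ranked = sorted(EMBEDDINGS, key=lambda n: cosine(query_vec, EMBEDDINGS[n]), reverse=True)
    return ranked[:k]


def expand(seeds, hops):
    """Graph step: add every node reachable within `hops` relationships, in either direction."""
    seen, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        nxt = set()
        for src, _rel, dst in EDGES:
            if src in frontier and dst not in seen:
                nxt.add(dst)
            if dst in frontier and src not in seen:
                nxt.add(src)
        seen |= nxt
        frontier = nxt
    return seen


def hybrid_retrieve(query_vec, k=1, hops=2):
    """Seed with vector similarity, then widen the result set by following relationships."""
    return sorted(expand(vector_seeds(query_vec, k), hops))


print(hybrid_retrieve([1.0, 0.0, 0.0]))  # prints ['acct_1', 'acct_2', 'device_x']
```

The vector step alone ranks `acct_2` near the bottom, yet the two-hop expansion surfaces it because it shares a device with `acct_1`, the kind of indirect link that point 2 mentions for fraud detection.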

Business Impact:

By enabling Neo4j, Nebula Graph, FalkorDB, and other knowledge graph databases, LangFlow would become more attractive to industries focused on complex data interrelationships, supporting broader adoption and deeper penetration into sectors such as fintech, healthcare, cybersecurity, and recommendation engines.

Your Contribution

No response

Hey @qdrddr! I'm here to assist you with any questions, bugs, or contributions. Let's tackle this together!

I found a similar closed issue regarding the integration of Criteria Evaluation with LangFlow. The suggested solution involved creating a custom component for Criteria Evaluation, which can be adapted for integrating Neo4j as well [1].

To integrate Neo4j into LangFlow for enhancing its capabilities for knowledge graph database retrieval and vector database functionalities, you can follow these steps:

  1. Create a Neo4j Vector Store Component:
    Define a new component class for Neo4j, similar to the HCDVectorStoreComponent.
from langflow.base.vectorstores.model import LCVectorStoreComponent, check_cached_vector_store
from langflow.helpers.data import docs_to_data
from langflow.inputs import (
    DataInput,
    DictInput,
    DropdownInput,
    FloatInput,
    HandleInput,
    IntInput,
    MultilineInput,
    SecretStrInput,
    StrInput,
)
from langflow.schema import Data
from loguru import logger


class Neo4jVectorStoreComponent(LCVectorStoreComponent):
    """Sketch of a Neo4j vector store component built on LangChain's Neo4jVector."""

    display_name: str = "Neo4j"
    description: str = "Implementation of Vector Store using Neo4j with search capabilities"
    documentation: str = "https://neo4j.com/docs/"
    name = "Neo4j"
    icon: str = "Neo4j"

    inputs = [
        StrInput(
            name="database_name",
            display_name="Database Name",
            info="The name of the Neo4j database.",
            value="neo4j",
            required=True,
        ),
        StrInput(
            name="username",
            display_name="Neo4j Username",
            info="Authentication username for accessing Neo4j.",
            required=True,
        ),
        SecretStrInput(
            name="password",
            display_name="Neo4j Password",
            info="Authentication password for accessing Neo4j.",
            required=True,
        ),
        StrInput(
            name="uri",
            display_name="Neo4j URI",
            info="URI for the Neo4j service, e.g. bolt://localhost:7687.",
            required=True,
        ),
        StrInput(
            name="index_name",
            display_name="Index Name",
            info="Name of the Neo4j vector index to create or reuse.",
            value="vector",
        ),
        MultilineInput(
            name="search_input",
            display_name="Search Input",
        ),
        DataInput(
            name="ingest_data",
            display_name="Ingest Data",
            is_list=True,
        ),
        # A vector store needs an embedding model to turn text into vectors.
        HandleInput(
            name="embedding",
            display_name="Embedding",
            input_types=["Embeddings"],
            required=True,
        ),
        IntInput(
            name="number_of_results",
            display_name="Number of Results",
            info="Number of results to return.",
            advanced=True,
            value=4,
        ),
        DropdownInput(
            name="search_type",
            display_name="Search Type",
            info="Search type to use",
            options=["Similarity", "Similarity with score threshold", "MMR (Max Marginal Relevance)"],
            value="Similarity",
            advanced=True,
        ),
        FloatInput(
            name="search_score_threshold",
            display_name="Search Score Threshold",
            info="Minimum similarity score threshold for search results. (when using 'Similarity with score threshold')",
            value=0,
            advanced=True,
        ),
        DictInput(
            name="search_filter",
            display_name="Search Metadata Filter",
            info="Optional dictionary of filters to apply to the search query.",
            advanced=True,
            is_list=True,
        ),
    ]

    @check_cached_vector_store
    def build_vector_store(self):
        # LCVectorStoreComponent expects a LangChain VectorStore here (not a raw driver),
        # so wrap Neo4j with langchain-community's Neo4jVector, which uses the neo4j driver internally.
        try:
            from langchain_community.vectorstores import Neo4jVector
        except ImportError as e:
            raise ImportError(
                "Could not import Neo4jVector. Please install it with `pip install langchain-community neo4j`."
            ) from e

        connection_args = {
            "url": self.uri,
            "username": self.username,
            "password": self.password,
            "database": self.database_name,
            "index_name": self.index_name,
        }
        documents = self._prepare_documents()
        if documents:
            # Embed the ingested documents, store them as nodes, and create the index if needed.
            return Neo4jVector.from_documents(documents, self.embedding, **connection_args)
        # Nothing new to ingest: attach to an index that already exists in the database.
        return Neo4jVector.from_existing_index(self.embedding, **connection_args)

    def _prepare_documents(self):
        # Convert LangFlow Data objects into LangChain Documents (text lives in page_content).
        documents = []
        for _input in self.ingest_data or []:
            if isinstance(_input, Data):
                documents.append(_input.to_lc_document())
            else:
                raise ValueError("Vector Store Inputs must be Data objects.")
        if not documents:
            logger.debug("No documents to add to the Vector Store.")
        return documents

    def _map_search_type(self):
        # Translate the UI label into the search type names LangChain understands.
        if self.search_type == "Similarity with score threshold":
            return "similarity_score_threshold"
        elif self.search_type == "MMR (Max Marginal Relevance)":
            return "mmr"
        else:
            return "similarity"

    def _build_search_args(self):
        args = {"k": self.number_of_results}
        # Only the threshold search takes a score_threshold argument.
        if self._map_search_type() == "similarity_score_threshold":
            args["score_threshold"] = self.search_score_threshold

        if self.search_filter:
            # Drop filter entries with an empty key or value.
            clean_filter = {k: v for k, v in self.search_filter.items() if k and v}
            if clean_filter:
                args["filter"] = clean_filter
        return args

    def search_documents(self) -> list[Data]:
        vector_store = self.build_vector_store()

        logger.debug(f"Search input: {self.search_input}")
        logger.debug(f"Search type: {self.search_type}")
        logger.debug(f"Number of results: {self.number_of_results}")

        if self.search_input and isinstance(self.search_input, str) and self.search_input.strip():
            try:
                # VectorStore.search dispatches to similarity, threshold, or MMR search.
                docs = vector_store.search(
                    query=self.search_input,
                    search_type=self._map_search_type(),
                    **self._build_search_args(),
                )
            except Exception as e:
                raise ValueError(f"Error performing search in Neo4j: {e}") from e

            logger.debug(f"Retrieved documents: {len(docs)}")
            data = docs_to_data(docs)
            logger.debug(f"Converted documents to data: {len(data)}")
            self.status = data
            return data

        logger.debug("No search input provided. Skipping search.")
        return []

    def get_retriever_kwargs(self):
        return {
            "search_type": self._map_search_type(),
            "search_kwargs": self._build_search_args(),
        }
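  2. Install Required Packages:
    Ensure you have the necessary packages installed:

    pip install neo4j
  3. Update LangFlow Configuration:
    Make the new Neo4jVectorStoreComponent available in the UI, for example by saving it in a custom components directory that LangFlow loads at startup (configured with the LANGFLOW_COMPONENTS_PATH environment variable), or by pasting the class into a Custom Component node.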
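The component's Search Type options (`Similarity`, `Similarity with score threshold`, `MMR (Max Marginal Relevance)`) correspond to LangChain's `similarity`, `similarity_score_threshold`, and `mmr` search modes. The stdlib-only sketch below shows, over hypothetical toy vectors, what each mode computes conceptually. It is not Neo4j's or LangChain's implementation: Neo4j scores vectors server-side and may normalize scores differently, and the function names are made up for illustration (only `lambda_mult` mirrors the name of LangChain's MMR parameter).

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity(query, docs, k):
    """'Similarity': the k documents with the highest cosine score."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]


def similarity_with_threshold(query, docs, k, score_threshold):
    """'Similarity with score threshold': the top k, minus any scoring below the threshold."""
    return [d for d in similarity(query, docs, k) if cosine(query, docs[d]) >= score_threshold]


def mmr(query, docs, k, lambda_mult=0.5):
    """'MMR': greedily balance relevance to the query against similarity to documents already picked."""
    selected, candidates = [], list(docs)
    while candidates and len(selected) < k:
        scores = {}
        for d in candidates:
            relevance = cosine(query, docs[d])
            redundancy = max((cosine(docs[d], docs[s]) for s in selected), default=0.0)
            scores[d] = lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=scores.__getitem__)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy embeddings: "intro_copy" is a near-duplicate of "intro".
DOCS = {
    "intro": [1.0, 0.9, 0.0],
    "intro_copy": [1.0, 0.85, 0.0],
    "cypher": [0.5, 1.0, 0.0],
    "other": [0.0, 0.0, 1.0],
}
QUERY = [1.0, 1.0, 0.0]

print(similarity(QUERY, DOCS, 2))                       # ['intro', 'intro_copy']
print(similarity_with_threshold(QUERY, DOCS, 4, 0.97))  # ['intro', 'intro_copy']
print(mmr(QUERY, DOCS, 2))                              # ['intro', 'cypher']
```

Plain similarity returns the two near-identical vectors, while MMR swaps the duplicate for `cypher`, which is slightly less relevant to the query but adds information the first result does not already cover.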

By following these steps, you can integrate Neo4j into LangFlow, allowing it to leverage Neo4j's capabilities for knowledge graph database retrieval and vector database functionalities [2].


Your competitor RAGflow has already added GraphRAG support

https://github.com/infiniflow/ragflow/tree/v0.9.0

Here is why knowledge graphs are strategically important for AI & RAG. Please vote for this task.

adivoj commented

Your competitor RAGflow has already added GraphRAG support

https://github.com/infiniflow/ragflow/tree/v0.9.0

No, not across documents; only within a single document.
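The distinction drawn in this reply, a graph built within one document versus one spanning several, can be illustrated with a small stdlib Python sketch. The triples and helpers are hypothetical, and the sketch assumes entities are merged by exact name match; real GraphRAG pipelines also need entity extraction and entity resolution.

```python
from collections import deque

# Hypothetical (subject, relation, object) triples, assumed already extracted per document.
DOC_TRIPLES = {
    "doc_a": [("Alice", "WORKS_AT", "Acme")],
    "doc_b": [("Acme", "ACQUIRED", "Globex")],
}


def build_graph(triple_lists):
    """Undirected adjacency map; entities with identical names collapse into one node."""
    graph = {}
    for triples in triple_lists:
        for subj, _rel, obj in triples:
            graph.setdefault(subj, set()).add(obj)
            graph.setdefault(obj, set()).add(subj)
    return graph


def connected(graph, start, goal):
    """Breadth-first search: True if some path links start to goal."""
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


per_doc = [build_graph([triples]) for triples in DOC_TRIPLES.values()]  # one graph per document
merged = build_graph(DOC_TRIPLES.values())  # a single graph across all documents

print(any(connected(g, "Alice", "Globex") for g in per_doc))  # False
print(connected(merged, "Alice", "Globex"))  # True
```

Only the merged graph connects `Alice` to `Globex`, because the path runs from `doc_a` into `doc_b` through the shared entity `Acme`.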

qdrddr commented

Flowise has also already added Neo4j/GraphRAG support in its 2.2.3 release.
It would be great to have it in LangFlow too.

qdrddr commented

Someone has already created plugins for LangFlow in the LangFlow Store (https://www.langflow.store/store). I think they could be a good starting point.