sparql-langchain

A project that uses LangChain to construct SPARQL queries against SPARQL endpoints.

Primary language: Python. License: MIT.

This project aims to create a general-purpose LangChain component for querying large SPARQL endpoints. It derives from LangChain's existing GraphSparqlQAChain but aims to address several of its shortcomings: handling very large schemas, providing better schema guidance, and producing higher-quality answers from existing SPARQL endpoints such as Bio2RDF and Wikidata.
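Whatever query the chain generates must eventually be sent to an endpoint, and the SPARQL 1.1 Protocol defines that as an ordinary HTTP request. The sketch below, using only the Python standard library, shows that round trip and how JSON results can be flattened for an LLM prompt. The helper names (`build_request`, `parse_bindings`, `run_query`), the `User-Agent` string, and the class-frequency query are illustrative assumptions, not part of this project's code.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical schema-summary query: the 20 most frequently used classes.
# On very large endpoints an aggregate like this can be slow or time out.
CLASS_QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 20
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; some public endpoints expect a descriptive one.
            "User-Agent": "sparql-langchain-example/0.1",
        },
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Variables left unbound in a row (e.g. by OPTIONAL) are simply absent.
    """
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 60.0) -> list[dict[str, str]]:
    """Send the query over HTTP and return the flattened bindings."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(endpoint_url, CLASS_QUERY)` performs real network I/O against whichever endpoint URL you supply; keeping `build_request` and `parse_bindings` separate makes both testable offline.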

Current status:

  • This project is in pre-alpha and serves to demonstrate the approach.

Future work:

  • Refactor the context generation to remove the hardcoded Bio2RDF schema and instructions.
  • Allow specifying a target SPARQL endpoint to query.
  • Generate, store, and load an RDF schema for a SPARQL endpoint.
  • Identify a relevant fragment of the schema to guide query construction.
  • Use other LLMs, such as Llama 2, instead of OpenAI GPT models.
  • Iterate until a valid SPARQL query is generated.
  • Implement a conversational AI to (i) improve query answering and (ii) support reinforcement learning from human feedback.
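Two of the future-work items, selecting a relevant schema fragment and iterating until a query is valid, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the schema is assumed to be a list of (IRI, label) pairs, relevance is plain keyword overlap (embedding similarity would be a common alternative), and `generate`/`validate` are hypothetical callables standing in for an LLM call and a SPARQL syntax check.

```python
import re
from typing import Callable, Optional

# Hypothetical schema representation: (class or property IRI, human-readable label).
SchemaEntry = tuple[str, str]


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens, used for crude keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant_fragment(question: str, schema: list[SchemaEntry], k: int = 5) -> list[SchemaEntry]:
    """Keep the k entries whose labels share the most tokens with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(label)), iri, label) for iri, label in schema]
    scored = [s for s in scored if s[0] > 0]      # drop entries with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))      # best score first, ties by IRI
    return [(iri, label) for _, iri, label in scored[:k]]


def generate_valid_query(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    """Ask `generate` for a query, feeding validator errors back until one passes.

    `validate` returns None for a valid query, otherwise an error message that
    is passed to the next `generate` call as feedback.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        query = generate(question, feedback)
        feedback = validate(query)
        if feedback is None:
            return query
    raise ValueError(f"no valid query after {max_attempts} attempts: {feedback}")
```

In practice `validate` could wrap rdflib's `prepareQuery` (from `rdflib.plugins.sparql`), which raises an exception on a syntax error, and return that exception's message as feedback.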

Install

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

Install the package in editable mode:

pip install -e .

Set the OPENAI_API_KEY environment variable and, optionally, LANGCHAIN_API_KEY to use LangSmith:

export OPENAI_API_KEY=
export LANGCHAIN_API_KEY=
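Before running the query script, a small check can fail fast when the required key is missing. This is a hypothetical helper, not part of the repository. The `tracing` field assumes the `LANGCHAIN_TRACING_V2=true` switch that LangSmith used to enable tracing in LangChain releases of this period; confirm it against the LangSmith documentation for your version.

```python
import os
from collections.abc import Mapping


def check_env(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report which of the README's variables are set; require OPENAI_API_KEY."""
    status = {
        "OPENAI_API_KEY": bool(env.get("OPENAI_API_KEY")),
        "LANGCHAIN_API_KEY": bool(env.get("LANGCHAIN_API_KEY")),
        # Assumed LangSmith tracing switch; only reported, never required.
        "tracing": env.get("LANGCHAIN_TRACING_V2", "").lower() == "true",
    }
    if not status["OPENAI_API_KEY"]:
        raise RuntimeError("OPENAI_API_KEY is required but not set")
    return status
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.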

Run

python src/query.py