Building your GPT-based private knowledge hub using zkSNARK & PLONK

zkGPT

zkGPT is an open-source toolkit for constructing private GPT-based knowledge hubs by leveraging Langchain and zkSNARK. It allows the creation of ChatGPT-like interfaces while preserving user and contributor anonymity.

At its core, the project relies on data miners who run zkGPT nodes and provide the necessary resources, such as data storage and processing power. These operators can access the content provided by users and contributors, but they never learn the identities of those individuals, which are shielded inside the Circom circuits during the pre-query and pre-upload stages.

Background

With ChatGPT and other LLMs, we can inquire about a specific topic by providing context before asking. For example, if we ask "What is the Bitcoin price today?", the AI model may respond with "I can't provide the answer". However, if we first supply the necessary context, such as "The current Bitcoin price is $25,000", and then ask the question again, the model will return the Bitcoin price.

This works because generative AI models can retain information and context from previous interactions. When we provide specific context and then ask a related question, the model can draw on that information to give an accurate response.
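The idea of supplying context before a question can be sketched as a small helper that assembles one prompt string. This is a minimal illustration only: the function name `buildPrompt` and the prompt wording are assumptions made here, not code from zkGPT.

```javascript
// Hypothetical helper (not part of zkGPT): prepend numbered context
// snippets to a question so the model can answer from them.
function buildPrompt(contextDocs, question) {
  const context = contextDocs
    .map((doc, i) => `[${i + 1}] ${doc}`)
    .join("\n");
  return [
    "Use the context below to answer the question.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt(
  ["The current Bitcoin price is $25,000."],
  "What is the Bitcoin price today?"
);
console.log(prompt);
```

The resulting string would be sent to the model as a single request, so the answer can be grounded in the supplied context rather than in the model's training data.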

Applying this concept has huge potential for streamlining tasks within organizations or teams. However, sharing context, especially sensitive information, may expose it to external parties or to the individuals operating the generative models.

How It Works

The system uses a hybrid model in which all documents are encrypted and uploaded to the document database within the zkGPT node, which may be run by untrusted individuals.

The document commitment is generated inside the arithmetic circuits from the wallet address, the hash of the document, and the hub password set by the hub admin; the output is then attached to the smart contract. If a contributor provides the wrong password, the proof cannot be generated and the document is not attached to the knowledge hub.
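The commitment step can be pictured with the rough sketch below. It is not the project's circuit: SHA-256 from Node's built-in `crypto` module stands in for the in-circuit hash, and the names `documentCommitment`, `tryCommit` and the stored `passwordHash` field are assumptions introduced purely for illustration.

```javascript
const { createHash } = require("node:crypto");

// Stand-in hash: SHA-256 replaces the in-circuit hash for illustration only.
function sha256Hex(...parts) {
  const hash = createHash("sha256");
  for (const part of parts) hash.update(String(part)).update("|");
  return hash.digest("hex");
}

// Hypothetical commitment binding wallet address, document hash and hub password.
function documentCommitment(walletAddress, documentHash, hubPassword) {
  return sha256Hex(walletAddress.toLowerCase(), documentHash, hubPassword);
}

// Assumed check: the supplied password must match the hub's stored password hash,
// otherwise no commitment (and, in the real system, no proof) is produced.
function tryCommit(hub, walletAddress, documentBody, password) {
  if (sha256Hex(password) !== hub.passwordHash) return null;
  return documentCommitment(walletAddress, sha256Hex(documentBody), password);
}

const hub = { passwordHash: sha256Hex("hub-secret") };
const wallet = "0x0000000000000000000000000000000000000001";
const accepted = tryCommit(hub, wallet, "Quarterly report", "hub-secret");
const rejected = tryCommit(hub, wallet, "Quarterly report", "wrong-password");
console.log(accepted !== null, rejected === null); // true true
```

In the actual system the equivalent constraint would live inside a Circom circuit, so the password never has to be revealed to the node or the contract.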

On the query side, users can ask any question within the knowledge hub. The prompt is sent to the zkGPT node for further processing, along with the proof. If the proof is valid, the zkGPT node retrieves all documents from the knowledge hub, loads them into the in-memory vector store, and performs the query. The results are posted to the database, using the wallet address hashed with the Poseidon hash as an ID.
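The query path described above (check the proof, load the hub's documents into an in-memory vector store, then search) might look roughly like the following. Everything here is a toy stand-in: the word-count `embed` function replaces a real embeddings API, `proofIsValid` is a plain boolean rather than a verified zk proof, and `handleQuery` is a hypothetical name.

```javascript
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Toy embedding: word counts over a shared vocabulary.
function embed(text, vocabulary) {
  const vector = new Array(vocabulary.length).fill(0);
  for (const token of tokenize(text)) {
    const index = vocabulary.indexOf(token);
    if (index !== -1) vector[index] += 1;
  }
  return vector;
}

function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Hypothetical handler: reject invalid proofs, otherwise return the closest document.
function handleQuery({ proofIsValid, documents, prompt }) {
  if (!proofIsValid) throw new Error("invalid proof");
  const vocabulary = [...new Set(documents.flatMap((doc) => tokenize(doc)))];
  const store = documents.map((text) => ({ text, vector: embed(text, vocabulary) }));
  const queryVector = embed(prompt, vocabulary);
  store.sort((x, y) => cosine(queryVector, y.vector) - cosine(queryVector, x.vector));
  return store[0].text;
}

const documents = [
  "The office is closed on public holidays.",
  "The current Bitcoin price is $25,000.",
];
const best = handleQuery({ proofIsValid: true, documents, prompt: "What is the Bitcoin price?" });
console.log(best); // prints the Bitcoin document
```

A real node would pass the retrieved documents to the LLM as context instead of returning them directly; the sketch stops at retrieval to stay short.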

The proof on the query side is generated from all the prompts and results associated with the given account. To limit the size of the trusted setup ceremony and its resource consumption, each account may ask at most 5 questions.
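One way to see why a cap helps: circuit inputs have a fixed width, so an account's history has to fit a fixed number of slots. The sketch below pads a list of prompt hashes to 5 entries. The helper name `toFixedInputs` and the `"0"` padding value are assumptions for illustration, not the project's actual input format.

```javascript
const MAX_QUESTIONS = 5; // per-account limit stated above

// Hypothetical helper: pad an account's prompt hashes to a fixed width.
function toFixedInputs(promptHashes, padValue = "0") {
  if (promptHashes.length > MAX_QUESTIONS) {
    throw new Error(`an account may ask at most ${MAX_QUESTIONS} questions`);
  }
  const padding = new Array(MAX_QUESTIONS - promptHashes.length).fill(padValue);
  return [...promptHashes, ...padding];
}

const inputs = toFixedInputs(["11", "22"]);
console.log(inputs.length); // 5
```

Raising the limit would mean a wider circuit, which is why the cap is tied to the ceremony size and resource consumption.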

Repository structure

The project uses a monorepo structure consisting of 4 packages that are linked together with Lerna.

  • client: the frontend application, built with Next.js and TailwindCSS, which incorporates the necessary Web3 and ZK libraries.
  • contracts: contains the Circom circuits and the main zkGPT.sol contract, which inherits the verifier contracts autogenerated by snarkjs. All documents and hubs must be hashed using the Poseidon hash, then imported and verified within the contracts.
  • lib: source code for the zkGPT node, kept separate to make unit and integration tests with the other packages easier.
  • node: the zkGPT node, which contains the document database implemented with PouchDB, processes all AI-related operations, and uses Langchain to pass documents to and from the LLM models.

Getting started

All Circom circuits have been compiled and placed in the /packages/contracts/circuits folder. You can follow the instructions provided at this link to compile them manually.

Environment variables

Before installing dependencies, you may need to set the following environment variable in the node package:

OPENAI_API_KEY=

Note that the OpenAI API is used for text embeddings. We're considering supporting more options in the future, including TensorFlow.
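A missing key is easy to overlook until the first embedding request fails, so a startup check along these lines may help. The helper `missingEnv` is hypothetical and not part of the zkGPT node; only the variable name `OPENAI_API_KEY` comes from this README.

```javascript
// Hypothetical startup check: return the names of variables that are unset or blank.
function missingEnv(required, env = process.env) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

const missing = missingEnv(["OPENAI_API_KEY"]);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```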

Install dependencies

Since we are using Lerna, all dependencies across all packages can be installed by running the following command in the root folder:

npm run bootstrap

Most of the core libraries used by the project require Node.js 18. Make sure you have it installed; otherwise, errors may occur.
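If you are unsure which version is active, a quick check such as the one below can be run with `node` before installing. The helper `nodeMajor` is illustrative; whether other major versions also work is not stated here, so the sketch only warns when the major version is not 18.

```javascript
// Parse the major version from a string such as "18.16.0".
function nodeMajor(version = process.versions.node) {
  return Number.parseInt(version.split(".")[0], 10);
}

if (nodeMajor() !== 18) {
  console.warn(`Node.js 18 is expected, found ${process.versions.node}`);
}
```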

Tests

After installing all dependencies, you can run the unit and integration tests to verify that all packages are working as intended.

npm run package:lib

Then open another console and run:

npm run test

Deploy smart contracts

By default, the frontend is configured to connect to the smart contract that we have deployed on the BNB Testnet chain.

However, if you wish to deploy a new contract, you can follow these steps.

cd packages/contracts
npm run deploy

Make sure the private key has been set in the .env file before deploying. When the process is complete, the console should display the contract address.

You will then need to set the new contract address in packages/client/src/constants.js:

export const contractAddress = "0xd074fEDb0E82bBDC91CD032719D0F9549796521c" // replace with your contract here

Run

When everything is set, you can start the entire system by running the command:

npm start

This will run a zkGPT node on port 8000 and the frontend on port 3000. Visit http://localhost:3000 to start using it.

You can check out the YouTube video on how to create the knowledge hub and perform queries.

The Docker configuration file and AWS ECS cluster deployment script are included, but detailed instructions are not provided here.

Deployment

BNB Testnet

Contract Name    Contract Address
zkGPT            0xd074fEDb0E82bBDC91CD032719D0F9549796521c

License

This project is licensed under the MIT License.