MultiRound AgentRAG: An Agent-Enhanced Retrieval-Augmented Generation Pipeline for Multi-Round Conversation
(This version is usable, but the agent prompts are not yet optimized, so the system's behavior may be unstable.)
The agentic parts of the project are our own; the non-agentic parts are mostly based on this paper by Fudan University.
For more details, please read the release blog.
A detailed usage example is provided in main.ipynb.
Set up the environment and install dependencies with setup.ipynb.
To use the pipeline, prepare the PoolOfQueries object as follows:
```python
pool_of_queries = PoolOfQueries(
    embedding_function=embedding_function,
    rerank_function=rerank_function,
    retrieve_function=retrieve_function,
    response_function=response_structured_function,
    verbose=poq_verbose,
)
```
Then during the conversation:
- Pass in the message history to update the context, then invoke the pool to obtain the currently retrieved chunks (a fuller loop sketch follows the snippet):
```python
pool_of_queries.update(message_history)
current_context_msg = pool_of_queries.current_context_msg()
```
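For orientation, here is a minimal sketch of a full conversation loop. The OpenAI client, the model name, and the placement of the context message are assumptions for illustration; only `update()` and `current_context_msg()` come from the pipeline, and we assume the latter returns an OAI-format message dict.

```python
# Minimal sketch of a multi-round loop. The OpenAI client, model name, and
# the placement of the context message are illustrative assumptions; only
# pool_of_queries.update() and .current_context_msg() come from the pipeline.
from openai import OpenAI

client = OpenAI()
message_history = [{"role": "system", "content": "You are a helpful assistant."}]

for user_input in ["What is RAG?", "How does it handle follow-up questions?"]:
    message_history.append({"role": "user", "content": user_input})

    # Refresh the pool from the full conversation so far, then fetch the
    # currently retrieved chunks (assumed here to be an OAI-format message).
    pool_of_queries.update(message_history)
    current_context_msg = pool_of_queries.current_context_msg()

    # Generate the next answer with the retrieved context alongside the history.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=message_history + [current_context_msg],
    )
    answer = completion.choices[0].message.content
    message_history.append({"role": "assistant", "content": answer})
```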
Prepare the following functions; if any of them requires extra arguments, abstract those away (e.g. bind them with functools.partial) so that each function matches the signature below. Hedged example implementations follow the list.
- embedding_function:
  - Takes a text or a list of texts to embed (str / List[str])
  - Returns a tensor-like or a list of tensor-likes (torch.Tensor / List[torch.Tensor] / ndarray)
- rerank_function:
  - Takes a query (str) and a list of candidates to rank (List[str])
  - Returns a list of reranked results, each in the format [result.text, result.score]
- retrieve_function:
  - Takes a query (str)
  - Returns a list of related chunks
- response_function:
  - Takes all chat steps (List[dict], in OAI format), a pydantic datamodel, and (optionally) the attribute(s) to retrieve from the datamodel (default: "response")
  - Returns an instance of the specified attribute as defined by your datamodel, or a dictionary of all attributes if multiple are specified
- poq_verbose:
  - Boolean controlling the verbosity of the poq module
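Below are hedged sketches of the four plug-in functions with the signatures above. The sentence-transformers models, the in-memory corpus, the OpenAI structured-output call, and the `Answer` datamodel are illustrative assumptions, not part of the pipeline; any implementation matching these signatures will do.

```python
# Hedged sketches of the four plug-in functions. The models, the in-memory
# corpus, and the structured-output call are illustrative assumptions, not
# part of the pipeline; any implementation matching the signatures above works.
from functools import partial

import numpy as np
from openai import OpenAI
from pydantic import BaseModel
from sentence_transformers import CrossEncoder, SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def embedding_function(texts):
    # str or List[str] in; ndarray (1-D or 2-D) out.
    return embedder.encode(texts)


def rerank_function(query, candidates):
    # Score each (query, candidate) pair, then sort best-first.
    scores = cross_encoder.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [[text, float(score)] for text, score in ranked]


def retrieve(query, corpus, corpus_embeddings, top_k):
    # Brute-force cosine similarity over an in-memory corpus.
    q = embedder.encode(query)
    sims = corpus_embeddings @ q / (
        np.linalg.norm(corpus_embeddings, axis=1) * np.linalg.norm(q)
    )
    return [corpus[i] for i in np.argsort(-sims)[:top_k]]


class Answer(BaseModel):  # hypothetical datamodel with the default attribute
    response: str


def response_structured_function(messages, datamodel=Answer, attribute="response"):
    # One way to produce structured output, via the OpenAI SDK's parse helper.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini", messages=messages, response_format=datamodel
    )
    return getattr(completion.choices[0].message.parsed, attribute)


# Extra arguments (corpus, top_k, ...) are bound with functools.partial so the
# pipeline only ever sees the single-argument signature it expects.
chunks = ["chunk one ...", "chunk two ..."]  # your document chunks
chunk_embeddings = embedder.encode(chunks)
retrieve_function = partial(
    retrieve, corpus=chunks, corpus_embeddings=chunk_embeddings, top_k=8
)
```

Binding the corpus and top_k with functools.partial keeps the single-argument signature the pipeline expects while letting you configure retrieval externally.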
Features:
- No LLM frameworks (LangChain, LlamaIndex): every part is plug and play
- Retrieved context persists across conversation rounds
- "Sliding Window" retrieval module
- Custom context retrieval/management mechanism using Agents for multi-round persistence
TODO:
- Separate the giant notebook into individual files
- Implement source highlighting (equivalently, encoding for chunking information)
- Add support for different inference providers (local, Hugging Face, etc.)
There are (probably) no multi-hop, multi-round RAG benchmarks; we could not find any to date. Most of the pipeline is designed with multi-round conversation in mind, so on single-round conversation benchmarks the system reduces to the one outlined in the Fudan paper and should produce similar results.