This repository contains a full Q&A pipeline built with the LangChain framework, using Pinecone as the vector database and Tavily as the agent's web-search tool. The data used are the transcriptions of TEDx Talks. A short description of how tokenizers and embeddings work is included. Use this Link if the notebook cannot be opened.
The main steps taken to build the RAG pipeline can be summarized as follows:
- Data Ingestion: load the data from a CSV file
- Tokenization: how a tokenizer works (a short sketch follows this list)
- Embeddings: how embeddings work, together with the concept of cosine similarity
- Indexing: RecursiveCharacterTextSplitter for splitting into chunks for indexing
- Vector Store: Pinecone with several namespaces (multi-tenancy)
- QA Chain Retrieval: RetrievalQA with memory and agents
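As a minimal illustration of the tokenization and cosine-similarity ideas, here is a small sketch; tiktoken and numpy are assumptions of this sketch, not requirements of the repo:

```python
import numpy as np
import tiktoken

# A tokenizer maps text to a sequence of integer token ids
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
token_ids = enc.encode("Ideas worth spreading")
print(token_ids)  # a short list of integers, one per token

# Cosine similarity compares two embedding vectors by the angle between them;
# values close to 1 indicate semantically similar texts
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```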
Feel free to ⭐ and clone this repo 😉
In the initial project phase, the documents are loaded using CSVLoader and indexed. Indexing is a fundamental process: it stores and organizes data from diverse sources in a vector store, a structure essential for efficient storage and retrieval. It involves the following steps:
- Select a splitting method and its hyperparameters: we will use the RecursiveCharacterTextSplitter.
- Select the embeddings model: in our case, OpenAI embeddings.
- Select a vector store: Pinecone.

The text chunks are stored together with their embedding representations, which capture the semantic meaning of the text. These embeddings make it easy to retrieve chunks based on their semantic similarity.
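Sketched in code, these steps might look as follows; the CSV path, chunk sizes, index name, and namespace are placeholders rather than the repo's actual values:

```python
from langchain_community.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# 1. Load the TEDx transcriptions and split them into overlapping chunks
docs = CSVLoader(file_path="tedx_talks.csv").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 2-3. Embed the chunks with OpenAI and store them in a Pinecone index;
# a namespace keeps tenants separated inside the same index (multi-tenancy)
vectorstore = PineconeVectorStore.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(),
    index_name="tedx-index",
    namespace="tedx",
)
```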
After indexing, a QA chain retrieval pipeline is set up to check the Q&A behaviour and performance. Memory and agents are also included in the process.
The pipeline contains the main LLM, the memory, the QA chain and the agents. The prompt template completes the QA chain, with a slight modification instructing the model to look first in the vector store.
```python
# Imports assumed for this snippet (LangChain ~0.1-style APIs); `vectorstore`
# and TAVILY_API_KEY are defined in the indexing step earlier in the notebook
from langchain.agents import AgentExecutor, Tool, create_react_agent
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferWindowMemory
from langchain.prompts import PromptTemplate
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

# Set prompt template
template = '''
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]. Always look first in Pinecone Document Store
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat 2 times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
'''
prompt = PromptTemplate.from_template(template)

llm = ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo", max_tokens=512)

# Conversational memory: keep the last k=5 exchanges as chat history
conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True,
)

# Retrieval QA chain: "stuff" places the retrieved chunks directly in the prompt
qa_db = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

# Tavily web search, used by the agent when the vector store is not enough
tavily = TavilySearchResults(max_results=10, tavily_api_key=TAVILY_API_KEY)

tools = [
    Tool(
        name="Pinecone Document Store",
        func=qa_db.run,
        description="Use it to look up information from the Pinecone Document Store",
    ),
    Tool(
        name="Tavily",
        func=tavily.run,
        description="Use this to look up information from Tavily",
    ),
]

# ReAct agent wired to the LLM, the tools and the prompt above
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    handle_parsing_errors=True,
    verbose=True,
    memory=conversational_memory,
)
```
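A quick sanity check of the executor could look like this; the question is just an illustrative placeholder:

```python
# The agent should consult the Pinecone Document Store before falling back to Tavily
result = agent_executor.invoke({"input": "Which TEDx talks discuss creativity?"})
print(result["output"])
```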
- Different databases: Deep Lake, Qdrant, ...
- Adding a reranker model: Cohere (see the sketch below)
- Agentic hierarchies with LangGraph
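As a rough sketch of the reranker idea, LangChain's contextual compression retriever can wrap the existing Pinecone retriever with Cohere's rerank model; the top_n and k values here are assumptions, and COHERE_API_KEY must be set in the environment:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

# Over-fetch candidates from Pinecone, then let Cohere rerank and keep the top 3
reranker = CohereRerank(top_n=3)
reranking_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10}),
)
```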