- 0. Motivation
- 1. Demo videos for Setup and tutorial
- 2. Setup
- 3. Start the app
- 4. How to use
- 5. Checking the references
- 6. Contribution
The RAG-ArXiv LLM Research Assistant project aims to create an intelligent system that scrapes recent Language Model (LLM) research papers from ArXiv, embeds them, and stores them in a vector database. This setup enables the system to rank and answer LLM-related questions using up-to-date information from the latest research.
Demo videos: final.set.up.mov (setup) and tutorial.mp4 (tutorial).
Before you begin, ensure you have the following installed on your machine:
Set up Ollama, a tool for running language models locally. Visit the Ollama GitHub page to pull the Llama 3.1 8B model to your local machine, or follow these steps:
After downloading Ollama, run this command in your terminal:
ollama pull llama3.1
To use a different LLM, select any model from the Ollama website, then open chat_core/global_var.py.py and change LLM_NAME = "llama3.1" to your preferred model.
Download Python from https://www.python.org/downloads/. Python 3.10 is recommended for this app.
Run the following commands in your terminal:
git clone https://github.com/dohoanggiahuy317/RAG-ArXiv-LLM-Research-Assistant-Proj.git
cd RAG-ArXiv-LLM-Research-Assistant-Proj
OPTION 1: If you downloaded the provided virtual environment, activate it using conda or Python. The commands below use conda:
conda deactivate
conda activate ./.env
OPTION 2: Set up a new virtual environment (you can use conda) with Python 3.10, activate it, then install the required packages.
To create and activate the virtual environment with conda, run:
conda create --prefix .env python=3.10
conda deactivate
conda activate ./.env
Then install the required packages with pip:
pip install -r requirements.txt
To launch the application, run:
python app.py
The application will be accessible at http://127.0.0.1:5000/ in your browser.
The application follows these steps:
- 1. Fetch data from ArXiv
- 2. (Optional) Fine-tune your own embedder
- 3. Save papers into your local vector database
- 4. Save the model configuration for chatting
- 5. Create username
(You must complete these steps before you can start chatting; step 2 is optional.)
You can skip steps 1, 2, and 3 by downloading the prepared data and models:
- data: Save this folder into the root directory
- database: Save this folder into the root directory
- models: Save this folder under the finetune_embedder folder
- data_embedder: Save this folder under the finetune_embedder folder
Fetch new Natural Language Processing papers from ArXiv. Adjust parameters in the window. Already scraped papers will be skipped to avoid duplicates.
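The fetch-and-skip-duplicates behavior can be sketched as follows. This is a minimal illustration, not the app's own scraper: it assumes the public arXiv API endpoint and the cs.CL category, and the function names (build_query_url, parse_feed, keep_new, fetch_new_papers) are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the public arXiv API, which returns results as an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(category="cs.CL", max_results=10):
    """Build an arXiv API URL for the newest papers in one category."""
    params = {
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml):
    """Return a list of {id, title} dicts from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        paper_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS).strip()
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        # Titles may wrap across lines in the feed, so collapse whitespace.
        papers.append({"id": paper_id, "title": " ".join(title.split())})
    return papers


def keep_new(papers, seen_ids):
    """Drop papers whose ID was already scraped, and record the new IDs."""
    fresh = [p for p in papers if p["id"] not in seen_ids]
    seen_ids.update(p["id"] for p in fresh)
    return fresh


def fetch_new_papers(seen_ids, category="cs.CL", max_results=10):
    """Download the feed (needs network access) and return only unseen papers."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return keep_new(parse_feed(resp.read().decode("utf-8")), seen_ids)
```

Calling fetch_new_papers twice with the same seen_ids set returns a paper only the first time it appears, which mirrors how already scraped papers are skipped.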
Fine-tune your own retrieval model to improve the paper reference engine. Enter your chosen embedder name when prompted.
Save documents to the database using your selected embedder. Choose between the default Hugging Face embedder or any custom embedder you prefer.
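To make the idea of saving documents with an embedder concrete, here is a toy sketch. It is not the app's implementation: toy_embed is a word-count stand-in for a real Hugging Face or fine-tuned embedder, and ToyVectorStore is a hypothetical in-memory stand-in for the vector database.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedder: a sparse word-count vector (not a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self, embed):
        self.embed = embed  # any function mapping text to a vector
        self.rows = []      # list of (document text, vector) pairs

    def add(self, documents):
        """Embed each document once and keep the vector next to the text."""
        for doc in documents:
            self.rows.append((doc, self.embed(doc)))

    def search(self, query, k=2):
        """Return the k stored documents most similar to the query."""
        q = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]
```

Because the store takes the embedding function as a parameter, swapping the default embedder for a custom one changes only the argument passed to ToyVectorStore, which is the same choice this step offers.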
Save the model configuration for chatting. Select the vector database to query and the document compression type. LLM Filter is generally more efficient for long text.
- LLM Extract: Iterates over initially returned documents and extracts only content relevant to the query.
- LLM Filter: A simpler but more robust compressor that uses an LLM chain to filter out irrelevant documents without manipulating their contents.
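The difference between the two compression types can be sketched without a real LLM. In this illustration, is_relevant is a hypothetical keyword-overlap stand-in for the LLM's relevance judgment, and llm_filter and llm_extract are illustrative names, not the app's functions.

```python
import re


def _words(text):
    """Lowercased word set used by the stand-in relevance check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_relevant(text, query):
    """Stand-in for the LLM's judgment: relevant if any word is shared."""
    return bool(_words(text) & _words(query))


def llm_filter(documents, query):
    """Filter style: keep or drop whole documents, contents untouched."""
    return [doc for doc in documents if is_relevant(doc, query)]


def llm_extract(documents, query):
    """Extract style: keep only the relevant sentences of each document."""
    compressed = []
    for doc in documents:
        sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
        kept = [s for s in sentences if is_relevant(s, query)]
        if kept:
            compressed.append(" ".join(kept))
    return compressed
```

With a real LLM behind the relevance check, the filter style typically needs only a yes/no decision per document, while the extract style has to generate the extracted text, which is one plausible reason the filter is cheaper on long inputs.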
Enter your username to start chatting. Access previous chats by using the same username.
All responses cite actual research papers. Citations are shown when the app first provides an answer, and you can always find them later in chat_core/logs using your username, conversation ID, and compression type.
To explore the database contents, use MySQL Workbench or your preferred SQL tool. Connect to the database file located at chat_core/database/memory.db to query and explore the data.
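If you prefer a script over a GUI tool, the sketch below lists the tables and previews a few rows. It assumes memory.db is a SQLite file, which the .db extension suggests but this README does not confirm; list_tables and preview are hypothetical helper names.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def preview(db_path, table, limit=5):
    """Return up to `limit` rows from one table as a list of tuples."""
    # Only accept table names that really exist, since identifiers
    # cannot be passed as bound parameters.
    if table not in list_tables(db_path):
        raise ValueError(f"unknown table: {table}")
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)).fetchall()
```

For example, list_tables("chat_core/database/memory.db") shows which tables exist before you query any of them. Note that sqlite3.connect creates an empty file if the path does not exist, so check the path first.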
We welcome contributions! If you have suggestions or improvements, please open an issue or submit a pull request on our GitHub repository.