/research-assistant

Use LLMs to help you in your research

Primary Language: Python

Research Collections 📖

A Streamlit web app that leverages the capabilities of Large Language Models (LLMs) to perform simple reasoning tasks. The app revolves around the concept of research collections, which are curated sets of paper metadata obtained from arXiv. Users can interact with these collections to explore the research landscape, identify emerging trends, and receive personalized paper recommendations.


Overview

The goal is to make the research process more efficient and enjoyable by using LLM agents to help you find relevant literature. The app is divided into four sections:

  • Keyword Generation: Generate highly relevant keywords for your research. This is a two-step process. First, an LLM is prompted to generate a small list of relevant keywords, relying only on its own internal parameters. This list is a good start, but it tends to be repetitive and usually lacks in-domain knowledge. To refine the list, the app downloads a small sample of paper titles from arXiv. The titles are then fed to the LLM, which uses them to generate a new list of refined keywords that now contains domain-specific knowledge.

[Image: keyword]
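
The two-step process can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `llm` is a hypothetical callable that takes a prompt string and returns the model's reply, and the prompt wording and function names are assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt to the model's text reply.
LLM = Callable[[str], str]


def parse_keywords(reply: str) -> List[str]:
    """Split a comma- or newline-separated reply into unique, lowercased keywords."""
    seen, keywords = set(), []
    for raw in reply.replace("\n", ",").split(","):
        kw = raw.strip(" \t-•").lower()
        if kw and kw not in seen:  # drop blanks and repeated keywords
            seen.add(kw)
            keywords.append(kw)
    return keywords


def draft_keywords(llm: LLM, topic: str, n: int = 5) -> List[str]:
    """Step 1: ask the model for keywords from its own knowledge only."""
    prompt = (
        f"List {n} search keywords for research on: {topic}. "
        "Separate them with commas."
    )
    return parse_keywords(llm(prompt))


def refine_keywords(
    llm: LLM, topic: str, draft: List[str], titles: List[str], n: int = 10
) -> List[str]:
    """Step 2: show the model real paper titles so it can add domain terms."""
    title_block = "\n".join(f"- {t}" for t in titles)
    prompt = (
        f"Topic: {topic}\n"
        f"Draft keywords: {', '.join(draft)}\n"
        f"Recent paper titles:\n{title_block}\n"
        f"Using the vocabulary of these titles, list {n} refined search "
        "keywords, separated by commas."
    )
    return parse_keywords(llm(prompt))
```

Passing the model in as a plain function keeps both steps independent of any particular client library and makes them easy to exercise with a stub.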

  • Research collections: Download papers from arXiv using the refined keywords. The abstracts are vectorized with the bge-small-en-v1.5 embedding model and stored in a Milvus vector database.

[Image: keyword]
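
As an illustration of the download step, the sketch below queries the public arXiv API with the standard library and extracts the id, title, and abstract of each result. How the app builds its queries is not documented here, so the OR-combined phrase search and the function names are assumptions. Embedding and Milvus storage are left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # arXiv returns an Atom feed


def build_query_url(keywords: List[str], max_results: int = 50) -> str:
    """Combine keywords into one query: all:"kw1" OR all:"kw2" ..."""
    search = " OR ".join(f'all:"{kw}"' for kw in keywords)
    params = {"search_query": search, "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _clean(text: str) -> str:
    """Collapse the line breaks and repeated spaces found in feed text."""
    return " ".join(text.split())


def parse_feed(xml_data: Union[str, bytes]) -> List[Dict[str, str]]:
    """Turn an Atom feed into a list of paper metadata dictionaries."""
    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append(
            {
                "id": _clean(entry.findtext("atom:id", default="", namespaces=ATOM)),
                "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM)),
                "abstract": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM)),
            }
        )
    return papers


def fetch_papers(keywords: List[str], max_results: int = 50) -> List[Dict[str, str]]:
    """Download and parse one page of results (performs a network request)."""
    url = build_query_url(keywords, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read())
```

The `abstract` field of each dictionary is the text that would then be passed to the embedding model before being written to the vector database.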

  • Research landscape: Use BERTopic to identify the underlying topics in your research collection. Use the powerful visualizations to identify research topics, outliers, and new trends in the field.

[Image: keyword topic over time]
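
BERTopic assigns each document a topic id and uses -1 for outliers. The sketch below does not run BERTopic itself. It only illustrates, in plain Python, the kind of bookkeeping behind a topics-over-time view, assuming the topic ids and publication years are already available. The function names are made up for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List

OUTLIER = -1  # label BERTopic gives to documents that fit no topic


def topics_over_time(topics: List[int], years: List[int]) -> Dict[int, Dict[int, int]]:
    """Count papers per topic and year, ignoring outliers."""
    if len(topics) != len(years):
        raise ValueError("topics and years must have the same length")
    counts: Dict[int, Counter] = defaultdict(Counter)
    for topic, year in zip(topics, years):
        if topic != OUTLIER:
            counts[topic][year] += 1
    # Sort topics and years so the result is easy to plot or print.
    return {t: dict(sorted(c.items())) for t, c in sorted(counts.items())}


def outlier_share(topics: List[int]) -> float:
    """Fraction of papers that were not assigned to any topic."""
    return topics.count(OUTLIER) / len(topics) if topics else 0.0


# Topic ids could come, for example, from BERTopic().fit_transform(abstracts).
example_topics = [0, 0, 1, -1, 1, 0]
example_years = [2021, 2022, 2022, 2022, 2023, 2022]
print(topics_over_time(example_topics, example_years))
# {0: {2021: 1, 2022: 2}, 1: {2022: 1, 2023: 1}}
```

A topic whose counts grow in the most recent years is a candidate emerging trend, while a high outlier share suggests the collection mixes unrelated subjects.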

  • Paper Scoring: Explore your research collections using Retrieval-Augmented Generation (RAG). Ask questions in natural language and let an LLM identify which papers are a must-read.

[Image: paper recommendations]
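
The retrieval-then-prompt pattern behind paper scoring can be sketched as below. This is a simplified stand-in, not the app's implementation: an in-memory cosine-similarity search replaces the Milvus query, the vectors are assumed to come from the embedding model, and the 1 to 5 scoring scale and prompt wording are assumptions.

```python
import math
from typing import List, NamedTuple, Sequence


class Paper(NamedTuple):
    title: str
    abstract: str
    vector: Sequence[float]  # embedding of the abstract


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector: Sequence[float], papers: List[Paper], k: int = 3) -> List[Paper]:
    """Return the k papers whose abstracts are most similar to the question."""
    ranked = sorted(papers, key=lambda p: cosine(query_vector, p.vector), reverse=True)
    return ranked[:k]


def build_scoring_prompt(question: str, retrieved: List[Paper]) -> str:
    """Place the retrieved abstracts in the prompt so the LLM can score them."""
    listing = "\n\n".join(
        f"[{i}] {p.title}\n{p.abstract}" for i, p in enumerate(retrieved, start=1)
    )
    return (
        "You are helping a researcher decide what to read.\n"
        f"Question: {question}\n\n"
        f"Candidate papers:\n{listing}\n\n"
        "Score each paper from 1 (skip) to 5 (must-read) and justify briefly."
    )
```

Only the retrieved abstracts reach the LLM, which keeps the prompt short and grounds the scores in the papers of the collection.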


Getting Started

To use the Streamlit app, you will need an OpenAI developer account and an API key. By default, the app uses the gpt-3.5-turbo model.

You will also need to install Docker and Docker Compose in order to run Milvus.

Option 1: Clone the Repository

  1. Install Poetry

Poetry is a tool for dependency management and packaging in Python. It uses the pyproject.toml file to manage dependencies and build the package.

  2. Install Milvus

Milvus is an open-source vector database that provides state-of-the-art similarity search.

Download Milvus and follow its installation instructions to install the latest version. It requires Docker and Docker Compose.

  3. Clone this GitHub repository
     git clone https://github.com/lgarma/research-assistant.git
  4. Set up the virtual environment with Poetry
     cd research-assistant
     poetry install
     poetry shell
  5. Set your OpenAI API key
     echo "OPENAI_API_KEY=your-api-key" > .env
  6. Start the Streamlit app
     poetry run streamlit run app/01_📖_Research_collection.py
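
Before launching the app, it can be useful to confirm that the .env file really contains a key. The snippet below is a small standard-library check written for this purpose. How the app itself reads the file is not shown here (python-dotenv is a common choice), and the function names are made up for the example.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def require_api_key(values: Dict[str, str], name: str = "OPENAI_API_KEY") -> str:
    """Return the key, or fail early if it is missing or still the placeholder."""
    key = values.get(name, "")
    if not key or key == "your-api-key":
        raise RuntimeError(f"{name} is missing or still set to the placeholder value")
    return key


# Typical use from the repository root:
#   from pathlib import Path
#   require_api_key(parse_env(Path(".env").read_text()))
```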

Option 2: Docker Installation

If you prefer to use Docker, you can set up Research Assistant as follows:

  1. Clone this GitHub repository and move into it
     git clone https://github.com/lgarma/research-assistant.git
     cd research-assistant
  2. Set your OpenAI API key
     echo "OPENAI_API_KEY=your-api-key" > .env
  3. Build and run the Docker container
     docker-compose up --build -d
  4. Open the Streamlit app

The Streamlit app should be running at http://localhost:8502
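
Because the container is started in detached mode, the app may need some time before it accepts connections. A small polling helper such as the one below can confirm that port 8502 is reachable. It is an optional convenience written for this example, not part of the project, and the retry count and delay are arbitrary choices.

```python
import socket
import time
from typing import Callable


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example: wait up to about 30 seconds for the Streamlit app.
#   ready = wait_until(lambda: is_port_open("localhost", 8502))
```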