Research Collections 📖

A Streamlit web app that uses Large Language Models (LLMs) to perform simple reasoning tasks. The app revolves around research collections: curated sets of paper metadata obtained from ArXiv. Users can interact with these collections to explore the research landscape, identify emerging trends, and receive personalized paper recommendations.

Overview

The goal is to make the research process more efficient and enjoyable. The app uses LLM agents to help you find relevant literature, and it is divided into four sections:

Keyword Generation: Generate highly relevant keywords for your research in a two-step process. First, an LLM is prompted to generate a short list of keywords from its own internal knowledge. This list is a good start, but it tends to be repetitive and usually lacks in-domain knowledge. To refine it, the app downloads a small sample of paper titles from ArXiv and feeds them to the LLM, which uses them to generate a new list of refined keywords that now contains domain-specific knowledge.

Research Collections: Download papers from ArXiv using the refined keywords. The abstracts are vectorized with the bge-small-en-v1.5 embedding model and stored in a Milvus vector database.

Research Landscape: Use BERTopic to identify the underlying topics in your research collection. Use the resulting visualizations to spot research topics, outliers, and new trends in the field.

Paper Scoring: Explore your research collections using Retrieval Augmented Generation. Ask questions in natural language and let an LLM identify which papers are a must-read.
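
The two-step refinement described under Keyword Generation can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names, the prompt wording, and the `fake_llm` stand-in (which returns canned text instead of calling OpenAI) are all assumptions made for the example.

```python
from typing import Callable, List


def parse_keywords(text: str) -> List[str]:
    """Split a comma- or newline-separated reply into unique, lower-cased keywords."""
    seen, keywords = set(), []
    for chunk in text.replace("\n", ",").split(","):
        keyword = chunk.strip().lower()
        if keyword and keyword not in seen:
            seen.add(keyword)
            keywords.append(keyword)
    return keywords


def draft_keywords(llm: Callable[[str], str], topic: str, n: int = 5) -> List[str]:
    """Step 1: ask the LLM for keywords using only what it already knows."""
    prompt = f"List {n} search keywords for research on '{topic}', separated by commas."
    return parse_keywords(llm(prompt))


def refine_keywords(
    llm: Callable[[str], str], topic: str, draft: List[str], titles: List[str], n: int = 10
) -> List[str]:
    """Step 2: show the LLM real paper titles so it can add domain-specific terms."""
    title_lines = "\n".join(f"- {title}" for title in titles)
    prompt = (
        f"Topic: {topic}\n"
        f"Draft keywords: {', '.join(draft)}\n"
        f"Sample paper titles:\n{title_lines}\n"
        f"Using the titles, list up to {n} refined keywords, separated by commas."
    )
    return parse_keywords(llm(prompt))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call; returns canned text."""
    if "Sample paper titles" in prompt:
        return "galaxy morphology, convolutional neural networks, Galaxy Morphology"
    return "galaxies, machine learning, galaxies"


draft = draft_keywords(fake_llm, "galaxy classification")
refined = refine_keywords(
    fake_llm, "galaxy classification", draft, ["Deep learning for galaxy morphology"]
)
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular client library; in practice `fake_llm` would be replaced by a function that sends the prompt to the configured model, and `titles` would come from an ArXiv search on the draft keywords.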

Getting Started

To use the Streamlit app, you will need an OpenAI developer account and an API key. By default, the app uses the gpt-3.5-turbo model.

You will also need to install Docker and Docker Compose in order to run Milvus.

Option 1: Clone the Repository

Install Poetry

Poetry is a tool for dependency management and packaging in Python. It uses the pyproject.toml file to manage dependencies and build the package.

Install Milvus

Milvus is an open-source vector database that provides state-of-the-art similarity search.

Download Milvus from here and follow the instructions to install the latest version. It requires Docker and Docker Compose.

Clone this GitHub repository

git clone https://github.com/lgarma/research-assistant.git

Set up the virtual environment with poetry

cd research-assistant
poetry install
poetry shell

Set your OpenAI API key

echo "OPENAI_API_KEY=your-api-key" > .env

Start the Streamlit app

poetry run streamlit run app/01_📖_Research_collection.py
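
The `.env` file written in the "Set your OpenAI API key" step holds a single `KEY=value` line. How the app itself loads this file is not stated here; as a minimal standard-library sketch, assuming the simple one-pair-per-line format shown above, the key could be read like this (the helper names `read_env_file` and `get_openai_api_key` are made up for the example):

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=value lines from a dotenv-style file into a dict."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not a KEY=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_api_key(path: str = ".env") -> Optional[str]:
    """Prefer a key already set in the environment, then fall back to the file."""
    return os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")
```

A quick check such as `get_openai_api_key() is not None` before launching the app can catch a missing or misnamed key early.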

Option 2: Docker Installation

If you prefer to use Docker, you can set up Research Assistant as follows:

Clone this GitHub repository

git clone https://github.com/lgarma/research-assistant.git

Set your OpenAI API key

cd research-assistant
echo "OPENAI_API_KEY=your-api-key" > .env

Build and run the docker container

docker-compose up --build -d

Open the Streamlit app

The Streamlit app should now be available at http://localhost:8502
