
A multi-stage neural search engine for the COVID-19 Open Research Dataset


Covidex: A Search Engine for the COVID-19 Open Research Dataset


This repository contains the API server, neural models, and UI client for Covidex, a neural search engine for the COVID-19 Open Research Dataset (CORD-19).

For a description of our system, check out this paper: Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset.

Local Deployment

API Server

Install CUDA 10.1

  • For Ubuntu, follow these instructions
  • For Debian, run sudo apt-get install nvidia-cuda-toolkit

Install Anaconda (currently version 2020.02)

wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
bash Anaconda3-2020.02-Linux-x86_64.sh

Install Java 11

sudo apt-get install openjdk-11-jre openjdk-11-jdk

Build the latest Anserini indices

sh scripts/update-anserini-index.sh

Build the latest HNSW index for related article search

sh scripts/update-hnsw-index.sh

Set up environment variables by copying over the defaults and modifying as needed

cp api/.env.sample api/.env
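After editing api/.env, it can be handy to confirm that it still defines every variable listed in api/.env.sample. The following is a minimal sketch of such a check, not part of the repository: the helper names parse_env and missing_keys are hypothetical, and the parser only understands plain KEY=VALUE lines (no export prefixes or inline comments).

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(sample_text: str, env_text: str) -> list:
    """Return keys defined in the sample file but absent from the env file."""
    sample = parse_env(sample_text)
    env = parse_env(env_text)
    return sorted(key for key in sample if key not in env)


if __name__ == "__main__":
    sample_path, env_path = Path("api/.env.sample"), Path("api/.env")
    if sample_path.exists() and env_path.exists():
        print(missing_keys(sample_path.read_text(), env_path.read_text()))
```

Run from the repository root, this prints an empty list when api/.env covers every key in the sample file.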

Create an Anaconda environment for Python 3.7

conda create -n covidex python=3.7

Activate the Anaconda environment

conda activate covidex

Install Python dependencies

pip install -r api/requirements.txt

Run the server (make sure you are in the api/ folder)

uvicorn app.main:app --reload --port=8000

The server will be running at localhost:8000, with API documentation at /docs
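To confirm that the server has finished starting, one option is to poll the /docs page until it responds. The sketch below uses only the Python standard library; the function name wait_for_server, the timeouts, and the polling interval are illustrative assumptions rather than anything provided by this repository.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each individual request gets a fixed 2-second timeout.
            with urllib.request.urlopen(url, timeout=2.0) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not accepting connections yet, or returned an HTTP error
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    ok = wait_for_server("http://localhost:8000/docs", timeout=5.0)
    print("API server is up" if ok else "API server did not respond")
```

The same helper can be pointed at a different address by changing the URL.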

UI Client

Install Node.js 12+ and Yarn.

Install dependencies

yarn install

Start the server

yarn start

The client will be running at localhost:3000

Production Deployment

Redirect port 80 to the port the server listens on, since only root can bind to port 80 (the command below uses port 8000):

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8000

Build the latest Anserini indices

sh scripts/update-anserini-index.sh [DATE]

Build the latest HNSW index for related article search

sh scripts/update-hnsw-index.sh

Start the server (deploys to port 8000 by default):

sh scripts/deploy-prod.sh

Optional: set the PORT environment variable to choose the port explicitly:

PORT=8000 sh scripts/deploy-prod.sh

Log files are available under api/logs, where new files are created daily based on UTC time. All filenames have the date appended except for the current one, which will be named search.log or related.log.
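For readers unfamiliar with this naming scheme, the sketch below shows one way to get the same behaviour with Python's standard logging module: the active file keeps a fixed name, and files rotated at UTC midnight get a date suffix (for example search.log.2020-07-15). This is an illustration under stated assumptions and not necessarily how the API server configures its logging; make_daily_logger and the "sketch." logger prefix are hypothetical names.

```python
import logging
import logging.handlers
import os
import tempfile


def make_daily_logger(name: str, log_dir: str) -> logging.Logger:
    """Return a logger writing to <log_dir>/<name>.log, rotated at UTC midnight."""
    os.makedirs(log_dir, exist_ok=True)
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, f"{name}.log"),
        when="midnight",  # roll over once per day
        utc=True,         # day boundaries follow UTC, not local time
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger(f"sketch.{name}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the example's output out of the root logger
    return logger


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        search_log = make_daily_logger("search", tmp)
        search_log.info("query=%s", "example")
        print(sorted(os.listdir(tmp)))
        logging.shutdown()  # close file handles before the directory is removed
```

With this handler, the file without a date suffix is always the one currently being written, which matches the layout described above.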


To run all API tests

TESTING=true pytest api

How do I cite this work?

  title={Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset},
  author={Zhang, Edwin and Gupta, Nikhil and Tang, Raphael and Han, Xiao and Pradeep, Ronak and Lu, Kuang and Zhang, Yue and Nogueira, Rodrigo and Cho, Kyunghyun and Fang, Hui and others},
  journal={arXiv preprint arXiv:2007.07846},