Insight

👁️👄👁️ Natural language document search. Given a topic query, Insight returns the n documents most similar to it.
Both the query and the documents are embedded with a DistilBERT or SciBERT model, and similarity is computed between those embeddings.
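
The ranking step can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: it assumes the query and document embeddings have already been computed, it assumes cosine similarity as the measure (the README does not say which one is used), and the function names `cosine_similarity` and `top_n_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query_vec, doc_vecs, n):
    """Return the indices of the n documents most similar to the query.

    Assumption: query_vec and every entry of doc_vecs are embeddings from
    the same model, so they share the same dimensionality.
    """
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    # Highest similarity first; the sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:n]]
```

With real embeddings the same idea would typically be done as one matrix operation over the whole document tensor rather than a Python loop.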

🏡 Getting Started

To run, first install the requirements in your virtual environment:

pip install -r requirements.txt

Then start the app with streamlit run app.py, type in your query, and hit cmd/ctrl+enter.

Alternatively, you can use the included manifest and Procfile to deploy to your PaaS provider.

You'll need the metadata (metadata.json) and embedding (doctensor.pt) files. Ask me :)
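
A quick way to check that both files are in place before launching the app is sketched below. It assumes the files sit in the directory you launch from, which the README does not state, and the helper name `missing_data_files` is hypothetical.

```python
from pathlib import Path

# File names as given in this README.
REQUIRED_FILES = ("metadata.json", "doctensor.pt")


def missing_data_files(data_dir="."):
    """Return the required data files that are not present in data_dir.

    Assumption: the app looks for the files in a single directory.
    """
    base = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All data files found.")
```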

🔗 Links

DistilBERT model taken from this 🤗 Hugging Face repo.

SciBERT model taken from this 🤗 Hugging Face repo.

✔️ TODO

Improve retrieval performance

Classify into categories

Add active learning classification step. (Important)

Other

Add year filters.
Deploy to prod.
Add minimum word count (~100 = 85% of abstracts).
Add spark lines.
