
Final project for a course on deep learning for NLP (IA376E/1s2020 @ Unicamp)


Search with dense vectors


This project implements a Two Tower model for document retrieval and passage ranking on the MS MARCO dataset. It also uses queries generated with the doc2query algorithm. The implementation uses PyTorch and PyTorch Lightning, deep learning frameworks for Python.
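To illustrate the retrieval step, here is a minimal sketch of searching with dense vectors once the two towers have produced their embeddings. It is not the project's code: the dot-product score is an assumption (Two Tower models commonly use a dot product or cosine similarity), the helper names `dot` and `rank_documents` are hypothetical, and the vectors are made-up values standing in for tower outputs.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length dense vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_documents(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the top_k best."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional embeddings (assumption): in a Two Tower setup the
# document tower would produce doc_vecs and the query tower query_vec.
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.0],
}
query_vec = [1.0, 0.5, 0.0]

results = rank_documents(query_vec, doc_vecs, top_k=2)
print(results)
```

Because the two towers are independent, document vectors can be computed once and stored, so only the query has to be encoded at search time.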

Docs (Portuguese)

The final article and the work plan are in docs/.

Usage

The model can be imported as a Python module or run as a script.

Training

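Example of training with the model imported as a module:

from pytorch_lightning import Trainer

from src.model import TwoTower

# model_args and trainer_args are dicts of keyword arguments that you define:
# model_args holds the TwoTower hyperparameters and trainer_args holds the
# pytorch_lightning Trainer options.
model = TwoTower(**model_args)

trainer = Trainer(**trainer_args)
# fit() receives no dataloaders here, so the model is expected to supply its own.
trainer.fit(model)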

Example of training with the train script:

   python -m src.train --gpus 1 --batch_size 32

There are also Colab notebooks showing usage in notebooks/train.ipynb and notebooks/example.ipynb.

References