Question answering in Russian with XLMRobertaLarge as a service. Thanks to Alexander Kaigorodov for the model.
- Flask
- Gunicorn
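The real application lives in `app/app_main.py`; as a rough illustration of how Flask and Gunicorn fit together here, a minimal sketch of a `/predict` endpoint (the `predict_answer` helper and the request schema are assumptions, not the repo's actual code):

```python
# Sketch of a Flask /predict endpoint. `predict_answer` is a hypothetical
# placeholder for the XLMRobertaLarge inference call.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_answer(context: str, question: str) -> str:
    # Placeholder: the real service runs the model here.
    return ""

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    answer = predict_answer(data["context"], data["question"])
    return jsonify({"answer": answer})

if __name__ == "__main__":
    # In production Gunicorn serves `app`; this is only for local debugging.
    app.run(host="0.0.0.0", port=8080)
```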
Build the image, run the container, and query the service:

```
sudo docker build . --tag qa-roberta-ru-saas
sudo docker run --rm -p 8080:8080 --name qa-roberta-ru-saas qa-roberta-ru-saas
curl -H "Content-Type: application/json" --data @tests/app/data/test_input.json 0.0.0.0:8080/predict
```
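The same request can be made from Python. The payload fields below (`context`, `question`) are an assumption about the input schema; the authoritative example is `tests/app/data/test_input.json`:

```python
import json
from urllib import request

# Hypothetical payload shape -- check tests/app/data/test_input.json
# for the actual schema used by the service.
payload = {
    "context": "Москва — столица России.",
    "question": "Какой город является столицей России?",
}

def ask(url: str, data: dict) -> dict:
    """POST a JSON payload to the /predict endpoint and return the parsed reply."""
    req = request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Requires a running container:
# answer = ask("http://0.0.0.0:8080/predict", payload)
```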
To run on GPU, change `device` to `cuda:0` in the config before `docker build`:

```
device: cuda:0
```
After the build:

```
sudo docker run --rm --gpus 0 -p 8080:8080 --name qa-roberta-ru-saas qa-roberta-ru-saas
```
To run with automatic restart:

```
sudo docker run --gpus 0 -p 8080:8080 --restart always --name qa-roberta-ru-saas qa-roberta-ru-saas
```

To be able to stop it later, first switch the restart policy so a manual `docker stop` is not overridden:

```
docker update --restart unless-stopped qa-roberta-ru-saas
```
Run the tests:

```
pytest tests/
```

Run the app locally without Docker:

```
PYTHONPATH=. python app/app_main.py
```
- GPU/CPU support
- Support for contexts longer than 512 BPE tokens
- Prediction on long contexts with a sliding window
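The repo's actual windowing logic may differ; a minimal sketch of the sliding-window idea, where a long token sequence is split into overlapping chunks of at most the model's 512-token limit:

```python
def sliding_windows(token_ids, max_len=512, stride=128):
    """Split a long token sequence into overlapping windows.

    Each window holds at most `max_len` tokens; consecutive windows start
    `stride` tokens apart, so they overlap by `max_len - stride` tokens and
    an answer near a window boundary is still seen whole in some window.
    """
    if len(token_ids) <= max_len:
        return [token_ids]
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows
```

The model is then run on each window, and the answer span with the highest score across windows is returned.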