DeepPavlov: A Python repository from usmc2033

DeepPavlov is an open-source conversational AI library built on PyTorch.

DeepPavlov is designed for:

- development of production-ready chatbots and complex conversational systems,
- research in NLP and, in particular, dialog systems.

Quick Links

Demo demo.deeppavlov.ai
Documentation docs.deeppavlov.ai
- Model List docs:features/
- Contribution Guide docs:contribution_guide/
Issues github/issues/
Forum forum.deeppavlov.ai
Blogs medium.com/deeppavlov
Extended colab tutorials
Docker Hub hub.docker.com/u/deeppavlov/
- Docker Images Documentation docs:docker-images/

Please leave us your feedback on how we can improve the DeepPavlov framework.

Models

Named Entity Recognition | Intent/Sentence Classification

Question Answering over Text (SQuAD) | Knowledge Base Question Answering

Syntactic Parsing | Morphological Tagging

Automatic Spelling Correction | Entity Extraction

Open Domain Question Answering | Russian SuperGLUE

Relation Extraction

Embeddings

BERT embeddings for Russian, Polish, Bulgarian, Czech, and informal English

ELMo embeddings for the Russian language

FastText embeddings for the Russian language

Auto ML

Tuning Models

Integrations

REST API | Socket API

Amazon AWS

Installation

DeepPavlov supports Linux, Windows 10+ (through WSL/WSL2), and macOS (Big Sur+), with Python 3.6, 3.7, 3.8, 3.9, and 3.10. Depending on the model used, you may need 4 to 16 GB of RAM.
Create and activate a virtual environment:
- Linux
```
python -m venv env
source ./env/bin/activate
```
Install the package inside the environment:
```
pip install deeppavlov
```
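Before installing, it can help to confirm that the interpreter in the virtual environment falls within the supported range. The following is a minimal sketch: `is_supported_python` is a hypothetical helper, not part of DeepPavlov, and the version bounds are copied from the platform notes above.

```python
import sys

# Supported range taken from the platform notes above (Python 3.6 to 3.10).
MIN_SUPPORTED = (3, 6)
MAX_SUPPORTED = (3, 10)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_SUPPORTED <= tuple(version[:2]) <= MAX_SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported_python() else "not listed as supported"
    print(f"Python {major}.{minor} is {status}")
```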

QuickStart

DeepPavlov ships with many pre-trained NLP models. Each model is defined by its config file.

The list of models is available on the doc page and in deeppavlov.configs (Python):

from deeppavlov import configs
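Since each config is a file, the available model names can also be enumerated from a configs directory on disk. The sketch below assumes the layout suggested by the example path `deeppavlov/configs/ner/ner_ontonotes_bert_mult.json` (one subdirectory per task, one `.json` file per model). `list_config_names` is a hypothetical helper, not a DeepPavlov function.

```python
from pathlib import Path


def list_config_names(configs_root):
    """Map each task subdirectory to the sorted config names it contains."""
    root = Path(configs_root)
    names = {}
    # Assumed layout: <configs_root>/<task>/<model_name>.json
    for config_file in sorted(root.glob("*/*.json")):
        task = config_file.parent.name
        names.setdefault(task, []).append(config_file.stem)
    return names
```

In a checkout of the repository, `list_config_names("deeppavlov/configs")` would return a dictionary keyed by task directory, provided the layout matches the assumption above.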

Once you have decided on a model (and its config file), there are two ways to train, evaluate, and run inference with it:

- via the command line interface (CLI), and
- via Python.

GPU requirements

By default, DeepPavlov installs model requirements from PyPI. The PyTorch build from PyPI may not support your device's CUDA capability. To run supported DeepPavlov models on a GPU, you need a CUDA version that is compatible with both your GPU and the PyTorch version required by the DeepPavlov models. See docs for details. A GPU with Pascal or newer architecture and 4+ GB of VRAM is recommended.
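The recommendation above can be checked programmatically. This is a rough sketch rather than an official check: `meets_recommendation` and `describe_gpu` are hypothetical helpers, the Pascal threshold is expressed as CUDA compute capability 6.x, and the VRAM threshold uses decimal gigabytes because drivers often report slightly less than the nominal size.

```python
PASCAL_MAJOR = 6  # Pascal GPUs report CUDA compute capability 6.x
MIN_VRAM_BYTES = 4 * 1000 ** 3  # lenient 4 GB threshold (decimal bytes)


def meets_recommendation(capability, vram_bytes):
    """Check a (major, minor) compute capability and a VRAM size in bytes."""
    major, _minor = capability
    return major >= PASCAL_MAJOR and vram_bytes >= MIN_VRAM_BYTES


def describe_gpu(device_index=0):
    """Report whether the given CUDA device meets the recommendation."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA device is visible to this PyTorch build"
    capability = torch.cuda.get_device_capability(device_index)
    vram = torch.cuda.get_device_properties(device_index).total_memory
    verdict = "meets" if meets_recommendation(capability, vram) else "is below"
    return f"GPU {device_index} {verdict} the recommended specification"
```

Note that this only inspects the hardware visible to the installed PyTorch build. It does not verify that the installed CUDA toolkit matches the PyTorch version a given model requires.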

Command line interface (CLI)

To get predictions from a model interactively through CLI, run

python -m deeppavlov interact <config_path> [-d] [-i]

-d downloads required data - pretrained model files and embeddings (optional).
-i installs model requirements (optional).

You can train it in the same simple way:

python -m deeppavlov train <config_path> [-d] [-i]

The dataset will be downloaded regardless of whether the -d flag is set.

To train on your own data, modify the dataset reader path in the train config (see doc). The data format is specified in the corresponding model doc page.

There are even more actions you can perform with configs:

python -m deeppavlov <action> <config_path> [-d] [-i]

<action> can be
- install to install model requirements (same as -i),
- download to download model's data (same as -d),
- train to train the model on the data specified in the config file,
- evaluate to calculate metrics on the same dataset,
- interact to interact via CLI,
- riseapi to run a REST API server (see doc),
- predict to get prediction for samples from stdin or from <file_path> if -f <file_path> is specified.
<config_path> specifies path (or name) of model's config file
-d downloads required data
-i installs model requirements
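The actions and flags listed above map directly onto an argument list, which is convenient when the CLI is driven from a script. The following sketch assumes only the actions and flags named in this section. `build_command` and `run` are hypothetical helpers, not part of DeepPavlov.

```python
import subprocess
import sys

# Actions as listed in this section.
ACTIONS = {"install", "download", "train", "evaluate", "interact", "riseapi", "predict"}


def build_command(action, config_path, download=False, install=False, input_file=None):
    """Build the argument list for `python -m deeppavlov <action> <config_path>`."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if input_file is not None and action != "predict":
        raise ValueError("-f is only described for the predict action")
    command = [sys.executable, "-m", "deeppavlov", action, str(config_path)]
    if download:
        command.append("-d")
    if install:
        command.append("-i")
    if input_file is not None:
        command.extend(["-f", str(input_file)])
    return command


def run(action, config_path, **options):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(action, config_path, **options), check=True)
```

For example, `run("train", "ner_ontonotes_bert_mult", download=True)` would invoke the same command as the training line shown earlier, using the current interpreter so that the active virtual environment is respected.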

Python

To get predictions from a model interactively through Python, run

from deeppavlov import build_model

model = build_model(<config_path>, install=True, download=True)

# get predictions for 'input_text1', 'input_text2'
model(['input_text1', 'input_text2'])

where

install=True installs model requirements (optional),
download=True downloads required data from web - pretrained model files and embeddings (optional),
<config_path> is a model name (e.g. 'ner_ontonotes_bert_mult'), a path to the chosen model's config file (e.g. "deeppavlov/configs/ner/ner_ontonotes_bert_mult.json"), or a deeppavlov.configs attribute (e.g. deeppavlov.configs.ner.ner_ontonotes_bert_mult without quotation marks).
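Because the built model is called with a list of strings, long inputs can be fed to it in fixed-size batches. The sketch below is one way to do that under the assumption that the model is any callable accepting a list of texts. `load_model` and `predict_in_batches` are hypothetical helpers, and the output is returned unmodified since its structure depends on the chosen config.

```python
def load_model(config, install=False, download=False):
    """Build a model; needs DeepPavlov installed, so the import is deferred."""
    from deeppavlov import build_model
    return build_model(config, install=install, download=download)


def predict_in_batches(model, texts, batch_size=32):
    """Yield (batch, output) pairs, calling the model on one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        # The model is assumed to accept a list of strings, as in the call above.
        yield batch, model(batch)
```

With DeepPavlov installed, `model = load_model('ner_ontonotes_bert_mult', download=True)` followed by `for batch, output in predict_in_batches(model, texts): ...` would process `texts` 32 items at a time.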

You can train it in the same simple way:

from deeppavlov import train_model 

model = train_model(<config_path>, install=True, download=True)

To train on your own data, modify the dataset reader path in the train config (see doc). The data format is specified in the corresponding model doc page.
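Editing the dataset reader path can be scripted so that the original config stays untouched. The sketch below assumes the config is a JSON file whose `dataset_reader` section holds a `data_path` key. Both key names are assumptions to verify against the config of the model you use, and `with_data_path` and `write_custom_config` are hypothetical helpers.

```python
import copy
import json
from pathlib import Path


def with_data_path(config, data_path):
    """Return a copy of a config dict with dataset_reader.data_path replaced."""
    # Assumed key names: "dataset_reader" and "data_path".
    if "dataset_reader" not in config:
        raise KeyError("config has no 'dataset_reader' section")
    updated = copy.deepcopy(config)
    updated["dataset_reader"]["data_path"] = str(data_path)
    return updated


def write_custom_config(source_path, target_path, data_path):
    """Read a JSON config, point it at data_path, and save it as a new file."""
    config = json.loads(Path(source_path).read_text(encoding="utf-8"))
    updated = with_data_path(config, data_path)
    Path(target_path).write_text(json.dumps(updated, indent=2), encoding="utf-8")
    return Path(target_path)
```

The path returned by `write_custom_config` can then be used as <config_path> with either the CLI or the Python interface.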

You can also calculate metrics on the dataset specified in your config file:

from deeppavlov import evaluate_model 

metrics = evaluate_model(<config_path>, install=True, download=True)

DeepPavlov also lets you build a model from components for inference using Python.

License

DeepPavlov is licensed under Apache 2.0.