ugramm.8052.ru | Swagger API Docs
```shell
cd service && docker-compose up
```
On Ubuntu 20.04:

```shell
sudo apt install -y python3 python3-pip libmecab-dev
pip install flake8 pytest
pip install -r service/backend/requirements.txt
pytest
```

Build and run the images:

```shell
PREFIX=firefish  # update docker-compose.yml if changed
cd service
cd third_party && docker build -t "$PREFIX/languagetool:5.7.0" . && cd ..
cd backend && docker build -t "$PREFIX/ugramm:0.2" . && cd ..
docker-compose up
```
See the tests in `service/backend/test`.
The task is to suggest where to place commas in a given text.
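One common framing of this task (an assumption here, not necessarily how this repo implements it) is binary token classification: for each word, predict whether a comma should follow it. A minimal sketch of turning punctuated text into word/label training pairs:

```python
def to_token_labels(text):
    """Convert punctuated text into (tokens, labels), where labels[i] is 1
    if a comma should follow tokens[i]. Commas are stripped from the tokens.
    Toy example: other punctuation is left untouched."""
    tokens, labels = [], []
    for raw in text.split():
        has_comma = raw.endswith(",")
        tokens.append(raw.rstrip(",") if has_comma else raw)
        labels.append(1 if has_comma else 0)
    return tokens, labels

tokens, labels = to_token_labels("To be, or not to be, that is the question")
# tokens: ['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
# labels: [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

A model then receives the comma-free token sequence and recovers the label sequence.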
- Grazie
- Grammarly
- LanguageTool | Standalone version
- NVIDIA NeMo | Standalone version
- FullStop | Standalone version
When we talk about text data, there are many different domains:
- prose
- encyclopedia articles
- subtitles
- emails
- technical documentation
- instant messages
- ... etc.
All these domains have different properties, and it is hard to develop a system that performs equally well on all of them. Let's select three datasets, each biased towards a different domain:
- Tatoeba - English texts from books/songs/etc. translated to Japanese
- Project Gutenberg in LibriSpeech - books
- Technical documentation for JetBrains and Microsoft products
Punctuation quality can be measured in many ways:
- Accuracy
- Recall/Precision/F1
- ROC-AUC
And at different scales:
- sub-word/token level
- word level
- sentence level
Let's use the word-level F1-score for all comparisons.
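At word level, each word position carries a binary prediction (comma after it or not), and F1 is computed over the positive (comma) class. A minimal sketch of the metric (not the repo's evaluation code):

```python
def comma_f1(y_true, y_pred):
    """Word-level F1 over the positive (comma) class.
    y_true / y_pred: equal-length sequences of 0/1, one label per word."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# One of two true commas found, plus one spurious comma:
# tp=1, fp=1, fn=1 -> precision=0.5, recall=0.5, F1=0.5
score = comma_f1([0, 1, 0, 1, 0], [0, 1, 1, 0, 0])
```

Note that plain accuracy would be misleading here: most words are not followed by a comma, so a model that never predicts commas still scores high on accuracy, while its comma F1 is zero.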
See the model training code in `quality/models`.
Graphs at Weights & Biases
| Model \ Test set | Tatoeba | OSS Docs | Gutenberg | Total |
|---|---|---|---|---|
| NeMo BERT | 0.805 | 0.701 | 0.636 | 0.655 |
| FullStop | 0.757 | 0.723 | 0.623 | 0.648 |
| Ours DistilRoBERTa | 0.846 | 0.824 | 0.826 | 0.827 |
| Ours RoBERTa | 0.868 | 0.851 | 0.864 | 0.862 |

Table 1. Word-level F1-scores
More details in the notebooks under `quality/evaluation`.