
sticker2

Warning: sticker2 is superseded by SyntaxDot: https://github.com/tensordot/syntaxdot

Introduction

sticker2 is a sequence labeler using Transformer networks. sticker2 models can be trained from scratch or fine-tuned from pretrained models such as BERT or XLM-RoBERTa.

In principle, sticker2 can be used to perform any sequence labeling task, but so far the focus has been on:

  • Part-of-speech tagging
  • Morphological tagging
  • Topological field tagging
  • Lemmatization
  • Dependency parsing
  • Named entity recognition

The easiest way to get started with sticker2 is to use a pretrained model.

Features

  • Input representations:
    • Word pieces
    • Sentence pieces
  • Flexible sequence encoder/decoder architecture, which supports:
    • Simple sequence labels (e.g. POS, morphology, named entities)
    • Lemmatization, based on edit trees (see the edit-script sketch after this list)
    • Dependency parsing
    • Simple API to extend to other tasks (see the decoder sketch after this list)
  • Model representations:
    • Transformers
    • Pretraining from BERT and XLM-RoBERTa models
  • Multi-task training and classification using scalar weighting (see the loss-weighting sketch after this list)
  • Model distillation
  • Deployment:
    • Standalone binary that links against PyTorch's libtorch
    • Very liberal license
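
Edit trees, mentioned under lemmatization above, encode the transformation from an inflected form to its lemma, so lemmatization reduces to predicting one transformation label per token. The Rust sketch below substitutes a flat prefix/suffix edit script for a full recursive edit tree; every name in it is hypothetical, and it is not sticker2's actual implementation:

```rust
/// A flat stand-in for an edit tree: rewrite a form's prefix and suffix.
/// (Hypothetical type for illustration only.)
#[derive(Clone, Debug)]
struct EditScript {
    strip_prefix: usize,
    add_prefix: String,
    strip_suffix: usize,
    add_suffix: String,
}

impl EditScript {
    /// Apply the script to a form, yielding the lemma. Returns `None`
    /// when the form is too short for the script.
    fn apply(&self, form: &str) -> Option<String> {
        let chars: Vec<char> = form.chars().collect();
        if chars.len() < self.strip_prefix + self.strip_suffix {
            return None;
        }
        let stem: String = chars[self.strip_prefix..chars.len() - self.strip_suffix]
            .iter()
            .collect();
        Some(format!("{}{}{}", self.add_prefix, stem, self.add_suffix))
    }
}

fn main() {
    // "walked" -> "walk": strip the suffix "ed", add nothing.
    let script = EditScript {
        strip_prefix: 0,
        add_prefix: String::new(),
        strip_suffix: 2,
        add_suffix: String::new(),
    };
    assert_eq!(script.apply("walked").as_deref(), Some("walk"));
}
```

Because the script is a plain value, every distinct transformation seen in training data can serve as a class label, which is what makes lemmatization fit the sequence labeling mold.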
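The "simple API" claim is easiest to see with a sketch. The trait and types below are hypothetical (sticker2's real API differs); they only show the shape such an extension point could take: a decoder maps per-token label distributions to task-specific output.

```rust
/// A token with its predicted label distribution.
/// (Hypothetical type for illustration only.)
pub struct TokenLabels {
    pub token: String,
    /// (label, probability) pairs, most probable first.
    pub labels: Vec<(String, f32)>,
}

/// Decoders turn per-token label distributions into task-specific output.
pub trait SentenceDecoder {
    type Output;

    /// Decode one sentence's label distributions into the task's output.
    fn decode(&self, sentence: &[TokenLabels]) -> Self::Output;
}

/// A trivial decoder for plain sequence labels (e.g. POS tags): pick the
/// most probable label for every token.
pub struct BestLabelDecoder;

impl SentenceDecoder for BestLabelDecoder {
    type Output = Vec<String>;

    fn decode(&self, sentence: &[TokenLabels]) -> Self::Output {
        sentence
            .iter()
            .map(|tok| {
                tok.labels
                    .first()
                    .map(|(label, _)| label.clone())
                    .unwrap_or_default()
            })
            .collect()
    }
}

fn main() {
    let sentence = vec![TokenLabels {
        token: "walks".to_string(),
        labels: vec![("VERB".to_string(), 0.93), ("NOUN".to_string(), 0.07)],
    }];
    assert_eq!(BestLabelDecoder.decode(&sentence), vec!["VERB".to_string()]);
}
```

Supporting a new task then amounts to implementing the trait with a task-specific `Output` type, such as a dependency tree for parsing.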
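One common reading of "scalar weighting" in multi-task training is a weighted sum of per-task losses. The loss-weighting sketch below uses illustrative names and weights; it is an assumption about the technique, not sticker2's actual training code:

```rust
/// Combine per-task losses into one training objective.
/// Each entry pairs a task's scalar weight with its loss.
fn combined_loss(task_losses: &[(f64, f64)]) -> f64 {
    task_losses.iter().map(|(weight, loss)| weight * loss).sum()
}

fn main() {
    // E.g. weight POS tagging and dependency parsing losses equally.
    let loss = combined_loss(&[(1.0, 0.32), (1.0, 0.57)]);
    assert!((loss - 0.89).abs() < 1e-9);
}
```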

Status

sticker2 is still under heavy development. However, models remain usable and the API is stable within a minor version: for a fixed y, all 0.y.z releases are compatible (e.g. a model trained with 0.4.0 still works with 0.4.3, but not necessarily with 0.5.0).

Issues

You can report bugs and feature requests in the sticker2 issue tracker.

License

sticker2 is licensed under the Blue Oak Model License version 1.0.0. The list of contributors is also available.

Credits

  • sticker2 is developed by Daniël de Kok & Tobias Pütz.
  • The Python precursor to sticker was developed by Erik Schill.
  • Sebastian Pütz and Patricia Fischer reviewed a lot of code across the sticker projects.