subword-segmentation
There are 11 repositories under the subword-segmentation topic.
aalto-speech/morfessor
Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
aalto-speech/flatcat
Morfessor FlatCat
Waino/morfessor-emprune
Morfessor EM+Prune
majeek/vml-hd
Parsing and subword segmentation code for the VML-HD Dataset
BassaniRiccardo/ICEBERT
ICEBERT: Interlingual-Clusters Enhanced BERT. A BERT-like model trained on clusters of monolingual subwords.
Waino/morfessor-cognates
Cognate-aware morphological segmentation
aalto-speech/morfessor-emprune
Morfessor EM+Prune
dlsucelt/Cellar
Central repository with pretrained models for transfer learning, BPE subword-tokenization, mono/multilingual embeddings, and everything in between.
TiMauzi/dawg
The concept of DAWGs is based on: Blumer, A. et al. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40, 31–55.
aalto-speech/morfessor-demo
Morfessor demonstration
JoyeBright/FT-IWSLT2014-BPEVocab
Repository for the experiments in my paper: "A Systematic Analysis of Vocabulary and BPE Settings for Optimal Fine-tuning of NMT: A Case Study of In-domain Translation"