subword
There are 15 repositories under the subword topic.
scarletcho/KoLM
Korean text normalization and language preparation package for LM in Kaldi-based ASR system
zouharvi/tokenization-scorer
Simple-to-use scoring function for arbitrarily tokenized texts.
lallubharteja/KWS-Scripts
Keyword Search Recipe for Subword ASR
cooelf/subMrc
Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)
andreasgrv/johnny
johnny - a neural network graph based DEPendency Parser
cooelf/subword_seg
Effective Subword Segmentation for Text Comprehension (TASLP 2019)
wang-h/FMDL
Unsupervised Word Segmentation using Minimum Description Length for Neural Machine Translation (NMT)
burcgokden/BERT-Subword-Tokenizer-Wrapper
A framework for generating a subword vocabulary from a TensorFlow dataset and building custom BERT tokenizer models.
explanare/char-iit
A causal intervention framework to learn robust and interpretable character representations inside subword-based language models
kkaryl/AI6127-Deep_NLP
This repository contains source code implementation of assignments for NTU's MSAI course AI6127 on Deep Neural Networks for Natural Language Processing (2019 Sem 2).
scarletcho/subword-mikolov
An implementation of the subword division algorithm proposed by T. Mikolov (2012).
Ishan-Kotian/Tokenizer_NLP
Tokenization is a way of separating a piece of text into smaller units called tokens. Here, tokens can be either words, characters, or subwords. Hence, tokenization can be broadly classified into 3 types – word, character, and subword (n-gram characters) tokenization.
TiMauzi/dawg
The concept of DAWGs is based on: Blumer, A. et al. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40, 31–55.
Scitator/subword-nmt
Subword Neural Machine Translation
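The Ishan-Kotian/Tokenizer_NLP description above distinguishes word, character, and subword (character n-gram) tokenization. A minimal Python sketch of that distinction follows. The function names and the choice of n = 3 are illustrative assumptions and are not taken from any repository listed here. Learned subword methods such as BPE work differently from plain n-gram slicing.

```python
def word_tokens(text):
    """Word tokenization: split on whitespace."""
    return text.split()


def char_tokens(text):
    """Character tokenization: every character is a token."""
    return list(text)


def char_ngrams(word, n=3):
    """Subword tokenization as overlapping character n-grams.

    Hypothetical helper: words no longer than n are returned whole.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    print(word_tokens("subword units"))
    print(char_tokens("unit"))
    print(char_ngrams("subword"))
```

Character n-grams overlap, so a single word yields several subword tokens, whereas whitespace splitting yields exactly one token per word.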