subword

There are 15 repositories under the subword topic.

  • scarletcho/KoLM

    Korean text normalization and language preparation package for LM in a Kaldi-based ASR system

    Language: Python
  • zouharvi/tokenization-scorer

    Simple-to-use scoring function for arbitrarily tokenized texts.

    Language: Python
  • lallubharteja/KWS-Scripts

    Keyword Search Recipe for Subword ASR

    Language: Shell
  • cooelf/subMrc

    Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)

    Language: Python
  • andreasgrv/johnny

    johnny - a graph-based neural network DEPendency Parser

    Language: Python
  • cooelf/subword_seg

    Effective Subword Segmentation for Text Comprehension (TASLP 2019)

    Language: C++
  • wang-h/FMDL

    Unsupervised Word Segmentation using Minimum Description Length for Neural Machine Translation (NMT)

    Language: C++
  • burcgokden/BERT-Subword-Tokenizer-Wrapper

    A framework for generating a subword vocabulary from a TensorFlow dataset and building custom BERT tokenizer models.

    Language: Python
  • explanare/char-iit

    A causal intervention framework to learn robust and interpretable character representations inside subword-based language models

    Language: Jupyter Notebook
  • jluo41/NLPText

    Language: Jupyter Notebook
  • kkaryl/AI6127-Deep_NLP

    This repository contains source code implementations of the assignments for NTU's MSAI course AI6127, Deep Neural Networks for Natural Language Processing (2019 Sem 2).

    Language: Jupyter Notebook
  • scarletcho/subword-mikolov

    An implementation of the subword division algorithm proposed in T. Mikolov (2012).

    Language: HTML
  • Ishan-Kotian/Tokenizer_NLP

    Tokenization is a way of separating a piece of text into smaller units called tokens, which can be words, characters, or subwords. Accordingly, tokenization can be broadly classified into three types: word, character, and subword (character n-gram) tokenization.

    Language: Jupyter Notebook
  • TiMauzi/dawg

    The concept of DAWGs (directed acyclic word graphs) is based on: Blumer, A. et al. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40, 31–55.

    Language: Java
  • Scitator/subword-nmt

    Subword Neural Machine Translation

    Language: Python
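The zouharvi/tokenization-scorer description does not name the metric it computes. One way to score an arbitrarily tokenized text, sketched here purely as an assumption, is Rényi efficiency: the Rényi entropy of the unigram token distribution divided by the log of the vocabulary size, that is H_alpha(p) / log|V| with H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha). The function name `renyi_efficiency` and the `alpha=2.5` value are hypothetical illustrative choices, not that package's API.

```python
import math
from collections import Counter

def renyi_efficiency(tokenized_lines, alpha=2.5):
    """Hypothetical helper: normalized Renyi entropy of whitespace-separated tokens."""
    counts = Counter(tok for line in tokenized_lines for tok in line.split())
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if len(probs) < 2:
        # With a single token type, log|V| = 0 and the ratio is undefined; return 0.0 by convention.
        return 0.0
    if alpha == 1:
        # The alpha -> 1 limit of Renyi entropy is Shannon entropy.
        entropy = -sum(p * math.log(p) for p in probs)
    else:
        entropy = math.log(sum(p ** alpha for p in probs)) / (1 - alpha)
    # Renyi entropy never exceeds log|V|, so the score lies in [0, 1].
    return entropy / math.log(len(probs))

print(renyi_efficiency(["a b", "c d"]))        # uniform tokens: approximately 1.0
print(renyi_efficiency(["a a a a a a b c"]))   # skewed tokens: below 1.0
```

A uniform token distribution scores 1.0, and a tokenization dominated by a few frequent tokens scores lower, which is the intuition such a score is meant to capture.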
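The three tokenization types named in the Ishan-Kotian/Tokenizer_NLP description can be illustrated with a short Python sketch. It is not that repository's code; the regular expression used for word splitting and the `<`/`>` boundary markers on character n-grams (a fastText-style convention) are illustrative assumptions, and the function names are hypothetical.

```python
import re

def word_tokenize(text: str) -> list[str]:
    # Word tokens: runs of word characters, with punctuation split off; whitespace is dropped.
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokenize(text: str) -> list[str]:
    # Character tokens: every non-whitespace character is its own token.
    return [c for c in text if not c.isspace()]

def char_ngram_tokenize(word: str, n: int = 3) -> list[str]:
    # Subword tokens as overlapping character n-grams; "<" and ">" mark word
    # boundaries (an assumed convention borrowed from fastText-style models).
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(word_tokenize("Subwords help."))   # ['Subwords', 'help', '.']
print(char_tokenize("help"))             # ['h', 'e', 'l', 'p']
print(char_ngram_tokenize("help"))       # ['<he', 'hel', 'elp', 'lp>']
```

Character n-grams sit between the other two types: they keep the vocabulary small like character tokens while preserving some of the internal structure of words.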
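The structure cited in the TiMauzi/dawg description can be approximated with the closely related suffix automaton, built online in linear time for a fixed alphabet: every path from its initial state spells a subword of the text, and every subword has such a path. Blumer et al.'s smallest subword automaton can have slightly fewer states, so the Python sketch below is a related construction under that caveat, not the Java repository's implementation; `State`, `build_dawg`, `is_subword`, and `count_distinct_subwords` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    length: int = 0       # length of the longest subword that ends in this state
    link: int = -1        # suffix link (-1 only for the initial state)
    transitions: dict = field(default_factory=dict)

def build_dawg(text):
    """Online suffix-automaton construction; state 0 is the initial state."""
    states = [State()]
    last = 0
    for ch in text:
        cur = len(states)
        states.append(State(length=states[last].length + 1))
        p = last
        # Add a transition on ch from every suffix state that lacks one.
        while p != -1 and ch not in states[p].transitions:
            states[p].transitions[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].transitions[ch]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q by cloning it so that each state keeps one consistent length.
                clone = len(states)
                states.append(State(length=states[p].length + 1,
                                    link=states[q].link,
                                    transitions=dict(states[q].transitions)))
                while p != -1 and states[p].transitions.get(ch) == q:
                    states[p].transitions[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states

def is_subword(states, pattern):
    """Follow transitions from the initial state; a complete walk means a subword."""
    s = 0
    for ch in pattern:
        if ch not in states[s].transitions:
            return False
        s = states[s].transitions[ch]
    return True

def count_distinct_subwords(states):
    """Each non-initial state adds length - length(suffix link) distinct non-empty subwords."""
    return sum(st.length - states[st.link].length for st in states[1:])

dawg = build_dawg("abcbc")
print(is_subword(dawg, "cbc"), is_subword(dawg, "ca"))  # True False
print(count_distinct_subwords(dawg))                     # 12
```

Cloning states is what keeps the automaton compact: instead of one state per subword, states group subwords that end at the same set of positions.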
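Subword neural machine translation is commonly associated with byte-pair encoding (BPE; Sennrich, Haddow and Birch, 2016), which learns subword units by repeatedly merging the most frequent adjacent pair of symbols. The listing does not say what Scitator/subword-nmt implements, so the following is only a generic sketch of BPE merge learning on a hypothetical toy vocabulary. The `</w>` end-of-word marker and the helper names are assumptions, and ties between equally frequent pairs are broken by first occurrence.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every standalone occurrence of 'a b' with the merged symbol 'ab'."""
    merged = "".join(pair)
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    """Greedily learn up to num_merges merge operations."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # max() returns the first-seen pair on ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Hypothetical toy corpus: space-separated symbols, "</w>" marks the end of a word.
toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, merged_vocab = learn_bpe(toy_vocab, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merges are applied in the same order to segment new words, so a rare word falls back to smaller known units instead of becoming an unknown token.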