wordpiece
There are 9 repositories under the wordpiece topic.
georg-jung/FastBertTokenizer
Fast and memory-efficient library for WordPiece tokenization as it is used by BERT.
stephantul/piecelearn
Learning BPE embeddings by first learning a segmentation model and then training word2vec
danieldk/wordpieces
Split tokens into word pieces
SeonbeomKim/Python-Byte_Pair_Encoding
Byte Pair Encoding (BPE)
Lizhecheng02/Kaggle-LLM-Detect_AI_Generated_Text
Detect whether text is AI-generated, either by training a new tokenizer and combining it with tree classification models, or by training language models on a large dataset of human and AI-generated texts.
SeanLee97/BertWordPieceTokenizer.jl
WordPiece Tokenizer for BERT models.
Hank-Kuo/go-bert-tokenizer
go-bert-tokenizer
burcgokden/BERT-Subword-Tokenizer-Wrapper
A framework for generating a subword vocabulary from a TensorFlow dataset and building custom BERT tokenizer models.