tokenisation
There are 13 repositories under the tokenisation topic.
alasdairforsythe/tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
SmartTokenLabs/TokenScript
TokenScript schema, specs and paper
flammie/omorfi
Open morphology for Finnish
checkout/frames-ios
Frames iOS: making native card payments simple
checkout/frames-android
Frames Android: making native card payments simple
casics/spiral
A Python 3 module that provides functions for splitting identifiers found in source code files.
andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
darshank15/wikipedia-search-engine
A complete search engine built on an inverted index over the 2018 Wikipedia corpus (72 GB). It returns the top search results for the given query words.
andreihar/taibun.js
Taiwanese Hokkien Transliterator and Tokeniser
kbnim/Letteriser
A tiny utility that takes a string and decomposes it into the letters of the Hungarian alphabet.
DataRish/MBTI-Personality-Predictor
This project predicts MBTI personality types from users' 50 most recent posts using NLP and ML techniques.
Freud16/f_backyard__Search_Engine
A search engine that returns customised recipes according to three sorting algorithms. Speed is improved by pre-processing and an inverted index.
mudittt/text-summarizer
An end-to-end text summarizer application that uses Meta's BART model, fine-tuned on the Samsung dataset.