sentence-tokenizer
There are 40 repositories under the sentence-tokenizer topic.
nipunsadvilkar/pySBD
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection library that works out-of-the-box.
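The rule-based approach pySBD takes can be illustrated with a toy splitter: break on sentence-final punctuation, but suppress breaks after known abbreviations. The abbreviation list and regex below are illustrative only, not pySBD's actual rules or API.

```python
import re

# A tiny illustrative abbreviation list (not pySBD's actual list).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs"}

def split_sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation, skipping known abbreviations."""
    sentences, start = [], 0
    # Candidate boundaries: punctuation followed by whitespace + capital/quote, or end of text.
    for match in re.finditer(r"[.!?]+(?=\s+[A-Z\"(]|\s*$)", text):
        before = text[start:match.start()]
        preceding = before.rsplit(None, 1)[-1].lower() if before.strip() else ""
        if preceding.rstrip(".") in ABBREVIATIONS:
            continue  # e.g. the period in "Dr." does not end a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith went home. He slept."))
```

A real rule-based segmenter layers many more rules (ellipses, decimals, quotations, per-language abbreviation lists) on top of this basic shape.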
neurosnap/sentences
A multilingual command-line sentence tokenizer in Golang
vngrs-ai/vnlp
State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.
megagonlabs/bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
lfcipriani/punkt-segmenter
Ruby port of the NLTK Punkt sentence segmentation algorithm
cbilgili/zemberek-nlp-server
A REST Docker server built on the Zemberek Turkish NLP Java library
wwwcojp/ja_sentence_segmenter
A Japanese sentence segmentation library for Python
brandonrobertz/sentence-autosegmentation
Deep-learning-based sentence auto-segmentation for unstructured text without punctuation
Flight-School/sentences
A command-line utility that splits natural language text into sentences.
ikegami-yukino/sengiri
Yet another sentence-level tokenizer for Japanese text
apdullahyayik/TrTokenizer
🧩 A simple sentence tokenizer.
lord-alfred/dnlp
📚 A collection of useful Natural Language Processing tools: text language detection, splitting text into sentences, and extracting the main content from an HTML document
gosbd/gosbd
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
Foysal87/bn_nlp
Bangla NLP toolkit.
bhattbhavesh91/sentence-transformers-example
HuggingFace's Transformer models for sentence / text embedding generation.
KMiNT21/html2sent
HTML2SENT modifies HTML to improve sentence tokenizer quality
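The idea behind this kind of preprocessing can be sketched with the stdlib `html.parser`: treat block-level tags as hard boundaries so a downstream sentence tokenizer does not glue unrelated fragments together. The class name and tag set below are my own illustration, not html2sent's API.

```python
from html.parser import HTMLParser

# Block-level tags treated as hard text boundaries (an illustrative subset).
BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "tr", "td"}

class HTMLToText(HTMLParser):
    """Flatten HTML to text blocks, inserting breaks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def blocks(self) -> list[str]:
        """Non-empty text blocks, each safe to feed to a sentence tokenizer."""
        return [line.strip() for line in "".join(self.parts).split("\n") if line.strip()]

parser = HTMLToText()
parser.feed("<div>Hello<br>world.</div><p>Bye.</p>")
print(parser.blocks())
```

Without such a step, naive tag stripping would merge "world." and "Bye." into one run of text and a tokenizer could mis-segment across the element boundary.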
StarlangSoftware/Corpus
Corpus processing library
mkartawijaya/hasami
A tool to perform sentence segmentation on Japanese text
StarlangSoftware/Corpus-CPP
Corpus processing library
StarlangSoftware/Corpus-Py
Corpus processing library
sichkar-valentyn/Machine_Learning_in_Python
Practical machine learning experiments in Python: processing sentences and finding relevant ones, approximating functions with polynomials, and function optimization
deepakrana47/Sentence_tokenizer
A neural-network-based sentence tokenizer
Musaddiq625/Python-Projects
Some of my Python Projects
rmjacobson/privacy-crawler-parser-tokenizer
Crawler, Parser, Sentence Tokenizer for online privacy policies. Intended to support ML efforts on policy language and verification.
StarlangSoftware/Corpus-Js
Corpus Processing Library
victoryosiobe/kingchop
Kingchop ⚔️ is an English-oriented JavaScript library for tokenizing text (chopping text). It uses an extensive rule set for tokenizing, and you can adjust the rules easily.
Aburraq/StanfordCoreNLP
A text-processing pipeline built with Stanford CoreNLP. My legal background gave me a deep appreciation for language: in every case, meaning hinges on more than words, and that connection led me to coding.
coderganesh/tamil-sentence-tokenizer
A sentence tokenizer NLP tool for the Tamil language
faisaltareque/Multilingual-Sentence-Tokenizer
This Python package tokenizes sentences in over 40 languages, serving as a wrapper around various open-source libraries. It was created to support our work on XL-HeadTags. To use it, simply provide the text and its corresponding language, and the tokenizer returns the segmented sentences.
nature-of-eu-rules/data-preprocessing
Document preprocessing scripts for the Nature of EU Rules project
elifftosunn/textDataClean
An application built to make scraped dirty data ready for model training without requiring separate preprocessing steps.
quocthang0507/VietnameseNaturalLanguageProcessing
Vietnamese Natural Language Processing
StarlangSoftware/Corpus-Cy
Corpus Processing Library
StarlangSoftware/Corpus-Swift
Corpus processing library
zainmujahid/Longest-Common-Subsequence
This repository contains a Python script for calculating the Longest Common Subsequence (LCS) between tokenized Urdu sentences.
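The computation behind such a script is the classic LCS dynamic program applied to token lists rather than characters. A minimal sketch (English tokens are used here purely for illustration; the repository works on Urdu):

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists,
    via the standard O(len(a) * len(b)) dynamic program."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one token
    return dp[-1][-1]

s1 = "the cat sat on the mat".split()
s2 = "the dog sat on a mat".split()
print(lcs_length(s1, s2))  # → 4 ("the", "sat", "on", "mat")
```

Running LCS over sentence tokens (rather than characters) makes the score reflect shared word order, which is why sentence tokenization is a prerequisite for this kind of comparison.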