sentence-tokenizer
There are 40 repositories under the sentence-tokenizer topic.
nipunsadvilkar/pySBD
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection library that works out-of-the-box.
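The rule-based approach pySBD takes can be illustrated with a toy splitter: break on sentence-final punctuation, but suppress breaks after known abbreviations. The abbreviation list and regex below are illustrative only, not pySBD's actual rules or API.

```python
import re

# A tiny illustrative abbreviation list (not pySBD's actual list).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs"}

def split_sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation, skipping known abbreviations."""
    sentences, start = [], 0
    # Candidate boundaries: punctuation followed by whitespace + capital/quote, or end of text.
    for match in re.finditer(r"[.!?]+(?=\s+[A-Z\"(]|\s*$)", text):
        before = text[start:match.start()]
        preceding = before.rsplit(None, 1)[-1].lower() if before.strip() else ""
        if preceding.rstrip(".") in ABBREVIATIONS:
            continue  # e.g. the period in "Dr." does not end a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith went home. He slept."))
```

A real rule-based segmenter layers many more rules (ellipses, decimals, quotations, per-language abbreviation lists) on top of this basic shape.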
neurosnap/sentences
A multilingual command-line sentence tokenizer in Golang
vngrs-ai/vnlp
State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.
megagonlabs/bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
lfcipriani/punkt-segmenter
Ruby port of the NLTK Punkt sentence segmentation algorithm
cbilgili/zemberek-nlp-server
A REST Docker server built on the Zemberek Turkish NLP Java library
wwwcojp/ja_sentence_segmenter
A Japanese sentence segmentation library for Python
brandonrobertz/sentence-autosegmentation
Deep-learning-based sentence auto-segmentation for unstructured text without punctuation
Flight-School/sentences
A command-line utility that splits natural language text into sentences.
ikegami-yukino/sengiri
Yet another sentence-level tokenizer for Japanese text
apdullahyayik/TrTokenizer
🧩 A simple sentence tokenizer.
lord-alfred/dnlp
📚 A collection of useful Natural Language Processing tools: text language detection, splitting text into sentences, and extracting the main content from an HTML document
gosbd/gosbd
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
Foysal87/bn_nlp
Bangla NLP toolkit.
bhattbhavesh91/sentence-transformers-example
HuggingFace's Transformer models for sentence / text embedding generation.
KMiNT21/html2sent
HTML2SENT modifies HTML to improve sentence tokenizer quality
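The idea behind this kind of preprocessing can be sketched with the stdlib `html.parser`: treat block-level tags as hard boundaries so a downstream sentence tokenizer does not glue unrelated fragments together. The class name and tag set below are my own illustration, not html2sent's API.

```python
from html.parser import HTMLParser

# Block-level tags treated as hard text boundaries (an illustrative subset).
BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "tr", "td"}

class HTMLToText(HTMLParser):
    """Flatten HTML to text blocks, inserting breaks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def blocks(self) -> list[str]:
        """Non-empty text blocks, each safe to feed to a sentence tokenizer."""
        return [line.strip() for line in "".join(self.parts).split("\n") if line.strip()]

parser = HTMLToText()
parser.feed("<div>Hello<br>world.</div><p>Bye.</p>")
print(parser.blocks())
```

Without such a step, naive tag stripping would merge "world." and "Bye." into one run of text and a tokenizer could mis-segment across the element boundary.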
StarlangSoftware/Corpus
Corpus processing library
mkartawijaya/hasami
A tool to perform sentence segmentation on Japanese text
StarlangSoftware/Corpus-CPP
Corpus processing library
StarlangSoftware/Corpus-Py
Corpus processing library
sichkar-valentyn/Machine_Learning_in_Python
Practical machine learning experiments in Python: processing sentences and finding relevant ones, approximating functions with polynomials, and function optimization
deepakrana47/Sentence_tokenizer
A neural-network-based sentence tokenizer
Musaddiq625/Python-Projects
Some of my Python Projects
rmjacobson/privacy-crawler-parser-tokenizer
Crawler, Parser, Sentence Tokenizer for online privacy policies. Intended to support ML efforts on policy language and verification.
StarlangSoftware/Corpus-Js
Corpus Processing Library
victoryosiobe/kingchop
Kingchop ⚔️ is an English-oriented JavaScript library for tokenizing text (chopping text). It uses an extensive rule set for tokenizing, and you can adjust the rules easily.
Aburraq/StanfordCoreNLP
A text-processing pipeline built with Stanford CoreNLP. My legal background gave me a deep appreciation for language: in every case, meaning hinges on more than words, and that connection led me to coding.
coderganesh/tamil-sentence-tokenizer
A sentence tokenizer NLP tool for the Tamil language
faisaltareque/Multilingual-Sentence-Tokenizer
This Python package tokenizes sentences in over 40 languages, serving as a wrapper around various open-source libraries. It was created to support our work on XL-HeadTags. To use it, simply provide the text and its corresponding language, and the tokenizer returns the segmented sentences.
nature-of-eu-rules/data-preprocessing
Document preprocessing scripts for the Nature of EU Rules project
elifftosunn/textDataClean
An application built to make scraped dirty data ready for model training without requiring separate preprocessing steps.
quocthang0507/VietnameseNaturalLanguageProcessing
Vietnamese Natural Language Processing
StarlangSoftware/Corpus-Cy
Corpus Processing Library
StarlangSoftware/Corpus-Swift
Corpus processing library
zainmujahid/Longest-Common-Subsequence
This repository contains a Python script for calculating the Longest Common Subsequence (LCS) between tokenized Urdu sentences.
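The computation behind such a script is the classic LCS dynamic program applied to token lists rather than characters. A minimal sketch (English tokens are used here purely for illustration; the repository works on Urdu):

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists,
    via the standard O(len(a) * len(b)) dynamic program."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one token
    return dp[-1][-1]

s1 = "the cat sat on the mat".split()
s2 = "the dog sat on a mat".split()
print(lcs_length(s1, s2))  # → 4 ("the", "sat", "on", "mat")
```

Running LCS over sentence tokens (rather than characters) makes the score reflect shared word order, which is why sentence tokenization is a prerequisite for this kind of comparison.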