sentence-segmentation
There are 55 repositories under the sentence-segmentation topic.
undertheseanlp/underthesea
Underthesea - Vietnamese NLP Toolkit
natasha/natasha
Solves basic Russian NLP tasks, API for lower level Natasha projects
segment-any-text/wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
nlp-uoregon/trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
vncorenlp/VnCoreNLP
A Vietnamese natural language processing toolkit (NAACL 2018)
bitextor/bitextor
Bitextor generates translation memories from multilingual websites
natasha/razdel
Rule-based token and sentence segmentation for the Russian language
milaan9/Python_Natural_Language_Processing
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.
ckiplab/ckipnlp
CKIP CoreNLP Toolkits
PKU-TANGENT/NeuralEDUSeg
A toolkit for discourse segmentation (EDU segmentation).
neelkamath/spacy-server
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
wikimedia/sentencex
A sentence segmentation library with wide language support optimized for speed and utility.
UglyToad/PragmaticSegmenterNet
Port of PragmaticSegmenter for sentence boundary detection
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
hellonlp/hellonlp
NLP tools: word segmentation, sentence segmentation, new-word discovery
gosbd/gosbd
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
mtreviso/deepbond
Deep neural approach to Boundary and Disfluency Detection - Based on my Master's work
bureaucratic-labs/models
Pre-trained models for tokenization, sentence segmentation and so on
tc64/spacyss
Sentence segmentation for spaCy
KMiNT21/html2sent
HTML2SENT modifies HTML to improve sentence tokenizer quality
StarlangSoftware/Corpus
Corpus processing library
mkartawijaya/hasami
A tool to perform sentence segmentation on Japanese text
undertheseanlp/sent_tokenize
Vietnamese Sentence Boundary Detection
amorgun/russian-nlp-pretrained-models
Pre-trained models for tokenization, sentence segmentation and so on
seanghay/khmerpunctuate
Punctuation Restoration for Khmer language
StarlangSoftware/Corpus-CPP
Corpus processing library
mbanon/benchmarks
Several benchmarks on sentence splitting and language identification
StarlangSoftware/Corpus-Py
Corpus processing library
Michael95-m/mya-sent-break
Sentence segmentation for the Burmese language using a rule-based method
mkranzlein/curiam-segmenter
Sentence segmenter for legal texts
StarlangSoftware/Corpus-Swift
Corpus processing library
eaklykova/syntaxcomp
A Python3 package for extracting syntactic complexity measures from CoNLL-U annotations.
maggieezzat/Covid19-Semantic-based-Search
Semantic-based search using word embedding to help the medical community develop answers to high priority scientific questions using Kaggle's CORD-19 dataset. This repository is part of Kaggle's CORD-19 challenge: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
minseok0809/korean-sentence-segementation
AIHub Korean data preprocessing: Korean sentence segmentation
StarlangSoftware/Corpus-Js
Corpus Processing Library
wikimedia/sentencex-go
A sentence segmentation library with wide language support optimized for speed and utility.