text-processing

There are 1,476 repositories under the text-processing topic.

learnbyexample/Command-line-text-processing
:zap: From finding text to search and replace, from sorting to beautifying text and more :art:
Language: Shell 10.1k 280 16716
google/diff-match-patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Language: Python 7.2k 117 109 1.1k
chmln/sd
Intuitive find & replace CLI (sed alternative)
Language: Rust 5.5k 27 164135
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Language: Python 4.4k 56 1.8k 428
fastnlp/fastNLP
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Language: Python 3k 82 216451
pyparsing/pyparsing
Python library for creating PEG parsers
Language: Python 2.1k 28 339275
kk7nc/Text_Classification
Text Classification Algorithms: A Survey
Language: Python 1.8k 74 7546
birchb1024/frangipanni
Program to convert lines of text into a tree structure.
Language: Go 1.2k 12 1931
roshan-research/hazm
Persian NLP Toolkit
Language: Python 1.1k 23 229179
pemistahl/lingua-go
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Language: Go 1.1k 11 3464
BurntSushi/aho-corasick
A fast implementation of Aho-Corasick in Rust.
Language: Rust 968 19 6791
helix-editor/nucleo
A fast and convenient fuzzy matcher library for Rust
Language: Rust 746 18 1725
sstadick/hck
A sharp cut(1) clone.
Language: Rust 683 7 2818
cbaziotis/ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two big corpora (English Wikipedia and Twitter, 330 million English tweets).
Language: Python 660 18 2892
derek73/python-nameparser
A simple Python module for parsing human names into their individual components
Language: Python 639 26 112104
abadojack/whatlanggo
Natural language detection library for Go
Language: Go 631 16 1563
open-korean-text/open-korean-text
Open Korean Text Processor - An Open-source Korean Text Processor
Language: Scala 600 53 4394
ChenghaoMou/text-dedup
All-in-one text de-duplication
Language: Python 521 4 5466
proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Language: Python 477 32 2567
linuxscout/pyarabic
pyarabic
Language: Python 425 36 4884
andrewbihl/bsed
Simple SQL-like syntax on top of Perl text processing.
Language: Python 408 9 515
airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Language: Python 393 19 657
wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
Language: Python 385 10 9159
haven-jeon/PyKoSpacing
Automatic Korean word spacing with Python
Language: Python 373 9 37115
BurntSushi/regex-automata
A low-level regular expression library that uses deterministic finite automata.
Language: Rust 353 8 1926
textpipe/textpipe
Textpipe: clean and extract metadata from text
Language: Python 299 22 4027
ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Language: Python 295 11 1224
gagolews/stringi
Fast and portable character string processing in R (with the Unicode ICU)
Language: C++ 294 21 48445
RandyPen/TextCluster
Short text clustering preprocessing module
Language: Python 262 3 562
open-i18n/rust-unic
UNIC: Unicode and Internationalization Crates for Rust
Language: Rust 234 17 9224
himkt/konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Language: Python 217 7 4021
catatsuy/purl
Streamlining Text Processing
Language: Go 207 2 15
larrykollar/Unix-Text-Processing
Recreated sources for the book "UNIX Text Processing," published in 1987.
Language: Roff 204 10 310
aappleby/matcheroni
A minimalist single-header library for building pattern-matchers, lexers, and parsers.
Language: C++ 193 4 24
bytesparadise/libasciidoc
A Golang library for processing Asciidoc files.
Language: Go 193 10 51722
textvec/textvec
Text vectorization tool to outperform TFIDF for classification tasks
Language: Python 193 8 1025
