text-processing
There are 1476 repositories under the text-processing topic.
learnbyexample/Command-line-text-processing
⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
google/diff-match-patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
chmln/sd
Intuitive find & replace CLI (sed alternative)
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
fastnlp/fastNLP
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
pyparsing/pyparsing
Python library for creating PEG parsers
kk7nc/Text_Classification
Text Classification Algorithms: A Survey
birchb1024/frangipanni
Program to convert lines of text into a tree structure.
roshan-research/hazm
Persian NLP Toolkit
pemistahl/lingua-go
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
BurntSushi/aho-corasick
A fast implementation of Aho-Corasick in Rust.
helix-editor/nucleo
A fast and convenient fuzzy matcher library for Rust
sstadick/hck
A sharp cut(1) clone.
cbaziotis/ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
derek73/python-nameparser
A simple Python module for parsing human names into their individual components
abadojack/whatlanggo
Natural language detection library for Go
open-korean-text/open-korean-text
Open Korean Text Processor - An Open-source Korean Text Processor
ChenghaoMou/text-dedup
All-in-one text de-duplication
proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
linuxscout/pyarabic
pyarabic
andrewbihl/bsed
Simple SQL-like syntax on top of Perl text processing.
airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
haven-jeon/PyKoSpacing
Automatic Korean word spacing with Python
BurntSushi/regex-automata
A low level regular expression library that uses deterministic finite automata.
textpipe/textpipe
Textpipe: clean and extract metadata from text
ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
gagolews/stringi
Fast and portable character string processing in R (with the Unicode ICU)
RandyPen/TextCluster
Short text clustering preprocessing module (Short text cluster)
open-i18n/rust-unic
UNIC: Unicode and Internationalization Crates for Rust
himkt/konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
catatsuy/purl
Streamlining Text Processing
larrykollar/Unix-Text-Processing
Recreated sources for the book "UNIX Text Processing," published in 1987.
aappleby/matcheroni
A minimalist single-header library for building pattern-matchers, lexers, and parsers.
bytesparadise/libasciidoc
A Golang library for processing Asciidoc files.
textvec/textvec
Text vectorization tool to outperform TFIDF for classification tasks