text-cleaning
There are 78 repositories under the text-cleaning topic.
adbar/trafilatura
Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
blmoistawinde/HarvestText
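Text mining and preprocessing toolkit (text cleaning, new-word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods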
jfilter/clean-text
🧹 Python package for text cleaning
wisupai/e2m
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.
currentslab/extractnet
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
trinker/textclean
Tools for cleaning and normalizing text data
reZach/grammarify
Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, lexical illusions, among other things.
hscspring/pnlp
NLP pre-/post-processing toolkit.
sharejing/Takin
A Python toolkit for file processing, text cleaning, and data splitting.
amansrivastava17/text-preprocess-python
Text preprocessing tools in python.
dataiku/dss-plugin-nlp-preparation
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
Ankur3107/nlp_preprocessing
Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.
Aayushpatel007/topicrankpy
A Python package to get useful information from documents using the TopicRank algorithm.
YongWookHa/kor-text-preprocess
Korean text data preprocess toolkit for NLP
alinapetukhova/textcl
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
ecomp-shONgit/text-normalisation
JS / Python 3 / PHP library for working with UTF-8 polytonic Greek and Latin
ilos-vigil/scl-2020-product-detection
4th place (top 1%) solution for Shopee Code League 2020 - Product Detection
showmik/TidyText
🖹 Offline Text Cleaner and Formatter
fernandosola/textpp-ptbr
Common Text Pre-Processing for Portuguese
johnjago/deformat
Remove extra whitespace from text.
sagepublishing/text_cleaning
Corpora and scripts for cleaning political science texts. Scripts are translated into transformations that support SAGE Texti.
AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
krisograbek/text-preprocessing
Text preprocessing in Python. Libraries used include string, re, nltk, spacy, gensim, textblob, unidecode, autocorrect, pyspellchecker
lprtk/nlp-amazon-customer-reviews
Sentiment analysis, text mining, topic modeling & sentiment prediction
Abhayparashar31/crazytext
A simple, easy-to-use text cleaning package for NLP, built in Python. It can clean and analyze your text data in one line of code.
HandcartCactus/obsidian-remove-newlines
A plugin for Obsidian.md which removes newlines and blank lines from selected or pasted text.
ilos-vigil/indonesian-document-clustering
Indonesian News and Article Clustering with K-Means++
cwwdaniel/invoice-text-classification
Semantic Enrichment, Data Augmentation and Deep Learning for Boosting Invoice Text Classification Performance: A Novel Natural Language Processing Strategy
jradha11/sentiment-analysis-nlp
Sentiment Analysis of Restaurant Reviews using NLP
mim-solutions/mim_nlp
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
rhnfzl/SqueakyCleanText
Clean your text for statistical ML and language models
1994nikunj/nlp-toolkit-desktop-app
The code is a collection of NLP analyses, including text cleaning, most common words, n-grams generation, co-occurrence matrix generation, wordcloud generation, topic modeling (using Latent Dirichlet Allocation), and general text statistics.
ternaus/ternaus-cleantext
Cleans text as in the CLIP model
umapornp/textprepro
👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.
vflawson/Text-Cleaning-for-NLP-in-Python
Python-based text cleaning of comments scraped from social media platforms for NLP-based brand sentiment analysis
YashSDholam/Tripadvisor-Hotel-Review-Sentiment-Analysis-using-LSTM-Neural-Network
In this project, I utilized the TripAdvisor Hotel Review dataset from Kaggle to perform sentiment analysis on hotel reviews. The main objective was to build a predictive model using LSTM (Long Short-Term Memory) neural networks to classify hotel reviews as positive or negative based on their textual content.