text-mining
There are 2422 repositories under the text-mining topic.
keon/awesome-nlp
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
deanmalmgren/textract
extract text from any document. no muss. no fuss.
jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
JasonKessler/scattertext
Beautiful visualizations of how language differs among document types.
chiphuyen/lazynlp
Library to scrape and clean web pages to create massive datasets.
ujjwalkarn/DataScienceR
a curated list of R tutorials for Data Science, NLP and Machine Learning
mathsyouth/awesome-text-summarization
A curated list of resources dedicated to text summarization
konlpy/konlpy
Python package for Korean natural language processing.
juliasilge/tidy-text-mining
Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
DemonDamon/FinnewsHunter
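Crawls historical news text about listed companies (individual stocks) from Sina Finance (新浪财经), NBD (每经网), JRJ (金融界), **证券网, and Securities Times (证券时报网); performs text analysis and extracts feature sets; trains classifiers such as SVM and random forests; and finally classifies and predicts on newly crawled news data.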
shangjingbo1226/AutoPhrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
juliasilge/tidytext
Text mining using tidy tools :sparkles::page_facing_up::sparkles:
kavgan/nlp-in-practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
opensemanticsearch/open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
csurfer/rake-nltk
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
nlptown/nlp-notebooks
A collection of notebooks for Natural Language Processing from NLP Town
gsh199449/spider
A configurable web spider with an easy-to-use web console
dselivanov/text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
gesiscss/awesome-computational-social-science
A list of awesome resources for Computational Social Science
bigartm/bigartm
Fast topic modeling platform
graphbrain/graphbrain
Language, Knowledge, Cognition
nishitpatel01/Fake_News_Detection
Fake News Detection in Python
stepthom/text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
cpsievert/LDAvis
R package for web-based interactive topic model visualization.
laugustyniak/awesome-sentiment-analysis
Repository with everything necessary for sentiment analysis and related areas
adbar/German-NLP
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
stephenhky/PyShortTextCategorization
Various Algorithms for Short Text Mining
kk7nc/RMDL
RMDL: Random Multimodel Deep Learning for Classification
bakrianoo/aravec
AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.
airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
caufieldjh/awesome-bioie
🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)
hiDaDeng/cntext
Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).
jmartinezheras/2018-MachineLearning-Lectures-ESA
Machine Learning Lectures at the European Space Agency (ESA) in 2018
sergioburdisso/pyss3
A Python library for Interpretable Machine Learning in Text Classification using the SS3 model, with easy-to-use visualization tools for Explainable AI :octocat:
lining0806/TextMining
Python text mining system (Research of Text Mining System)