text-data
There are 62 repositories under the text-data topic.
microsoft/DialoGPT
Large-scale pretraining for dialogue
asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
microsoft/GODEL
Large-scale pretrained models for goal-directed dialog
asyml/texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
asyml/forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
thu-coai/cotk
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
LoLei/redditcleaner
Cleans Reddit Text Data
trinker/textreadr
Tools to uniformly read in text data including semi-structured transcripts
trinker/textshape
Tools for reshaping text data
BALaka-18/rake_new2
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
PratikBarhate/question-classification
Question classification on the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
YaleDHLab/wordmap
Visualize large text collections with WebGL
carted/processing-text-data
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
tylerjthomas9/ScrapeSEC.jl
Scrape EDGAR filings from https://www.sec.gov/
PedroBarcha/old-books-dataset
Old book pages (with ground truth), formerly used for OCR studies. There are several versions of the set (differing in resolution and binarization). Noised and denoised sets (produced by several methods) are eventually going to be uploaded.
tayebiarasteh/retweet
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
Hsankesara/The-Tweets-of-Wisdom
A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.
mrchypark/gomSubtitleData
Code for collecting GOM TV subtitle data
FareedKhan-dev/NLP-1K-Stories-Dataset-Genres-100
This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.
XMU-Kuangnan-Fang-Team/SpecificLDA
A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data
Allan-Cao/lol-voice-lines
Dataset of League of Legends Voice Lines
Ankit152/StackOverflow-Tag-Prediction
A machine learning model that predicts tags for a given question and body.
PriyankaSett/predicting_instagram_likes
The aim of this work is to predict the number of Instagram likes. Text vectorization is done using a TF-IDF vectorizer.
saghiles/dcc
Directional Co-clustering with a Conscience (DCC)
SignalN/parallelio
For reading from and writing to parallel data files in Python
ccubc/GlassdoorReviews
Classifying employee reviews on glassdoor.com
jfjelstul/regular-expressions-tutorial
A tutorial on using regular expressions in R
sevvalckc/Turkish-SAD
Python script to perform sentiment analysis on Turkish text data using multiple pre-trained transformer models, and a list of Turkish sentiment analysis datasets from 2012 to 2022.
sugatagh/Natural-Language-Processing-with-Disaster-Tweets
The objective of the project is to predict whether a given tweet, for which the text (and occasionally the keyword and location) is provided, indicates a real disaster. We use various NLP techniques and classification models and compare them objectively using an appropriate evaluation metric.
bchryzal/Detecting-Generated-Scientific-Papers
Can you spot automatically generated scientific excerpts?
cauchi94/airbnb-customer-sentiment
Analysis of text data: extracts the main topics from an Airbnb dataset using Latent Dirichlet Allocation (LDA), then applies Linear Regression to interpret the topics.
chandrashekhar1227-ML/Git_hub_bugs_prediction_using_Keras_BERT
Rank 16/98 MachineHack
Infinitode/DupliPy
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
KlaraGtknst/text_topic
This repository implements a pipeline that stores various data fields extracted from the files of a large unstructured dataset. These fields are used for topic modeling (wordclouds, based on low-dimensional versions of embedding vectors, Named Entity Clustering and document-topic incidences). The information is aggregated and visualised using FCA.
ptthanh02/vietnam-news-crawler
Python-based web scraping tool for extracting articles from VietNamNet
TZNcse209/Text-Data-Sentiment-Analysis
Text Data: Sentiment Analysis