text-data
There are 56 repositories under the text-data topic.
asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
microsoft/DialoGPT
Large-scale pretraining for dialogue
microsoft/GODEL
Large-scale pretrained models for goal-directed dialog
asyml/texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
asyml/forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
thu-coai/cotk
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
LoLei/redditcleaner
Cleans Reddit Text Data :scroll: :broom:
trinker/textreadr
Tools to uniformly read in text data including semi-structured transcripts
trinker/textshape
Tools for reshaping text data
BALaka-18/rake_new2
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
PratikBarhate/question-classification
Question classification for the CogComp QC Dataset (http://cogcomp.org/Data/QA/QC/).
YaleDHLab/wordmap
Visualize large text collections with WebGL
carted/processing-text-data
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
PedroBarcha/old-books-dataset
Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions (differing in resolution and binarization). Noised and denoised sets (produced by several methods) will eventually be uploaded.
tayebiarasteh/retweet
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
tylerjthomas9/ScrapeSEC.jl
Scrape EDGAR filings from https://www.sec.gov/
Hsankesara/The-Tweets-of-Wisdom
A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.
mrchypark/gomSubtitleData
Code for collecting Gom TV (곰tv) subtitle data
Allan-Cao/lol-voice-lines
Dataset of League of Legends Voice Lines
Ankit152/StackOverflow-Tag-Prediction
A machine learning model that predicts tags for a given question and body.
saghiles/dcc
Directional Co-clustering with a Conscience (DCC)
SignalN/parallelio
For reading from and writing to parallel data files in Python
ccubc/GlassdoorReviews
Classifying employee reviews on glassdoor.com
jfjelstul/regular-expressions-tutorial
A tutorial on using regular expressions in R
PriyankaSett/predicting_instagram_likes
The aim of this work is to predict the number of Instagram likes. The text vectorization is done using the TF-IDF Vectorizer.
bchryzal/Detecting-Generated-Scientific-Papers
Can you spot automatically generated scientific excerpts?
cauchi94/airbnb-customer-sentiment
Analysis of text data: extracts the main topics from an Airbnb dataset using Latent Dirichlet Allocation (LDA), then uses linear regression to interpret the topics.
chandrashekhar1227-ML/Fake_news_content_detection_using_Sentence_Transformers
Rank 3/85 MachineHack
chandrashekhar1227-ML/Git_hub_bugs_prediction_using_Keras_BERT
Rank 16/98 MachineHack
sugatagh/Natural-Language-Processing-with-Disaster-Tweets
The objective of the project is to predict whether a given tweet, for which the text (and occasionally the keyword and location) is provided, indicates a real disaster or not. We use various NLP techniques and classification models for this purpose and objectively compare these models by means of an appropriate evaluation metric.
vraul92/NLP-on-Whatsapp-Group-Chat
Applying NLP techniques on WhatsApp text to gain insights.
Nexdata-AI/13-Modules-Entity-Name-Single-sentence-Annotation-Data
13-Modules-Entity-Name-Single-sentence-Annotation-Data
Nexdata-AI/13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data
13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data
Nexdata-AI/28237-Intent-type-single-sentence-annotation-data
28237-Intent-type-single-sentence-annotation-data
Nexdata-AI/80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data
80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data