text-data

There are 62 repositories under the text-data topic.

microsoft/DialoGPT
Large-scale pretraining for dialogue
Language:Python2.4k 52 84349
asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Language:Python2.4k 74 159369
microsoft/GODEL
Large-scale pretrained models for goal-directed dialog
Language:Python885 18 32113
asyml/texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Language:Python746 24 138113
asyml/forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Language:Python250 18 41459
thu-coai/cotk
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Language:Python128 5 16826
LoLei/redditcleaner
Cleans Reddit Text Data :scroll: :broom:
Language:Python84 3 02
trinker/textreadr
Tools to uniformly read in text data including semi-structured transcripts
Language:R76 8 246
trinker/textshape
Tools for reshaping text data
Language:R53 8 182
BALaka-18/rake_new2
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
Language:Python29 3 2520
PratikBarhate/question-classification
Question classification for the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
Language:Python29 4 312
YaleDHLab/wordmap
Visualize large text collections with WebGL
Language:JavaScript26 5 55
carted/processing-text-data
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
Language:Python20 3 16
PedroBarcha/old-books-dataset
Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions (differing in resolution and binarization). Noised and denoised sets (produced by several methods) will eventually be uploaded.
Language:HTML14 1 02
tylerjthomas9/ScrapeSEC.jl
Scrape EDGAR filings from https://www.sec.gov/
Language:Julia14 1 50
tayebiarasteh/retweet
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
Language:Python11 2 15
Hsankesara/The-Tweets-of-Wisdom
A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.
Language:Jupyter Notebook9 1 02
mrchypark/gomSubtitleData
Code for collecting GOM TV subtitle data
Language:R6 3 16
FareedKhan-dev/NLP-1K-Stories-Dataset-Genres-100
This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.
4 2 1
XMU-Kuangnan-Fang-Team/SpecificLDA
A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data
Language:Python4 1 03
Allan-Cao/lol-voice-lines
Dataset of League of Legends Voice Lines
Language:Jupyter Notebook3 1 00
Ankit152/StackOverflow-Tag-Prediction
A machine learning model that predicts tags for a given question and body.
Language:Jupyter Notebook3 2 0
PriyankaSett/predicting_instagram_likes
The aim of this work is to predict the number of Instagram likes. The text vectorization is done using the TF-IDF Vectorizer.
Language:Jupyter Notebook3 1 00
saghiles/dcc
Directional Co-clustering with a Conscience (DCC)
Language:R3 1 00
SignalN/parallelio
For reading from and writing to parallel data files in Python
Language:Python3 1 00
ccubc/GlassdoorReviews
Classifying employee reviews on glassdoor.com
Language:Jupyter Notebook2 1 01
jfjelstul/regular-expressions-tutorial
A tutorial on using regular expressions in R
2 1 00
Mohampouraz/Persian-poetry
A comprehensive repository of classical Persian poetry, curated from Ganjoor.net, designed for Natural Language Processing (NLP), machine learning applications, and literary research.
Language:Python2
sevvalckc/Turkish-SAD
Python script to perform sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.
Language:Python20
sugatagh/Natural-Language-Processing-with-Disaster-Tweets
The objective of the project is to predict whether a particular tweet, of which the text (and occasionally the keyword and the location as well) is provided, indicates a real disaster or not. We use various NLP techniques and classification models for this purpose and objectively compare these models by means of an appropriate evaluation metric.
Language:Jupyter Notebook2 1 01
bchryzal/Detecting-Generated-Scientific-Papers
Can you spot automatically generated scientific excerpts?
Language:Jupyter Notebook1 1 00
cauchi94/airbnb-customer-sentiment
Analysis of text data: extracting the main topics from an Airbnb dataset using Latent Dirichlet Allocation (LDA), then using Linear Regression to interpret the topics.
Language:Jupyter Notebook1 1 00
Infinitode/DupliPy
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
Language:Python1 1 00
KlaraGtknst/text_topic
This repository implements a pipeline that stores various data fields for the files of a large unstructured dataset. These fields are used for topic modeling (wordclouds, based on low-dimensional versions of embedding vectors, Named Entity Clustering and document-topic incidences). The information is aggregated and visualised using FCA.
Language:Python1 1 0
ptthanh02/vietnam-news-crawler
Python-based web scraping tool for extracting articles from VietNamNet
Language:Jupyter Notebook1 1 00
TZNcse209/Text-Data-Sentiment-Analysis
Text Data: Sentiment Analysis
Language:Jupyter Notebook1 1 00
