text-data

There are 62 repositories under the text-data topic.

  • microsoft/DialoGPT

    Large-scale pretraining for dialogue

    Language: Python
  • asyml/texar

    Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

    Language: Python
  • microsoft/GODEL

    Large-scale pretrained models for goal-directed dialog

    Language: Python
  • asyml/texar-pytorch

    Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

    Language: Python
  • asyml/forte

    Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

    Language: Python
  • thu-coai/cotk

    Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

    Language: Python
  • LoLei/redditcleaner

    Cleans Reddit Text Data 📜 🧹

    Language: Python
  • trinker/textreadr

    Tools to uniformly read in text data including semi-structured transcripts

    Language: R
  • trinker/textshape

    Tools for reshaping text data

    Language: R
  • BALaka-18/rake_new2

    A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

    Language: Python
  • PratikBarhate/question-classification

    Question classification on the CogComp QC Dataset (http://cogcomp.org/Data/QA/QC/).

    Language: Python
  • YaleDHLab/wordmap

    Visualize large text collections with WebGL

    Language: JavaScript
  • carted/processing-text-data

    Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).

    Language: Python
  • PedroBarcha/old-books-dataset

    Old book pages (with ground truth), formerly used for OCR studies. There are several versions of the set (differing in resolution and binarization). Noisy and denoised sets (produced by several methods) will eventually be uploaded.

    Language: HTML
  • tylerjthomas9/ScrapeSEC.jl

    Scrape EDGAR filings from https://www.sec.gov/

    Language: Julia
  • tayebiarasteh/retweet

    How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies

    Language: Python
  • Hsankesara/The-Tweets-of-Wisdom

    A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.

    Language: Jupyter Notebook
  • mrchypark/gomSubtitleData

    Code for collecting 곰tv (GOM TV) subtitle data

    Language: R
  • FareedKhan-dev/NLP-1K-Stories-Dataset-Genres-100

    This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.

  • XMU-Kuangnan-Fang-Team/SpecificLDA

    A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data

    Language: Python
  • Allan-Cao/lol-voice-lines

    Dataset of League of Legends Voice Lines

    Language: Jupyter Notebook
  • Ankit152/StackOverflow-Tag-Prediction

    A machine learning model that predicts tags for a question given its title and body.

    Language: Jupyter Notebook
  • PriyankaSett/predicting_instagram_likes

    The aim of this work is to predict the number of Instagram likes. Text vectorization is done using a TF-IDF vectorizer.

    Language: Jupyter Notebook
  • saghiles/dcc

    Directional Co-clustering with a Conscience (DCC)

    Language: R
  • SignalN/parallelio

    For reading from and writing to parallel data files in Python

    Language: Python
  • ccubc/GlassdoorReviews

    Classifying employee reviews on glassdoor.com

    Language: Jupyter Notebook
  • jfjelstul/regular-expressions-tutorial

    A tutorial on using regular expressions in R

  • Mohampouraz/Persian-poetry

    A comprehensive repository of classical Persian poetry, curated from Ganjoor.net, designed for Natural Language Processing (NLP), machine learning applications, and literary research.

    Language: Python
  • sevvalckc/Turkish-SAD

    Python script to perform sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.

    Language: Python
  • sugatagh/Natural-Language-Processing-with-Disaster-Tweets

    The objective of the project is to predict whether a particular tweet, whose text (and occasionally its keyword and location) is provided, indicates a real disaster. We use various NLP techniques and classification models for this purpose and compare these models objectively by means of an appropriate evaluation metric.

    Language: Jupyter Notebook
  • bchryzal/Detecting-Generated-Scientific-Papers

    Can you spot automatically generated scientific excerpts?

    Language: Jupyter Notebook
  • cauchi94/airbnb-customer-sentiment

    Analysis of text data that extracts the main topics from an Airbnb dataset using Latent Dirichlet Allocation (LDA) and then uses linear regression to interpret the topics.

    Language: Jupyter Notebook
  • Infinitode/DupliPy

    DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

    Language: Python
  • KlaraGtknst/text_topic

    This repository implements a pipeline that stores various fields extracted from files in a large unstructured dataset. These fields are used for topic modeling (word clouds based on low-dimensional versions of embedding vectors, named entity clustering, and document-topic incidences). The information is aggregated and visualised using FCA.

    Language: Python
  • ptthanh02/vietnam-news-crawler

    Python-based web scraping tool for extracting articles from VietNamNet

    Language: Jupyter Notebook
  • TZNcse209/Text-Data-Sentiment-Analysis

    Text Data: Sentiment Analysis

    Language: Jupyter Notebook
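
As one illustration of a technique named in this list, the RAKE (Rapid Automatic Keyword Extraction) algorithm mentioned for BALaka-18/rake_new2 scores candidate phrases by word co-occurrence: the text is split into candidate phrases at punctuation and stopwords, each word is scored as degree / frequency, and a phrase's score is the sum of its word scores. The following is a minimal plain-Python sketch under stated assumptions: the tiny `STOPWORDS` set and the function names `candidate_phrases` and `rake_keywords` are hypothetical and are not the rake_new2 API.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real RAKE setup would use a full stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "is", "are",
             "to", "with", "from", "by", "this", "that", "it", "as", "be", "can"}


def candidate_phrases(text: str) -> list[list[str]]:
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation marks act as hard phrase boundaries.
    for fragment in re.split(r"[.,;:!?()\[\]\"\n]+", text.lower()):
        current: list[str] = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                # A stopword ends the current phrase.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text: str, top_n: int = 5) -> list[tuple[str, float]]:
    """Rank candidate phrases by the sum of their words' degree/frequency scores."""
    phrases = candidate_phrases(text)
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Identical phrases collapse into one dictionary key.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(rake_keywords(
        "Rapid automatic keyword extraction is a simple method for keyword extraction."
    ))
```

Running it prints `[('rapid automatic keyword extraction', 14.0), ('keyword extraction', 6.0), ('simple method', 4.0)]`. Longer phrases accumulate higher scores by design, and the choice of stopword list largely determines which phrases survive, so results on other text depend heavily on that list.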