text-datasets
There are 20 repositories under the text-datasets topic.
tblock/10kGNAD
Ten Thousand German News Articles Dataset for Topic Classification
EmilHvitfeldt/textdata
Download, parse, store, and load text datasets instead of storing them in packages
noisemix/noisemix
NoiseMix - data generation for natural language
nuhmanpk/Webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
The-Gupta/TED-Scraper
Complete web scraping of TED.com for metadata, transcripts, audio, video, and images using parallel programming
Hsankesara/The-Tweets-of-Wisdom
A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.
Pogayo/Luo-News-Dataset
This repo contains a Luo corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi.
ravexina/shakespeare-plays-dataset-scraper
A Bash script to scrape Shakespeare's works from shakespeare.mit.edu, plus already-scraped plays in TXT format
SherinBK/Fake-Job-Posting
Data analysis project on a fake job posting dataset using machine learning and NLP basics
YujiSODE/txtStat
The interface for text character analysis.
geo-tp/Alpha-Project-Text-Archive
Compilation of texts from WoW alphas and betas. Used by https://github.com/The-Alpha-Project/Text-Crawler-Website
Honolulu69/Schiz
Neural-network-aided diagnosis of schizophrenia via patient-centered text data
Infinitode/DupliPy
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
chenyaofo/OpenLM-Awesome
Awesome lists for open-source large language models and datasets.
exanova-y/von_neumann_dataset
Biographies, quotes, and talk transcripts of John von Neumann
nevmenandr/nazirov-texts-dataset
Dataset of texts by R. G. Nazirov
Pogayo/ADH-EN_MT_Dataset
Contains Adhola-English parallel sentences that can be used for Machine Translation.
robinreni96/WordDetection-Data-Generator
This Python script generates n pages of text with bounding boxes and their ground-truth labels. It supports various background colors, fonts, etc., and can export the dataset as TFRecord.
Saeidhoseinipour/Job_Inja
Uses a Python class called JobScraper to retrieve job information from various APIs while avoiding network problems such as connection failures, timeouts, or blocking.