text-datasets

There are 20 repositories under the text-datasets topic.

  • tblock/10kGNAD

    Ten Thousand German News Articles Dataset for Topic Classification

    Language: Python
  • EmilHvitfeldt/textdata

    Download, parse, store, and load text datasets instead of storing them in packages

    Language: R
  • noisemix/noisemix

    NoiseMix - data generation for natural language

    Language: Python
  • nuhmanpk/Webtrench

    A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

    Language: Python
  • The-Gupta/TED-Scraper

    Complete Web Scraping of TED.com for Metadata, Transcript, Audio, Video, Images using Parallel Programming

    Language: Jupyter Notebook
  • Hsankesara/The-Tweets-of-Wisdom

    A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.

    Language: Jupyter Notebook
  • Pogayo/Luo-News-Dataset

    This repo contains a Luo corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi.

  • ravexina/shakespeare-plays-dataset-scraper

    A bash script to scrape Shakespeare's works from shakespeare.mit.edu, plus already-scraped plays in txt format

    Language: Shell
  • SherinBK/Fake-Job-Posting

    Data analysis project on a fake job posting dataset using machine learning and NLP basics

    Language: Jupyter Notebook
  • YujiSODE/txtStat

    An interface for text character analysis.

    Language: JavaScript
  • geo-tp/Alpha-Project-Text-Archive

    Compilation of texts from WoW alphas and betas. Used by https://github.com/The-Alpha-Project/Text-Crawler-Website

    Language: HTML
  • Honolulu69/Schiz

    Neural-network-aided diagnosis of schizophrenia via patient-centered text data

    Language: Python
  • Infinitode/DupliPy

    DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

    Language: Python
  • chenyaofo/OpenLM-Awesome

    Awesome lists for open-source large language models and datasets.

  • exanova-y/von_neumann_dataset

    Biographies, quotes, and talk transcripts of John von Neumann

  • nevmenandr/nazirov-texts-dataset

    Dataset of texts by R. G. Nazirov

  • Pogayo/ADH-EN_MT_Dataset

    Contains Adhola-English parallel sentences that can be used for Machine Translation.

    Language: Jupyter Notebook
  • robinreni96/WordDetection-Data-Generator

    A Python script that generates n pages of text with bounding boxes and their ground-truth labels. It supports various background colors, fonts, etc., and can export the dataset as TFRecord.

    Language: Python
  • Saeidhoseinipour/Job_Inja

    Uses a Python class called JobScraper to fetch job information from various APIs while avoiding network problems such as connection failures, timeouts, or blocking.

    Language: Jupyter Notebook