corpus-builder
There are 18 repositories under the corpus-builder topic.
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
google/corpuscrawler
Crawler for linguistic corpora
praaline/Praaline
Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora
carlfm01/librivox-tools
Collector and speech cutter for librivox audiobooks
dohliam/ebook-corpus
Ebook Corpus - A parser and extractor for electronic books
thecsw/katya-dev
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
FerreroJeremy/Plagiarized-Corpus-Generator
A corpus builder for evaluation of plagiarism detection tools
tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create a corpus for ML.
writecrow/crow_backend
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
writecrow/crow_frontend
The user interface for the Corpus & Repository of Writing, built in Angular
CristinaGHolgado/vikitext
Extract text from Vikidia/Wikipedia articles [fr]
IDS-Mannheim/Wikipedia-Corpus-Builder
Builds Wikipedia corpora in I5 (a TEI-based format)
sorinmarti/fruechtekorb
This is a text corpus management system for the German linguistics department of the University of Basel.
adpaczek/chatbot
Polish-language chatbot based on the Transformer architecture, trained on movie subtitles collected by web scraping.
binayachaudari/Corpus-Development-Software
Corpus Development Software for Machine Translation
c0ntradicti0n/CorpusCookApp
App and scripts that work with the corpus builder CorpusCook to update a corpus with corrections of wrong predictions