text-corpus

There are 35 repositories under the text-corpus topic.

miras-tech/MirasText
MirasText
Language:Python71 8 09
Ermlab/PoLitBert
Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The main assumption is that high-quality text yields a good model.
Language:Python34 11 23
mrzjy/StarrailDialog
A project that extracts a text corpus from Honkai: Star Rail
Language:Python23 5 11
t-systems-on-site-services-gmbh/german-wikipedia-text-corpus
This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
22 3 14
WING-NUS/nus-sms-corpus
This is the distribution point for the NUS SMS Corpus, a corpus of SMS (Short Message Service) messages collected for research at the Department of Computer Science at the National University of Singapore. This dataset consists of 67,093 SMS messages taken from the corpus on Mar 9, 2015. The messages largely originate from Singaporeans, mostly students attending the University. They were collected from volunteers who were made aware that their contributions would be made publicly available. The data collectors opportunistically gathered as much metadata about the messages and their senders as possible, to enable different types of analyses. This corpus was collected by Tao Chen and Min-Yen Kan. If you use this data, please cite the following paper. For more details, please refer to the Citation field. Tao Chen and Min-Yen Kan (2013). Creating a Live, Public Short Message Service Corpus: The NUS SMS Corpus. Language Resources and Evaluation, 47(2), pages 299-355. URL: https://link.springer.com/article/10.1007%2Fs10579-012-9197-9
22 3 04
AsoSoft/AsoSoft-Text-Corpus
AsoSoft Text Corpus is the first large-scale text corpus for the Kurdish language.
17 3 13
nikitaeverywhere/edu-text-analysis-experiments
Statistical text analysis and semantic networks with Python
Language:Python14 2 04
lucylow/Yeezy-Taught-Me
Yeezy Taught Me: text generation. Trains an RNN (LSTM) next-character prediction model on a user-supplied text corpus.
Language:JavaScript10 3 51
jonsafari/habeas-corpus
Command-line corpus tools
Language:Shell9 4 01
JuliusBahr/SimpleSimilarity
A framework for semantic text search
Language:Swift8 1 00
appeler/search_names
Search a long list of names (patterns) in a large text corpus systematically and quickly
Language:Python7 3 51
jcrippen/tlingit-corpus
Text corpus of the Tlingit language for linguistic research.
Language:Shell6 5 92
luonglearnstocode/Seinfeld-text-corpus
Text corpus scraped from the scripts of all Seinfeld episodes
Language:Jupyter Notebook6 2 00
thecsw/katya-dev
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
Language:Go5 3 00
capetocape/crawl-text-title-as-corpus
Crawling data from websites to use as a text corpus
Language:Python2 0 00
Chandra-cc/Tesseract_ICR-Sheets
A model was trained on Google handwritten fonts, using a text corpus containing only the digits 0-9. The main aim was to recognize ICR sheets with the trained model. Our model achieved an accuracy of 94.6% using Tesseract version 4.
Language:Python2 1 02
cligs/conha19
Corpus of nineteenth-century Spanish-American novels (conha19)
Language:XSLT2 3 21
kurpicz/tcc
Text Corpus Collection
Language:C++2 2 01
seanpm2001/DroppedText_Corpus
A text corpus collection for the DroppedText language.
2 3 12
soumyadeepghoshGG/Twitter-Sentiment-Analysis-with-NLP
Using natural language processing techniques to determine the sentiment expressed in a tweet, classified as positive or negative.
Language:Jupyter Notebook2 1 00
TextCorpusLabs/wikimedia
Walkthrough for converting Wikimedia into a text corpus
Language:Python2 2 01
alexlilia/igc-corpus-reader
This is a tool which can be used to index and query a large XML-based text corpus using Elasticsearch.
Language:Python1 4 00
hari8github/NLP
Sentiment analysis models using NLP, other important NLP basics, subwords, and a song lyric generator!
Language:Jupyter Notebook0 1 00
jdave23/EAD-corpus
A collection of encoded archival description XML documents for text and content analysis.
Language:Shell0 1 00
RedditEpidemicAnalysis/data
Data collection scripts for analysis of Reddit
0 2 00
s-bose7/ngram-viewer
Exploring the history of word usage in English texts with a weighted popularity history plot.
Language:Java0 1 50
skyisveryblue1/corpus-filter
Simple utility to filter a text corpus according to the frequencies of the words that make up its sentences
Language:C++0 1 00
TextCorpusLabs/congressional-votes
Walkthrough for converting congressional roll call votes into a text corpus
Language:Python0 2 00
TextCorpusLabs/covid19
Walkthrough for converting Kaggle's COVID-19 Open Research Dataset Challenge into a text corpus
Language:Python0 2 00
TextCorpusLabs/NJGovNews
Web scraping of the New Jersey news feeds
Language:Python0 1 00
TextCorpusLabs/oas
Walkthrough for converting the PMC OAS Dataset into a text corpus
Language:Python0 2 00
WHOSpeeches/WHODataHub
Collects the WHO Director-General's speeches.
Language:Python0 2 00
AbdullahButt2611/TextAnalyzer
"Text Analyzer" is a web application designed to analyze any given text or script and provide users with useful information about its contents.
Language:HTML1 0
alla-g/NLP2020
Final project for a natural language processing course, in the final_project_diary folder
Language:Jupyter Notebook0 01
motazsaad/corpus-expander
Expands sentences in a given text corpus. The code checks for named entities (NEs) in sentences and creates new sentences by injecting new NEs from an NE list.
Language:Python3 02
