
Welcome to LexiconPrime. Get free, open-source embeddings of publicly available datasets.

Primary language: Jupyter Notebook

Lexicon Prime

Welcome to LexiconPrime, a project that aims to provide free, high-quality embeddings!

Datasets Available

| Dataset | Description | Link |
| --- | --- | --- |
| 20News | Collection of newsgroup documents classified into different topics | 20News |
| BBC News Dataset | News articles categorized into different topics by BBC | BBC News Dataset |
| Kpris Dataset | Large Korean text dataset covering various domains | Kpris Dataset |

Datasets We Plan To Work On

| Dataset | Description | Link |
| --- | --- | --- |
| Common Crawl | Web dataset with a vast collection of web pages | Common Crawl |
| Wikipedia | Dump of entire Wikipedia articles | Wikipedia |
| OpenWebText | Large dataset of web pages from diverse domains | OpenWebText |
| BookCorpus | Dataset with text from over 11,000 books | BookCorpus |
| Google News Dataset | Collection of news articles from various sources | Google News Dataset |
| PubMed | Repository of biomedical literature | PubMed |
| Twitter | Datasets capturing tweets from Twitter | Twitter Datasets |
| Reddit | Datasets derived from the Reddit social media platform | Reddit Datasets |
| Stack Exchange | Network of question-and-answer websites | Stack Exchange Data Dump |
| Yelp Dataset | Dataset with reviews and ratings for businesses | Yelp Dataset |
| GPT-3 Generated Text | Text generated by the GPT-3 language model | GPT-3 Generated Text |
| Amazon Reviews | Dataset containing product reviews from Amazon | Amazon Reviews |
| IMDB Movie Reviews | Dataset of movie reviews from IMDB | IMDB Movie Reviews |
| Reuters News Dataset | Collection of news articles from Reuters | Reuters News Dataset |
| AG's News Topic Classification | News articles classified into different topics | AG's News Dataset |
| WikiText-103 | Large-scale dataset of Wikipedia articles for language modeling | WikiText-103 |
| ArXiv Dataset | Research papers from various disciplines on arXiv.org | ArXiv Dataset |
| EuroParl Corpus | Parallel corpus of European Parliament proceedings | EuroParl Corpus |
| WikiData | Structured knowledge base derived from Wikipedia | WikiData |
| GitHub Repositories | Collection of code and README files from GitHub repositories | GitHub Archive |