Welcome to LexiconPrime, a project that aims to provide free, high-quality embeddings!

| Dataset | Description | Link |
|---|---|---|
| 20News | Collection of newsgroup documents classified into different topics | 20News |
| BBC News Dataset | News articles categorized into different topics by BBC | BBC News Dataset |
| Kpris Dataset | Large Korean text dataset covering various domains | Kpris Dataset |

Datasets We Plan To Work On

| Dataset | Description | Link |
|---|---|---|
| Common Crawl | Web dataset with a vast collection of web pages | Common Crawl |
| Wikipedia | Dump of entire Wikipedia articles | Wikipedia |
| OpenWebText | Large dataset of web pages from diverse domains | OpenWebText |
| BookCorpus | Dataset with text from over 11,000 books | BookCorpus |
| Google News Dataset | Collection of news articles from various sources | Google News Dataset |
| PubMed | Repository of biomedical literature | PubMed |
| Twitter | Datasets capturing tweets from Twitter | Twitter Datasets |
| Reddit | Datasets derived from the Reddit social media platform | Reddit Datasets |
| Stack Exchange | Network of question-and-answer websites | Stack Exchange Data Dump |
| Yelp Dataset | Dataset with reviews and ratings for businesses | Yelp Dataset |
| GPT-3 Generated Text | Text generated by the GPT-3 language model | GPT-3 Generated Text |
| Amazon Reviews | Dataset containing product reviews from Amazon | Amazon Reviews |
| IMDB Movie Reviews | Dataset of movie reviews from IMDB | IMDB Movie Reviews |
| Reuters News Dataset | Collection of news articles from Reuters | Reuters News Dataset |
| AG's News Topic Classification | News articles classified into different topics | AG's News Dataset |
| WikiText-103 | Large-scale dataset of Wikipedia articles for language modeling | WikiText-103 |
| ArXiv Dataset | Research papers from various disciplines on arXiv.org | ArXiv Dataset |
| EuroParl Corpus | Parallel corpus of European Parliament proceedings | EuroParl Corpus |
| WikiData | Structured knowledge base derived from Wikipedia | WikiData |
| GitHub Repositories | Collection of code and README files from GitHub repositories | GitHub Archive |