llm-datasets
There are 12 repositories under the llm-datasets topic.
neo4j-labs/text2cypher
Collection of text2cypher datasets, evaluations, and finetuning instructions
dsdanielpark/open-llm-datasets
Repository for organizing datasets and papers used in Open LLM.
discus-labs/discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
asimsinan/LLM-Research
A collection of LLM-related papers, theses, tools, datasets, courses, open source models, and benchmarks
altunenes/rustysozluk
Efficiently fetch and perform sentiment analysis (Turkish Only) on eksisozluk.com entries using Rust
DefinetlyNotAI/LLM_Data
Python source code from a number of very famous repos, collected in this repo as pure localdocs to train code AI
arian-askari/SOLID
Synthetically Generating Intent-Aware Information-Seeking Dialogues! Useful for various tasks such as training/evaluating User Intent Predictors, with the possibility of training/evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.
tiddly-gittly/TiddlyWiki-LLM-dataset
WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)
redblock-ai/parrot-python
PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.
aloobun/basedUX
Minimal dataset consisting of 363 Human & Assistant dialogs
aloobun/ccpem-modified
A modified dataset consisting of English dialogs between a user and an assistant discussing movie preferences in natural language.
jsurrea/LLM-Latino
Collection of ETL scripts used to create a dataset of text in Spanish to train Large Language Models.