llm-datasets

There are 12 repositories under the llm-datasets topic.

neo4j-labs/text2cypher
Collection of text2cypher datasets, evaluations, and finetuning instructions
Language: Jupyter Notebook 153 5 318
dsdanielpark/open-llm-datasets
Repository for organizing datasets and papers used in Open LLM.
92 5 06
discus-labs/discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
Language: Python 64 1 187
asimsinan/LLM-Research
A collection of LLM-related papers, theses, tools, datasets, courses, open source models, and benchmarks
Language: Python 43 2 06
altunenes/rustysozluk
Efficiently fetch and perform sentiment analysis (Turkish Only) on eksisozluk.com entries using Rust
Language: Rust 7 1 40
DefinetlyNotAI/LLM_Data
Source code of many well-known Python repositories, collected in this repo as plain local docs for training code AI
Language: Python 3 1 00
arian-askari/SOLID
Synthetically generating intent-aware information-seeking dialogues. Useful for tasks such as training and evaluating user intent predictors, with the option of training and evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.
Language: Python 2 1 01
tiddly-gittly/TiddlyWiki-LLM-dataset
WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)
Language: TypeScript 2 1 0
redblock-ai/parrot-python
PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.
Language: Python 1 2 50
aloobun/basedUX
Minimal dataset consisting of 363 Human & Assistant dialogs
0 1 00
aloobun/ccpem-modified
A modified dataset consisting of English dialogs between a user and an assistant discussing movie preferences in natural language.
0 1 00
jsurrea/LLM-Latino
Collection of ETL scripts used to create a dataset of text in Spanish to train Large Language Models.
Language: Python 2 0
