dataset

There are 12,865 repositories under the dataset topic.

public-apis/public-apis
A collective list of free APIs
Language: Python 365k 4.4k 72238.3k
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Language: JavaScript 24.7k 184 2.7k 3k
joke2k/faker
Faker is a Python package that generates fake data for you.
Language: Python 18.7k 224 7952k
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Language: Python 15.3k 84 2921.2k
cvat-ai/cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Language: Python 14.4k 181 4.6k 3.3k
zalandoresearch/fashion-mnist
A MNIST-like fashion product database. Benchmark 👇
Language: Python 12.4k 331 1033.1k
ConardLi/easy-dataset
A powerful tool for creating fine-tuning datasets for LLM
Language: JavaScript 10.7k 53 4021k
doccano/doccano
Open source annotation tool for machine learning practitioners.
Language: Python 10.3k 133 1.5k 1.8k
brightmart/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
9.8k 287 451.6k
satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
9.7k 280 251.6k
NirantK/awesome-project-ideas
Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
8.6k 287 81.3k
googlecreativelab/quickdraw-dataset
Documentation on how to access and use the Quick, Draw! Dataset.
6.5k 207 561k
mdn/browser-compat-data
Browser compatibility data for Web technologies as displayed on MDN
Language: JSON 5.4k 251 5.5k 2.3k
lonePatient/awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
Language: Python 5.4k 96 13506
SPLWare/esProc
esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine.
Language: Java 4.7k 60 54358
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Language: Python 4.5k 108 1.2k 1.6k
whoiskatrin/sql-translator
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
Language: TypeScript 4.3k 33 18372
CLUEbenchmark/CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Language: Python 4.2k 89 100547
wainshine/Chinese-Names-Corpus
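Chinese names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.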
4.2k 102 291k
rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Language: Python 4.2k 32 279362
hyunwoongko/transformer
Transformer: PyTorch Implementation of "Attention Is All You Need"
Language: Python 4k 10 23566
OpenCSGs/csghub
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugging Face. Join us! ⭐️
Language: Vue 4k 258 156387
Charmve/Surface-Defect-Detection
📈 Currently the largest industrial defect detection database and paper collection. Constantly summarizing open source datasets and critical papers in the field of surface defect research, which are of great importance.
Language: Python 3.8k 52 18583
mlabonne/llm-datasets
Curated list of datasets and tools for post-training.
3.7k 46 3305
Belval/TextRecognitionDataGenerator
A synthetic data generator for text recognition
Language: Python 3.6k 63 2511k
pytorch/text
Models, data loaders and abstractions for language processing, powered by PyTorch
Language: Python 3.6k 338 812812
jdorfman/awesome-json-datasets
A curated list of awesome JSON datasets that don't require authentication.
Language: JavaScript 3.5k 89 31384
linhandev/dataset
An Index for Medical Imaging Datasets
3.3k 22 59415
Zjh-819/LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
3.2k 51 3222
pydata/pandas-datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
Language: Python 3.1k 143 540678
ieee8023/covid-chestxray-dataset
We are building an open database of COVID-19 cases with chest X-ray or CT images.
Language: Jupyter Notebook 3k 155 1121.3k
waymo-research/waymo-open-dataset
Waymo Open Dataset
Language: Python 3k 72 911671
whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Language: Jupyter Notebook 2.8k 32 432132
M-3LAB/awesome-industrial-anomaly-detection
Paper list and datasets for industrial image anomaly/defect detection (updating).
2.7k 84 19236
ashvardanian/StringZilla
Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖
Language: C 2.7k 25 10389
meodai/color-names
Large list of handpicked color names 🌈
Language: JavaScript 2.7k 26 101193
