datasets
There are 2508 repositories under the datasets topic.
awesomedata/awesome-public-datasets
A topic-centric list of HQ open datasets.
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
tonybeltramelli/pix2code
pix2code: Generating Code from a Graphical User Interface Screenshot
akfamily/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
simonw/datasette
An open source multi-tool for exploring and publishing data
doccano/doccano
Open source annotation tool for machine learning practitioners.
satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
imaNNeo/fl_chart
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.
liuruoze/EasyPR
(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design
Arize-ai/phoenix
AI Observability & Evaluation
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
CLUEbenchmark/CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
jdorfman/awesome-json-datasets
A curated list of awesome JSON datasets that don't require authentication.
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
microsoft/torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
zhulf0804/3D-PointCloud
Papers and Datasets about Point Cloud.
justinzm/gopup
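Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "Qianlima" (high-potential) and unicorn companies; Xinwen Lianbo news transcripts; film and TV box office data; university lists; epidemic data…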
codefuse-ai/Awesome-Code-LLM
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
FreedomIntelligence/Medical_NLP
Medical NLP Competition, dataset, large models, paper
colour-science/colour
Colour Science for Python
jsbroks/coco-annotator
✏️ Web-based image segmentation tool for object detection, localization, and keypoints
RUC-NLPIR/FlashRAG
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
prabhuomkar/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
logpai/loghub
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
snap-stanford/ogb
Benchmark datasets, data loaders, and evaluators for graph machine learning
isl-org/Open3D-ML
An extension of Open3D to address 3D Machine Learning tasks
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
diffgram/diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
ChineseGLUE/ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
JuliaData/DataFrames.jl
In-memory tabular data in Julia
WLiK/LLM4Rec-Awesome-Papers
A list of awesome papers and resources on recommender systems based on large language models (LLMs).
eosphoros-ai/DB-GPT-Hub
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL