datasets
There are 2,725 repositories under the datasets topic.
awesomedata/awesome-public-datasets
A topic-centric list of HQ open datasets.
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
akfamily/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
tonybeltramelli/pix2code
pix2code: Generating Code from a Graphical User Interface Screenshot
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
simonw/datasette
An open source multi-tool for exploring and publishing data
doccano/doccano
Open source annotation tool for machine learning practitioners.
satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
imaNNeo/fl_chart
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, Radar Chart and Candlestick Chart.
Arize-ai/phoenix
AI Observability & Evaluation
liuruoze/EasyPR
(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese license plates in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
CLUEbenchmark/CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
torchgeo/torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
jdorfman/awesome-json-datasets
A curated list of awesome JSON datasets that don't require authentication.
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
RUC-NLPIR/FlashRAG
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
codefuse-ai/Awesome-Code-LLM
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
zhulf0804/3D-PointCloud
Papers and Datasets about Point Cloud.
justinzm/gopup
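Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; Qianlima and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…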
langwatch/langwatch
The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and Prompt Optimization ✨
colour-science/colour
Colour Science for Python
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
FreedomIntelligence/Medical_NLP
Medical NLP competitions, datasets, large models, and papers
logpai/loghub
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
jsbroks/coco-annotator
Web-based image segmentation tool for object detection, localization, and keypoints
isl-org/Open3D-ML
An extension of Open3D to address 3D Machine Learning tasks
prabhuomkar/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
WLiK/LLM4Rec-Awesome-Papers
A list of awesome papers and resources on recommender systems with large language models (LLMs).
snap-stanford/ogb
Benchmark datasets, data loaders, and evaluators for graph machine learning
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
eosphoros-ai/DB-GPT-Hub
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL
diffgram/diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
JuliaData/DataFrames.jl
In-memory tabular data in Julia