datasets
There are 2163 repositories under the datasets topic.
awesomedata/awesome-public-datasets
A topic-centric list of HQ open datasets.
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
tonybeltramelli/pix2code
pix2code: Generating Code from a Graphical User Interface Screenshot
doccano/doccano
Open source annotation tool for machine learning practitioners.
simonw/datasette
An open source multi-tool for exploring and publishing data
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
akfamily/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
imaNNeo/fl_chart
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.
liuruoze/EasyPR
(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
CLUEbenchmark/CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included.
jdorfman/awesome-json-datasets
A curated list of awesome JSON datasets that don't require authentication.
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
justinzm/gopup
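Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "Qianlima" (high-potential) and unicorn companies; Xinwen Lianbo news transcripts; film and TV box office data; university lists; epidemic data…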
microsoft/torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
zhulf0804/3D-PointCloud
Papers and Datasets about Point Cloud.
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
jsbroks/coco-annotator
✏️ Web-based image segmentation tool for object detection, localization, and keypoints
FreedomIntelligence/Medical_NLP
Medical NLP: competitions, datasets, large models, papers, and toolkits
colour-science/colour
Colour Science for Python
snap-stanford/ogb
Benchmark datasets, data loaders, and evaluators for graph machine learning
prabhuomkar/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
diffgram/diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
ChineseGLUE/ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
JuliaData/DataFrames.jl
In-memory tabular data in Julia
isl-org/Open3D-ML
An extension of Open3D to address 3D Machine Learning tasks
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
logpai/loghub
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
juand-r/entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
explosion/projects
🪐 End-to-end NLP workflows from prototype to production
PolyAI-LDN/conversational-datasets
Large datasets for conversational AI
MobilityData/awesome-transit
Community list of transit APIs, apps, datasets, research, and software 🚌🌟🚋🌟🚂
shramos/Awesome-Cybersecurity-Datasets
A curated list of amazingly awesome Cybersecurity datasets