datasets
There are 2395 repositories under the datasets topic.
awesomedata/awesome-public-datasets
A topic-centric list of HQ open datasets.
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
tonybeltramelli/pix2code
pix2code: Generating Code from a Graphical User Interface Screenshot
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
akfamily/akshare
AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!
doccano/doccano
Open source annotation tool for machine learning practitioners.
simonw/datasette
An open source multi-tool for exploring and publishing data
satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
imaNNeo/fl_chart
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.
liuruoze/EasyPR
(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design
Arize-ai/phoenix
AI Observability & Evaluation
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
CLUEbenchmark/CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
jdorfman/awesome-json-datasets
A curated list of awesome JSON datasets that don't require authentication.
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
OpenCSGs/csghub
CSGHub is an open-source large model platform, similar to an on-premise version of Hugging Face. You can easily manage models and datasets, deploy model applications, and set up model fine-tuning or inference jobs through a user interface. CSGHub also provides a Python SDK fully compatible with the hf SDK. Join us to build a safer and more open platform ⭐️
microsoft/torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
justinzm/gopup
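Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "Qianlima" and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…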
zhulf0804/3D-PointCloud
Papers and Datasets about Point Cloud.
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
FreedomIntelligence/Medical_NLP
Medical NLP competitions, datasets, large models, and papers
colour-science/colour
Colour Science for Python
jsbroks/coco-annotator
✏️ Web-based image segmentation tool for object detection, localization, and keypoints
prabhuomkar/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
snap-stanford/ogb
Benchmark datasets, data loaders, and evaluators for graph machine learning
isl-org/Open3D-ML
An extension of Open3D to address 3D Machine Learning tasks
logpai/loghub
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
diffgram/diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
codefuse-ai/Awesome-Code-LLM
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
ChineseGLUE/ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
JuliaData/DataFrames.jl
In-memory tabular data in Julia
juand-r/entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
eosphoros-ai/DB-GPT-Hub
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL