dataset
There are 10,085 repositories under the dataset topic.
public-apis/public-apis
A collective list of free APIs
joke2k/faker
Faker is a Python package that generates fake data for you.
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
zalandoresearch/fashion-mnist
An MNIST-like fashion product database and benchmark.
cvat-ai/cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
brightmart/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
doccano/doccano
Open source annotation tool for machine learning practitioners.
satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
NirantK/awesome-project-ideas
Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
googlecreativelab/quickdraw-dataset
Documentation on how to access and use the Quick, Draw! Dataset.
mdn/browser-compat-data
This repository contains compatibility data for Web technologies as displayed on MDN
SPLWare/esProc
esProc SPL is a scripting language for data processing with a rich, well-designed function library and powerful syntax. It can be executed from a Java program through the JDBC interface or run computations independently.
lonePatient/awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
whoiskatrin/sql-translator
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
CLUEbenchmark/CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
wainshine/Chinese-Names-Corpus
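Chinese names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.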
pytorch/text
Models, data loaders and abstractions for language processing, powered by PyTorch
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
jdorfman/awesome-json-datasets
A curated list of awesome JSON datasets that don't require authentication.
Belval/TextRecognitionDataGenerator
A synthetic data generator for text recognition
ieee8023/covid-chestxray-dataset
We are building an open database of COVID-19 cases with chest X-ray or CT images.
pydata/pandas-datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
Charmve/Surface-Defect-Detection
📈 Currently the largest database and paper collection for industrial defect detection. Continually summarizes important open-source datasets and key papers in the field of surface defect research.
whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
waymo-research/waymo-open-dataset
Waymo Open Dataset
GeorgeSeif/Semantic-Segmentation-Suite
Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!
unsplash/datasets
🎁 5,400,000+ Unsplash images made available for research and machine learning
meodai/color-names
Large list of handpicked color names 🌈
google-research-datasets/Objectron
Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes
PetrochukM/PyTorch-NLP
Basic Utilities for PyTorch Natural Language Processing (NLP)
detectRecog/CCPD
[ECCV 2018] CCPD: a diverse and well-annotated dataset for license plate detection and recognition
sbousseaden/EVTX-ATTACK-SAMPLES
Windows Events Attack Samples
mdeff/fma
FMA: A Dataset For Music Analysis
linhandev/dataset
An Index for Medical Imaging Datasets