dataset-generation
There are 514 repositories under the dataset-generation topic.
aitorzip/DeepGTAV
A plugin for GTAV that transforms it into a vision-based self-driving car research environment.
nfstream/nfstream
NFStream: a Flexible Network Data Analysis Framework.
rodrigopivi/Chatito
🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
aqeelanwar/MaskTheFace
Convert face dataset to masked dataset
DIYer22/bpycv
Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)
SimGus/Chatette
A powerful dataset generator for Rasa NLU, inspired by Chatito
HeegyuKim/open-korean-instructions
A collection of public Korean instruction datasets for training language models.
radi-cho/datasetGPT
A command-line interface to generate textual and conversational datasets with LLMs.
fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
facebookresearch/stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
ylogx/aesthetics
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
firmai/datagene
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
pprp/voc2007_for_yolo_torch
👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5
google/imageinwords
Data release for the ImageInWords (IIW) paper.
ZhangYuanhan-AI/Bamboo
Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; Built by active learning.
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
AlvaroCavalcante/auto_annotate
Labeling is boring. Use this tool to speed up your next object detection project!
suvojit-0x55aa/celebA-HQ-dataset-download
Get started with the CelebA-HQ dataset in under 5 minutes!
yc9701/pansori
Tools for ASR Corpus Generation from Online Video
hridaydutta123/the-youtube-scraper
Download YouTube video description and video comments without using the YouTube API.
seart-group/ghs
GitHub Search: Platform used to crawl, store and present projects from GitHub, as well as any statistics related to them
AgaMiko/pixel_character_generator
Generating retro pixel game characters with Generative Adversarial Networks. Dataset "TinyHero" included.
rioharper/VocalForge
Your one-stop solution for voice dataset creation
MatteoGuadrini/pyreports
pyreports is a Python library that allows you to create complex reports from various sources
remyxai/VQASynth
Compose multimodal datasets 🎹
asaparov/prontoqa
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
jim-schwoebel/download_audioset
📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).
futianfan/clinical-trial-outcome-prediction
Benchmark dataset and deep learning method (Hierarchical Interaction Network, HINT) for clinical trial approval probability prediction, published in Cell Patterns 2022.
Spphire/RM-labeling-tool
A Unity-based simulator for RoboMaster. You can use it to generate labeled datasets for deep learning.
miendinh/VietnameseOCR
Vietnamese Optical Character Recognition. It works with both Vietnamese and Latin characters.
ardauzunoglu/TRScraper
TRScraper is an application developed for use in natural language processing work; it enables text mining on large platforms with Turkish-language content.
Erfaniaa/crypto-trading-strategy-backtester
Easy-to-use cryptocurrency trading strategy simulator and backtester
UttaranB127/STEP
Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
CDInstitute/Building-Dataset-Generator
Procedural 3D data generation pipeline for architecture
joao-borrego/gap
Gazebo plugins for applying domain randomization
TimeEval/GutenTAG
GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.