data-generation
There are 213 repositories under the data-generation topic.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
benkeen/generatedata
A powerful, feature-rich, random test data generator.
sdv-dev/SDV
Synthetic data generation for tabular data
AgaMiko/data-augmentation-review
List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
neomatrix369/awesome-ai-ml-dl
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources on these topics.
shuttle-hq/synth
The Declarative Data Generator
sdv-dev/CTGAN
Conditional GAN for generating synthetic tabular data.
whatyouhide/stream_data
Data generation and property-based testing for Elixir. 🔮
Westlake-AI/openmixup
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
nomemory/mockneat
MockNeat - the modern faker lib.
tom-lord/regexp-examples
Generate strings that match a given regular expression
sdv-dev/Copulas
A library to model multivariate data using copulas.
MTG/DeepConvSep
Deep Convolutional Neural Networks for Musical Source Separation
tirthajyoti/pydbgen
Random dataframe and database table generator
microsoft/genalog
Genalog is an open source, cross-platform Python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated or synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
trinker/wakefield
Generate random data sets
kathrinse/be_great
A novel approach for synthesizing tabular data using pretrained large language models
cieslarmichal/faker-cxx
C++ Faker library for generating fake (but realistic) data.
worldbank/REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
finos/datahelix
The DataHelix generator allows you to quickly create data for testing and validation, based on a JSON profile that defines fields and the relationships between them.
rapiddweller/rapiddweller-benerator-ce
BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.
gretelai/awesome-synthetic-data
📖 A curated list of resources dedicated to synthetic data
louisYen/Gen4Gen
🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition"
sdv-dev/DeepEcho
Synthetic Data Generation for mixed-type, multivariate time series.
mjkvaak/ImageDataAugmentor
Custom image data generator for TF Keras that supports the modern augmentation module albumentations
tinybirdco/mockingbird
Mockingbird is a mock streaming data generator
ykang/gratis
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics
kgoldfeld/simstudy
simstudy: Illuminating research methods through data generation
smartcat-labs/ranger
Ranger is a contextual data generator used to create sensible data for integration tests or for experimenting in a database.
tosiron/jazznet
jazznet dataset of piano patterns for music audio machine learning research
leezythu/FlexKBQA
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
dmey/synthia
📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python
edyan/neuralyzer
Neuralyzer is a library and a command line tool to anonymize databases (by updating existing data or populating a table with fake data)
Cambalab/fake-data-generator
Just a small open-source script to create fake data given a simple JSON model.
microsoft/CodeMixed-Text-Generator
This tool helps automatically generate grammatically valid synthetic code-mixed data by using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.