synthetic-data-generation
There are 116 repositories under the synthetic-data-generation topic.
nucleuscloud/neosync
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
sdv-dev/SDV
Synthetic data generation for tabular data
sdv-dev/CTGAN
Conditional GAN for generating synthetic tabular data.
mostly-ai/mostlyai
Synthetic Data SDK ✨
sdv-dev/Copulas
A library to model multivariate data using copulas.
microsoft/genalog
Genalog is an open source, cross-platform Python package for generating synthetic document images with custom degradations and text alignment capabilities.
Unity-Technologies/PeopleSansPeople
Unity's privacy-preserving human-centric synthetic data generator
fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
ing-bank/INGenious
INGenious Playwright Studio
sdv-dev/DeepEcho
Synthetic Data Generation for mixed-type, multivariate time series.
netsharecmu/NetShare
(SIGCOMM '22) Practical GAN-based Synthetic IP Header Trace Generation using NetShare
keishihara/flow-matching
Flow Matching implemented in PyTorch
mostly-ai/mostlyai-engine
Synthetic Data Engine 💎
Graph-COM/GraphMaker
[TMLR] GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?
weiyifan1023/senator
NeurIPS 2025: Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs
microsoft/CodeMixed-Text-Generator
This tool helps automatically generate grammatically valid synthetic code-mixed data by utilizing linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
openraven/mockingbird
A toolset to test data classification engines that generates mock data in various file formats, sizes and data profiles.
ritaranx/ClinGen
[ACL 2024 Findings] This is the code for our paper "Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models".
Unity-Technologies/AnthroNet
Unity's Privacy-Preserving Novel Human Body Model Trained Solely on Synthetic Data and Corresponding Dense Anthropometric Measurements
aliseyfi75/COSCI-GAN
Codebase for "Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)"
gongouveia/Whisper-Synthetic-ASR-Dataset-Generator
This UI serves as a Synthetic ASR Dataset Generator powered by/for OpenAI Whisper, enabling users to capture audio, transcribe it on the fly, and manage the generated dataset 🤗. Fine-tune Whisper on enhanced and custom datasets.
starfishdata/starfish
Synthetic data generation to fuel AI models
kkyuhun94/dalda
[ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling
apple/ml-interactive-data-augmentation
Interactive Data Augmentation (CHI 2025)
codezakh/DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
dannylee1020/openpo
Building synthetic data for preference tuning
VCL3D/BlenderScripts
Scripts for data generation using Blender and 3D datasets like Matterport3D.
Graph-COM/LayerDAG
[ICLR 2025 Spotlight] LayerDAG: A Layerwise Autoregressive Diffusion Model of Directed Acyclic Graphs
marzekan/WCGAN-GP
TensorFlow 2 implementation of Wasserstein Conditional GAN with Gradient Penalty (WCGAN-GP) for synthetic data generation
causely-oss/awesome-synthetic-apps
A collection of demo applications, telemetry generators and tools for application simulation
OmarSamirz/ImageFromTextGenerator
IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.
tarantool/sdvg
Synthetic Data Values Generator
aaron-wheeler/MarketGPT
MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series
stefanrmmr/differentially_private_synthetic_data
Differentially Private Synthetic Data Generation [DP-SDG] - Experimental Setups & Knowledge Base - WORK IN PROGRESS
CatSatOK/Prophets-of-Profit-Evaluating-Synthetic-Data-Techniques-in-Financial-Forecasting-Models
A comparative investigation of WGAN-GP, CTGAN, TimeGAN, and DoppelGANger for generating synthetic financial time series data for use in forecasting models
ciroraggio/SlicerModalityConverter
SlicerModalityConverter is an open-source 3D Slicer extension designed for medical image-to-image (I2I) translation. The ModalityConverter module integrates multiple deep learning models trained for different kinds of I2I translation, providing a user-friendly interface.