synthetic-data-generation

There are 116 repositories under the synthetic-data-generation topic.

  • neosync

    nucleuscloud/neosync

    Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.

    Language:Go4.1k22470221
  • SDV

    sdv-dev/SDV

    Synthetic data generation for tabular data

    Language:Python3.3k431.5k398
  • CTGAN

    sdv-dev/CTGAN

    Conditional GAN for generating synthetic tabular data.

    Language:Python1.5k21230326
  • mostlyai

    mostly-ai/mostlyai

    Synthetic Data SDK ✨

    Language:Python68382758
  • Copulas

    sdv-dev/Copulas

    A library to model multivariate data using copulas.

    Language:Python62019217117
  • microsoft/genalog

    Genalog is an open source, cross-platform Python package for generating synthetic document images with custom degradations and text alignment capabilities.

    Language:Jupyter Notebook33581733
  • PeopleSansPeople

    Unity-Technologies/PeopleSansPeople

    Unity's privacy-preserving human-centric synthetic data generator

    Language:C#318231835
  • fjxmlzn/DoppelGANger

    [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

    Language:Python30844476
  • INGenious

    ing-bank/INGenious

    INGenious Playwright Studio

    Language:Java120104344
  • DeepEcho

    sdv-dev/DeepEcho

    Synthetic Data Generation for mixed-type, multivariate time series.

    Language:Python11874917
  • netsharecmu/NetShare

    (SIGCOMM '22) Practical GAN-based Synthetic IP Header Trace Generation using NetShare

    Language:Python8872327
  • keishihara/flow-matching

    Flow Matching implemented in PyTorch

    Language:Python77207
  • mostlyai-engine

    mostly-ai/mostlyai-engine

    Synthetic Data Engine 💎

    Language:Python6712
  • Graph-COM/GraphMaker

    [TMLR] GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?

    Language:Python64148
  • weiyifan1023/senator

    NeurIPS 2025: Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs

    Language:Python623
  • microsoft/CodeMixed-Text-Generator

    This tool helps automatically generate grammatically valid synthetic code-mixed data by utilizing linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.

    Language:Jupyter Notebook566612
  • mockingbird

    openraven/mockingbird

    A toolset to test data classification engines that generates mock data in various file formats, sizes and data profiles.

    Language:Python44656
  • ritaranx/ClinGen

    [ACL 2024 Findings] This is the code for our paper "Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models".

    Language:Python40323
  • AnthroNet

    Unity-Technologies/AnthroNet

    Unity's Privacy-Preserving Novel Human Body Model Trained Solely on Synthetic Data and Corresponding Dense Anthropometric Measurements

    Language:Rich Text Format36232
  • aliseyfi75/COSCI-GAN

    Codebase for "Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)"

    Language:Jupyter Notebook34259
  • gongouveia/Whisper-Synthetic-ASR-Dataset-Generator

    This UI serves as a synthetic ASR dataset generator powered by and for OpenAI Whisper, enabling users to capture audio, transcribe it on the fly, and manage the generated dataset 🤗. Fine-tune Whisper on enhanced and custom datasets.

    Language:Python32432
  • starfishdata/starfish

    Synthetic data generation to fuel AI models

    Language:Python313
  • kkyuhun94/dalda

    [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

    Language:Python30214
  • apple/ml-interactive-data-augmentation

    Interactive Data Augmentation (CHI 2025)

    Language:Svelte28903
  • codezakh/DataEnvGym

    A testbed for agents and environments that can automatically improve models through data generation.

    Language:Python27156
  • dannylee1020/openpo

    Building synthetic data for preference tuning

    Language:Python27210
  • VCL3D/BlenderScripts

    Scripts for data generation using Blender and 3D datasets like Matterport3D.

    Language:Python27313
  • Graph-COM/LayerDAG

    [ICLR 2025 Spotlight] LayerDAG: A Layerwise Autoregressive Diffusion Model of Directed Acyclic Graphs

    Language:Python24103
  • marzekan/WCGAN-GP

    TensorFlow 2 implementation of Wasserstein Conditional GAN with Gradient Penalty (WCGAN-GP) for synthetic data generation

    Language:Jupyter Notebook20112
  • awesome-synthetic-apps

    causely-oss/awesome-synthetic-apps

    A collection of demo applications, telemetry generators and tools for application simulation

  • ImageFromTextGenerator

    OmarSamirz/ImageFromTextGenerator

    IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

    Language:Python19231
  • sdvg

    tarantool/sdvg

    Synthetic Data Values Generator

    Language:Go185
  • aaron-wheeler/MarketGPT

    MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series

    Language:Jupyter Notebook16106
  • stefanrmmr/differentially_private_synthetic_data

    Differentially Private Synthetic Data Generation [DP-SDG] - Experimental Setups & Knowledge Base - WORK IN PROGRESS

    Language:Jupyter Notebook12102
  • CatSatOK/Prophets-of-Profit-Evaluating-Synthetic-Data-Techniques-in-Financial-Forecasting-Models

    A comparative investigation into the use of WGAN-GP, CTGAN, TimeGAN and DoppelGANger for generating synthetic financial time series data for use in forecasting models

    Language:Jupyter Notebook11111
  • SlicerModalityConverter

    ciroraggio/SlicerModalityConverter

    SlicerModalityConverter is an open-source 3D Slicer extension designed for medical image-to-image (I2I) translation. The ModalityConverter module integrates multiple deep learning models trained for different kinds of I2I translation, providing a user-friendly interface.

    Language:Python10