synthetic-data-generation

There are 65 repositories under the synthetic-data-generation topic.

  • neosync

    nucleuscloud/neosync

    Open-source data anonymization and synthetic data orchestration for developers. Create high-fidelity synthetic data and sync it across your environments.

    Language:Go3.6k20422127
  • SDV

    sdv-dev/SDV

    Synthetic data generation for tabular data

    Language:Python2.4k461.3k320
  • CTGAN

    sdv-dev/CTGAN

    Conditional GAN for generating synthetic tabular data.

    Language:Python1.3k24218296
  • Copulas

    sdv-dev/Copulas

    A library to model multivariate data using copulas.

    Language:Python55722202109
  • microsoft/genalog

    Genalog is an open-source, cross-platform Python package for generating synthetic document images with custom degradations and text alignment capabilities.

    Language:Jupyter Notebook313121631
  • PeopleSansPeople

    Unity-Technologies/PeopleSansPeople

    Unity's privacy-preserving human-centric synthetic data generator

    Language:C#306271735
  • fjxmlzn/DoppelGANger

    [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

    Language:Python30164475
  • DeepEcho

    sdv-dev/DeepEcho

    Synthetic Data Generation for mixed-type, multivariate time series.

    Language:Python105114515
  • netsharecmu/NetShare

    (SIGCOMM '22) Practical GAN-based Synthetic IP Header Trace Generation using NetShare

    Language:Python8482223
  • Graph-COM/GraphMaker

    [TMLR] GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?

    Language:Python54116
  • microsoft/CodeMixed-Text-Generator

    This tool helps automate the generation of grammatically valid synthetic code-mixed data by drawing on linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.

    Language:Jupyter Notebook528612
  • mockingbird

    openraven/mockingbird

    A toolset to test data classification engines that generates mock data in various file formats, sizes and data profiles.

    Language:Python43756
  • ritaranx/ClinGen

    [ACL 2024 Findings] This is the code for our paper "Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models".

    Language:Python36313
  • AnthroNet

    Unity-Technologies/AnthroNet

    Unity's Privacy-Preserving Novel Human Body Model Trained Solely on Synthetic Data and Corresponding Dense Anthropometric Measurements

    Language:Rich Text Format34531
  • aliseyfi75/COSCI-GAN

    Codebase for "Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)"

    Language:Jupyter Notebook31359
  • kkyuhun94/dalda

    [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

    Language:Python27314
  • gongouveia/Whisper-Synthetic-ASR-Dataset-Generator

    This UI serves as a Synthetic ASR Dataset Generator powered by/for OpenAI Whisper, enabling users to capture audio, transcribe it on the fly, and manage the generated dataset 🤗. Fine-tune Whisper on enhanced and custom datasets.

    Language:Python26530
  • VCL3D/BlenderScripts

    Scripts for data generation using Blender and 3D datasets like Matterport3D.

    Language:Python24413
  • marzekan/WCGAN-GP

    TensorFlow 2 implementation of Wasserstein Conditional GAN with Gradient Penalty (WCGAN-GP) for synthetic data generation

    Language:Jupyter Notebook14202
  • codezakh/DataEnvGym

    A testbed for agents and environments that can automatically improve models through data generation.

    Language:Python135
  • stefanrmmr/differentially_private_synthetic_data

    Differentially Private Synthetic Data Generation [DP-SDG] - Experimental Setups & Knowledge Base - WORK IN PROGRESS

    Language:Jupyter Notebook11102
  • SidharthMacherla/conjurer

    R Package to generate synthetic data.

    Language:R93254
  • Graph-COM/LayerDAG

    Code for the paper "LayerDAG: A Layerwise Autoregressive Diffusion Model of Directed Acyclic Graphs"

    Language:Python8200
  • arya-upm/mVARbox

    mVARbox is a MATLAB toolbox for uni/multivariate data series analysis in both time/space and frequency domains, with a focus on multivariate autoregressive (VAR) models

    Language:MATLAB7200
  • jpdefrutos/DDMR

    3D image registration training framework using adaptive loss weighting and synthetic data generation

    Language:Python7392
  • an-seunghwan/DistVAE

    Official PyTorch implementation code for the NeurIPS 2023 accepted paper "Distributional Learning of Variational AutoEncoder: Application to Synthetic Data Generation"

    Language:Python6201
  • CatSatOK/Prophets-of-Profit-Evaluating-Synthetic-Data-Techniques-in-Financial-Forecasting-Models

    A comparative investigation into the use of WGAN-GP, CTGAN, TimeGAN and DoppelGANger for generating synthetic financial time series data for use in forecasting models

    Language:Jupyter Notebook6111
  • JJavierRosales/scapy

    Machine Learning Python library for Spacecraft Conjunction Assessment optimisation.

    Language:Jupyter Notebook4100
  • ImageFromTextGenerator

    OmarSamirz/ImageFromTextGenerator

    IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

    Language:Python4201
  • an-seunghwan/synthesizers

    Implementations of various synthesizers with PyTorch.

    Language:Python3200
  • HoomanRamezani/drone-defect-detection

    Temporal + CNN vision model for classification of windmill defects, with Unreal Engine data generation and a custom data augmentation suite

    Language:Python3200
  • SeyedMuhammadHosseinMousavi/Synthetic-Data-Generation-by-Sequential-Monte-Carlo

    Synthetic Data Generation by Sequential Monte Carlo (SMC)

    Language:MATLAB320
  • shaadclt/Ragas-Synthetic-Test-Data-Generation

    This project demonstrates how to generate synthetic test data for Retrieval Augmented Generation (RAG) using Ragas.

    Language:Jupyter Notebook310
  • Few-shot-satellite-image-classification-OPS-SAT

    ShendoxParadox/Few-shot-satellite-image-classification-OPS-SAT

    Few-shot satellite image classification for bringing deep learning on board OPS-SAT

    Language:Python3300
  • TNO-SDG/tabular.eval.utility_metrics

    TNO PET Lab - Synthetic Data Generation (SDG) - Tabular - Evaluation - Utility Metrics

    Language:Python3200
  • lparolari/harlequin

    Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension

    Language:Python2100