synthetic-data
There are 700 repositories under the synthetic-data topic.
stefan-jansen/machine-learning-for-trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
lk-geimfari/mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
nucleuscloud/neosync
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
Kiln-AI/Kiln
The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
DLR-RM/BlenderProc
A procedural Blender pipeline for photorealistic training image generation
sdv-dev/SDV
Synthetic data generation for tabular data
argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
synthetichealth/synthea
Synthetic Patient Population Simulator
hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.
unrealcv/unrealcv
UnrealCV: Connecting Computer Vision to Unreal Engine
ydataai/ydata-synthetic
Synthetic data generators for tabular and time-series data
GreenmaskIO/greenmask
PostgreSQL database anonymization and synthetic data generation tool
bespokelabsai/curator
Synthetic data curation for post-training and structured data extraction
sdv-dev/CTGAN
Conditional GAN for generating synthetic tabular data.
shuttle-hq/synth
The Declarative Data Generator
datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
plurai-ai/intellagent
A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions
BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
magpie-align/magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Renumics/awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
nicolas-hbt/pygraft
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
jofpin/synthBTC
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
gretelai/gretel-synthetics
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
mostly-ai/mostlyai
Synthetic Data SDK ✨
SciPhi-AI/synthesizer
A multi-purpose LLM framework for RAG and data creation.
sdv-dev/Copulas
A library to model multivariate data using copulas.
vanderschaarlab/synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
paulbricman/thisrepositorydoesnotexist
A curated list of awesome projects which use Machine Learning to generate synthetic content.
yandex-research/tab-ddpm
[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"
sparkfish/augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
plaitpy/plaitpy
plait.py - a fake data modeler
databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated/synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
GeorgeCazenavette/mtt-distillation
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
wenbowen123/iros20-6d-pose-tracking
[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains
stacklok/promptwright
Generate large synthetic datasets using an LLM.