Important
This repository was created in response to the licensing change of the SDV project (maintained by DataCebo). SDV is currently one of the largest ecosystems for synthetic data generation. It was originally published under the MIT license but has since moved to a non-commercial Business Source License. This greatly limits how the source code can be used; in particular, it prohibits using the code to build any commercial offering that generates synthetic data.
I believe that machine learning, and any software that builds on or is derived from published research in the field, should be free and open source regardless of how it is used. If you share this view, consider contributing to this project so that we can help democratize machine learning together.
syndgen is a Python library for generating, evaluating, and working with synthetic data.
Key features (to be implemented in v1):
- Pre-process and transform data for data-science needs.
- Generate synthetic data from single- or multi-tabular datasets.
- Evaluate any synthetic data with robust metrics.
- Visualize how your synthetic data compares to the real data.
- ...
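The evaluation API for v1 is not described here. The following is therefore a hypothetical, standard-library-only sketch of one common kind of column-wise fidelity check, not syndgen's implementation. It scores each numeric column with the two-sample Kolmogorov-Smirnov (KS) statistic, which is the largest gap between the empirical CDFs of the real and synthetic values. The names `ks_statistic` and `column_fidelity`, and the rows-as-dicts data layout, are illustrative assumptions.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Hypothetical helper: two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs of
    the two samples (0.0 = identical distributions, 1.0 = no overlap).
    """
    a, b = sorted(real), sorted(synthetic)
    max_gap = 0.0
    # Empirical CDFs are step functions that only change at sample points,
    # so checking every observed value finds the supremum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def column_fidelity(real_rows, synthetic_rows, columns):
    """Hypothetical helper: score each numeric column as 1 - KS statistic.

    Higher is better; 1.0 means the marginal distributions match exactly.
    Rows are assumed to be dicts keyed by column name.
    """
    return {
        col: 1.0 - ks_statistic(
            [row[col] for row in real_rows],
            [row[col] for row in synthetic_rows],
        )
        for col in columns
    }


if __name__ == "__main__":
    real = [{"age": 23}, {"age": 35}, {"age": 41}, {"age": 52}]
    synthetic = [{"age": 25}, {"age": 36}, {"age": 40}, {"age": 60}]
    print(column_fidelity(real, synthetic, ["age"]))  # {'age': 0.75}
```

A per-column KS score only compares marginal distributions. It says nothing about correlations between columns or relationships across tables, so a fuller evaluation would combine it with multivariate metrics.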
Install with pip:
python3 -m pip install syndgen
import syndgen as sg
from syndgen.models import HCTGAN
...