/syndgen

SYNthetic Data GENeration made easy for everyone, free and open-sourced.

Primary language: Python | License: MIT

Important

This repository was created in response to the licensing change of the SDV project (maintained by DataCebo). SDV is currently one of the largest ecosystems for synthetic data generation. It was originally published under the MIT license but has since transitioned to a non-commercial Business Source License. This greatly limits how the source code can be used; in particular, it prohibits building any commercial offering that generates synthetic data with their code.

I believe that machine learning, and any software that builds on or is derived from published research in the field, should be free and open-sourced no matter how the software might be used. If you agree with this viewpoint, consider contributing to this project so that we can help democratize machine learning together.

syndgen

syndgen is a Python library for generating, evaluating, and working with synthetic data.

Key features (to be implemented in v1):

  • Pre-process and transform data for data-science needs.
  • Generate synthetic data from single- or multi-table datasets.
  • Evaluate any synthetic data with robust metrics.
  • Visualize how your synthetic data compares to the real data.
  • ...
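
To give a rough idea of what evaluating synthetic data against real data can involve, here is a minimal sketch that is independent of syndgen. It is not syndgen's API, since the features above are not yet implemented. The function name `ks_statistic` and the toy samples are assumptions made for illustration. The sketch computes the two-sample Kolmogorov-Smirnov statistic for a single numeric column, using only the Python standard library.

```python
import random


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column.

    Returns the largest gap between the empirical CDFs of the two samples:
    0.0 means the empirical distributions coincide, 1.0 means the samples
    do not overlap at all. (Hypothetical helper, not part of syndgen.)
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap


# Toy data standing in for a real column and two synthetic candidates.
rng = random.Random(0)
real_column = [rng.gauss(0.0, 1.0) for _ in range(1000)]
close_match = [rng.gauss(0.0, 1.0) for _ in range(1000)]
poor_match = [rng.gauss(2.0, 1.0) for _ in range(1000)]

print("close match:", ks_statistic(real_column, close_match))
print("poor match: ", ks_statistic(real_column, poor_match))
```

A smaller value suggests the synthetic column follows the real distribution more closely. A full evaluation would also need to look at categorical columns and at relationships between columns, which this sketch does not attempt.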

Installing

Install with pip:

python3 -m pip install syndgen

Usage

import syndgen as sg
from syndgen.models import HCTGAN

...