synthetic-data-generator

This project builds a machine learning pipeline that generates time series and other types of datasets from custom sample data, using GAN (Generative Adversarial Network) and LSTM models.

NOTE: This repository is intended for learning and research purposes. It is also worth noting that this is a work in progress (WIP) and will continue to be updated.
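As a rough illustration of the kind of input that sequence models such as LSTMs typically train on, the sketch below slices a sample time series into fixed-length overlapping windows. This is not taken from the repository's code: the function name `make_windows` and the `window` parameter are hypothetical, and the actual preprocessing used by the notebooks may differ.

```python
import numpy as np


def make_windows(series, window):
    """Slice a time series into overlapping windows (hypothetical helper).

    Returns an array of shape (num_windows, window, num_features).
    """
    data = np.asarray(series, dtype=float)
    if data.ndim == 1:
        # Treat a flat sequence as a single-feature series.
        data = data[:, None]
    num_windows = data.shape[0] - window + 1
    if num_windows <= 0:
        raise ValueError("window is longer than the series")
    # Each window starts one step after the previous one.
    return np.stack([data[i:i + window] for i in range(num_windows)])


if __name__ == "__main__":
    sample = np.sin(np.linspace(0.0, 6.28, 50))  # toy sample data
    windows = make_windows(sample, window=10)
    print(windows.shape)
```

Each window can then serve as one training sequence; the window length is a modelling choice that depends on how much history the model should see.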

Experimentation Notebooks

Time Series dGAN Synthetic Data Generation

LSTM Synthetic Data Generation

Some examples - Comparison Reports

Pipeline (Coming Soon)

DRAFT - Pipeline Flow Diagram

Getting started

Install Anaconda
- Installation instructions by environment: https://docs.anaconda.com/anaconda/install/index.html
Create the conda environment and activate it

conda env create -f environment.yaml

conda activate synthetic-data-generator
Change into the main pipeline directory

cd synthetic-data-pipeline
Install the dependencies

pip install -r src/requirements.txt
Set up your IDE for Kedro projects
- PyCharm or IntelliJ (https://kedro.readthedocs.io/en/stable/development/set_up_pycharm.html)
- Visual Studio Code (https://kedro.readthedocs.io/en/stable/development/set_up_vscode.html)
Continue with the next steps in this README
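After the installation steps, it can help to confirm that the active environment actually sees the expected packages. The snippet below is a minimal sketch using only the Python standard library; the helper name `check_packages` is hypothetical, and the package list is an assumption (only `kedro` is named in this README, so extend it from src/requirements.txt as needed).

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed list: "kedro" is the only dependency this README names.
    for name, ok in check_packages(["kedro"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the `synthetic-data-generator` conda environment activated; a missing package usually means the `pip install` step was run in a different environment.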

References

Thanks to the work done by Gretel Synthetics (https://github.com/gretelai/gretel-synthetics)
Thanks to YData for Pandas-Profiling (https://github.com/ydataai/pandas-profiling)

ShawnKyzer/synthetic-data-generator