This project builds a machine learning pipeline that generates time series and other types of datasets from custom sample data, using GAN (Generative Adversarial Network) and LSTM models.
NOTE: This repository is intended for learning and research purposes. It is also worth noting that this is a work in progress (WIP) and will continue to be updated.
-
Install Anaconda
- Installation instructions for your environment: https://docs.anaconda.com/anaconda/install/index.html
-
Create the conda environment and activate it
conda env create -f environment.yaml
conda activate synthetic-data-generator
-
Change into the main pipeline directory
cd synthetic-data-pipeline
-
Install the dependencies
pip install -r src/requirements.txt
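After installing, it can help to confirm that the key packages are visible to the active environment. The following is a minimal sketch, not part of the repository: the module names in `EXPECTED_MODULES` are assumptions based on the tools this README mentions (Kedro, Gretel Synthetics, Pandas-Profiling), and `find_missing` is a hypothetical helper. Adjust the list to match what `src/requirements.txt` actually installs.

```python
import importlib.util

# Assumed module names, inferred from the tools mentioned in this README.
# Edit this list to match the contents of src/requirements.txt.
EXPECTED_MODULES = ["kedro", "gretel_synthetics", "pandas_profiling"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for some broken or partially installed packages.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

Run it with the `synthetic-data-generator` environment active. If any module is reported missing, re-running the `pip install` step from inside the activated environment is a reasonable first thing to try.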
-
Set up your IDE for Kedro projects
- PyCharm or IntelliJ (https://kedro.readthedocs.io/en/stable/development/set_up_pycharm.html)
- Visual Studio Code (https://kedro.readthedocs.io/en/stable/development/set_up_vscode.html)
-
Continue with the next steps in this README
- Thanks to the work done by Gretel Synthetics (https://github.com/gretelai/gretel-synthetics)
- Thanks to YData for Pandas-Profiling (https://github.com/ydataai/pandas-profiling)