Provides a basic directory structure and template files for setting up a DataLoader using the ETL methodology.

Primary Language: Python

ETL - Extract / Translate / Load

Installation

pip install git+https://gitlab.com/jayemar/etl.git

Basic Usage

from etl.dataloader import DataLoader

dl = DataLoader()

# ml_cfg is a placeholder for your configuration (see "Config File" below)
train_gen = dl.retrieve_data(ml_cfg)
test_gen = dl.get_test_data()
valid_gen = dl.get_validation_data()

Config File

The config file can be in either JSON or YAML format. Fields are optional unless otherwise stated.

Fields

  • data_dir: directory where the data is located; the path can be absolute or relative to the directory containing task.py
  • batch_size: number of records per batch
  • epochs: number of epochs to run during training
  • train_size: fraction of the data used for training, as a decimal between 0 and 1
  • test_size: fraction of the data used for testing, as a decimal between 0 and 1
  • valid_size: fraction of the data used for validation, as a decimal between 0 and 1
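
Example config (sketch)

The fields above could be put together as in the following sketch, which writes a JSON config and reads it back using only the standard library. The values, the file name ml_cfg.json, and the check_split_sizes helper are all assumptions made for illustration; they are not documented defaults or part of the etl package, and whether the package itself validates the split sizes is not stated here.

```python
import json
import tempfile
from pathlib import Path

# Illustrative values only; none of these are documented defaults.
example_cfg = {
    "data_dir": "data",   # relative to the directory containing task.py
    "batch_size": 32,
    "epochs": 10,
    "train_size": 0.7,
    "test_size": 0.2,
    "valid_size": 0.1,
}


def check_split_sizes(cfg, tol=1e-9):
    """Hypothetical helper (not part of etl): sanity-check the split ratios.

    Assumes each size is a fraction in [0, 1] and that together they
    should not exceed 1. Missing fields count as 0 because they are optional.
    """
    keys = ("train_size", "test_size", "valid_size")
    sizes = [cfg.get(key, 0.0) for key in keys]
    if any(size < 0 or size > 1 for size in sizes):
        raise ValueError("each split size must be between 0 and 1")
    total = sum(sizes)
    if total > 1 + tol:
        raise ValueError(f"split sizes sum to {total}, which exceeds 1")
    return total


# Write the config as JSON and read it back, as a config loader might.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "ml_cfg.json"  # hypothetical file name
    cfg_path.write_text(json.dumps(example_cfg, indent=2))
    loaded = json.loads(cfg_path.read_text())

total = check_split_sizes(loaded)
```

A YAML config would carry the same keys. Checking the ratios before training is a design choice in this sketch, meant to catch typos such as a train_size of 7 instead of 0.7.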