[feat] data augmentation

Question

Closed this issue 8 months ago · 2 comments

We should be able to augment data (as opposed to adding noise to it) as a way to construct better datasets. Augmentation should improve training, whereas noise decreases training power.

Data augmentation should only happen on the train set.
Data augmentation should happen after noise has been applied and after the data has been joined (in the event of column splitting).
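
To make the ordering concrete, here is a minimal sketch of the two constraints above. Everything in it is an assumption for illustration: the function names (`add_noise`, `join_columns`, `augment`, `process_split`), the Gaussian noise, and the dict-of-columns layout are hypothetical and not taken from the codebase. Whether noise is applied to non-train splits is not stated in this issue; the sketch applies it to every split only to keep the example short.

```python
import random


def add_noise(values, rng):
    # Hypothetical noise step: jitter each numeric value slightly.
    return [v + rng.gauss(0.0, 0.1) for v in values]


def join_columns(columns):
    # Join separately processed columns back into row dicts.
    names = list(columns)
    return [dict(zip(names, vals)) for vals in zip(*columns.values())]


def augment(rows):
    # Hypothetical augmentation: append a copy of every row.
    return rows + [dict(row) for row in rows]


def process_split(columns, split_name, rng):
    # 1. Noise is applied per column (columns may have been split apart).
    noised = {name: add_noise(vals, rng) for name, vals in columns.items()}
    # 2. Columns are joined back into rows.
    rows = join_columns(noised)
    # 3. Augmentation runs last, and only on the train set.
    if split_name == "train":
        rows = augment(rows)
    return rows


rng = random.Random(0)
train_rows = process_split({"x": [1.0, 2.0, 3.0], "y": [0.0, 1.0, 0.0]}, "train", rng)
test_rows = process_split({"x": [4.0, 5.0], "y": [1.0, 1.0]}, "test", rng)
```

With this ordering the duplicated train rows carry the same noised values as their originals, and the test split keeps its original row count.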

Add an add_augmentation method to the csv class in Python that adds data (duplicates).
Add augmentation methods in augmentation/augmentation_generators.py.
Check whether the experiment class can handle data augmentation.
Write tests for both the experiment and augmentation classes.
Allow the json schemer to handle a user-supplied augmentation method.
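
As a rough illustration of the first two tasks, the sketch below pairs a duplicate-producing generator with an `add_augmentation` method. Only the method name `add_augmentation` comes from this issue; the class names `CsvData` and `DuplicateRows`, the `augment` interface, and the in-memory row layout are hypothetical and may not match the actual csv class or the contents of augmentation/augmentation_generators.py.

```python
import csv
import io


class DuplicateRows:
    """Hypothetical augmentation generator: returns copies of the given rows."""

    def augment(self, rows):
        return [dict(row) for row in rows]


class CsvData:
    """Hypothetical stand-in for the csv class mentioned in the task list."""

    def __init__(self, text):
        # Parse CSV text into a list of row dicts keyed by header name.
        self.rows = list(csv.DictReader(io.StringIO(text)))

    def add_augmentation(self, generator):
        # Append whatever rows the generator produces from the current data.
        self.rows.extend(generator.augment(self.rows))


data = CsvData("a,b\n1,2\n3,4\n")
data.add_augmentation(DuplicateRows())
```

Keeping the generator separate from the csv class means other augmentation methods could be added later without changing `add_augmentation`, assuming they expose the same `augment` call.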

Answer 1 · 2024-05-03T08:31:46.000Z

The Python part is completed in #96.

Answer 2 · 2024-05-06T11:56:10.000Z

The Nextflow part is completed in #118.