An R package inspired by 'mixup: Beyond Empirical Risk Minimization'

mixup

mixup is an R package for data augmentation inspired by 'mixup: Beyond Empirical Risk Minimization'.

If you like mixup, give it a star, or fork it and contribute!

Usage

Create additional training data for a toy dataset:

library(mixup)

# Use builtin mtcars dataset with mtcars$am (automatic/manual) as binary target
data(mtcars)
str(mtcars)
summary(mtcars[, -9])
summary(mtcars$am)

# Strictly speaking this is 'input mixup' (see Details section below)
set.seed(42)
mtcars.mix <- mixup(mtcars[, -9], mtcars$am)
summary(mtcars.mix$x)
summary(mtcars.mix$y)

# Further info
?mixup

Installation

Requires R version 3.2.0 or higher.

install.packages('devtools') # Install devtools package if necessary
library(devtools)
devtools::install_github('makeyourownmaker/mixup')

Details

The mixup function enlarges training sets using linear interpolations of features and associated labels as described in https://arxiv.org/abs/1710.09412.

Virtual feature-target pairs are produced from randomly drawn feature-target pairs in the training data.
The method is straightforward and data-agnostic. It should reduce generalisation error.

Mixup constructs additional training examples:

x' = λ * x_i + (1 - λ) * x_j, where x_i, x_j are raw input vectors

y' = λ * y_i + (1 - λ) * y_j, where y_i, y_j are one-hot label encodings

(x_i, y_i) and (x_j, y_j) are two examples drawn at random from the training data, and λ ∈ [0, 1] with λ ∼ Beta(α, α) for α ∈ (0, ∞). The mixup hyper-parameter α controls the strength of interpolation between feature-target pairs.
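
The two formulas above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's own implementation: the function name mixup_sketch is hypothetical, it draws one λ per virtual example, and it pairs each example with a partner chosen by a random permutation.

```r
# Hypothetical helper for illustration; NOT the package's mixup() function.
mixup_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(length(y) == n, alpha > 0)

  # One lambda per virtual example, drawn from Beta(alpha, alpha)
  lambda <- rbeta(n, alpha, alpha)

  # Pair each example i with a randomly chosen partner j
  j <- sample(n)

  # lambda is recycled down the columns, so row k is scaled by lambda[k]
  x_mix <- lambda * x + (1 - lambda) * x[j, , drop = FALSE]
  y_mix <- lambda * y + (1 - lambda) * y[j]

  list(x = x_mix, y = y_mix)
}

set.seed(42)
mix <- mixup_sketch(mtcars[, -9], mtcars$am, alpha = 1)
dim(mix$x)    # same shape as the input features
range(mix$y)  # soft labels between 0 and 1
```

Because every mixed row is a convex combination of two real rows, each mixed feature value stays within the range observed in the original column.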

mixup() parameters

| Parameter  | Description                                          | Notes                          |
|------------|------------------------------------------------------|--------------------------------|
| x1         | Original features                                    | Required parameter             |
| y1         | Original labels                                      | Required parameter             |
| alpha      | Hyperparameter specifying strength of interpolation  | Defaults to 1                  |
| concat     | Concatenate mixup data with original data            | Defaults to FALSE              |
| batch_size | How many mixup values to produce                     | Defaults to number of examples |

The x1 and y1 parameters must be numeric and must have equal numbers of examples. Non-finite values are not permitted. Factors should be one-hot encoded.
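
One possible way to one-hot encode a factor before passing features as x1 is base R's model.matrix. The sketch below is an assumption about a reasonable workflow rather than something the package prescribes; the toy data frame and its column names are made up for illustration.

```r
# Toy data frame; the column names are illustrative only.
df <- data.frame(
  weight = c(2.6, 3.2, 3.4, 2.3),
  gears  = factor(c("three", "four", "four", "five"))
)

# '- 1' drops the intercept so every level of the factor gets its own column
gears_onehot <- model.matrix(~ gears - 1, data = df)

# Combine the numeric column with the one-hot columns into one numeric matrix
x1 <- cbind(weight = df$weight, gears_onehot)

# The constraints described above: numeric and finite
stopifnot(is.numeric(x1), all(is.finite(x1)))
x1
```

With several factors in one formula, model.matrix (under the default treatment contrasts) drops a level from every factor after the first, so it is safer to encode each factor separately and cbind the results.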

For now, only binary classification is supported, meaning y1 must contain only numeric 0 and 1 values.

Alpha values must be greater than or equal to zero. Alpha equal to zero specifies no interpolation.
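
To get a feel for how alpha shapes the interpolation weights, the following sketch samples λ from Beta(alpha, alpha) for three arbitrarily chosen alpha values and measures how often λ lands close to 0 or 1, where a mixed example is almost a copy of one of its parents. It uses only base R; the helper name near_edge and the 0.1/0.9 thresholds are illustrative choices, and it says nothing about how the package itself handles alpha equal to zero.

```r
set.seed(42)
n <- 10000
alphas <- c(0.1, 1, 10)

# Share of draws that sit near 0 or 1, i.e. close to "no interpolation"
near_edge <- function(lambda) mean(lambda < 0.1 | lambda > 0.9)

shares <- sapply(alphas, function(a) near_edge(rbeta(n, a, a)))

for (k in seq_along(alphas)) {
  cat(sprintf("alpha = %4.1f  near-edge share = %.2f\n", alphas[k], shares[k]))
}
```

Small alpha values push λ towards 0 or 1 (weak interpolation), alpha = 1 gives a uniform λ, and large alpha values concentrate λ around 0.5 (strong interpolation).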

The mixup function returns a two-element list containing interpolated x and y values. Optionally, the original values can be concatenated with the new values.

Mixup with other learning methods

It is worth distinguishing between using mixup with deep learning and with other learning methods. With deep learning, mixup can improve generalisation when a new mixed dataset is generated every epoch or, even better, for every minibatch. This level of granularity may not be possible with other learning methods. For example, a simple linear model may not benefit much from training on a single (potentially greatly expanded) pre-mixed dataset. This single pre-mixed dataset approach is sometimes referred to as 'input mixup'.
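
The difference between per-epoch mixing and a single pre-mixed dataset can be illustrated with a small training loop. The sketch below is an assumption-laden toy, not part of the package: it fits a logistic regression by plain gradient descent in base R and draws fresh λ values and partners at the start of every epoch. The feature subset, learning rate and epoch count are arbitrary.

```r
set.seed(42)

# Standardised features and binary target from mtcars
x <- scale(as.matrix(mtcars[, c("mpg", "wt", "hp")]))
y <- mtcars$am
n <- nrow(x)

alpha  <- 1      # mixup hyper-parameter
lr     <- 0.1    # learning rate (arbitrary choice for this sketch)
epochs <- 200

w <- rep(0, ncol(x))  # weights
b <- 0                # intercept

sigmoid <- function(z) 1 / (1 + exp(-z))

for (epoch in seq_len(epochs)) {
  # Fresh mixed data every epoch: new lambdas and new random partners
  lambda <- rbeta(n, alpha, alpha)
  j      <- sample(n)
  x_mix  <- lambda * x + (1 - lambda) * x[j, , drop = FALSE]
  y_mix  <- lambda * y + (1 - lambda) * y[j]

  # One full-batch gradient step on the cross-entropy loss with soft labels
  p    <- sigmoid(drop(x_mix %*% w) + b)
  grad <- p - y_mix
  w    <- w - lr * drop(crossprod(x_mix, grad)) / n
  b    <- b - lr * mean(grad)
}

# Evaluate on the original, unmixed data
pred <- as.integer(sigmoid(drop(x %*% w) + b) > 0.5)
acc  <- mean(pred == y)
acc
```

Moving the three mixing lines above the loop would turn this into the single pre-mixed dataset approach, which makes the two strategies easy to compare on the same model.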

In certain situations, under-fitting can occur when conflicts between synthetic labels of the mixed-up examples and labels of the original training data are present. Some learning methods may be more prone to this under-fitting than others.

Data augmentation as regularisation

Data augmentation is occasionally referred to as a regularisation technique. Regularisation decreases a model's variance by adding prior knowledge (sometimes using shrinkage). Increasing training data (using augmentation) also decreases a model's variance. Data augmentation is also a form of adding prior knowledge to a model.

Citing

If you use mixup in a scientific publication, then consider citing the original paper:

mixup: Beyond Empirical Risk Minimization

By Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

https://arxiv.org/abs/1710.09412

I have no affiliation with MIT, FAIR or any of the authors.

Roadmap

  • Improve docs
    • Add more detailed examples
      • Different data types e.g. tabular, image etc
      • Different parameters
      • Different learning methods
  • Add my time series mixup variant
  • Lint package with goodpractice
  • Add tests
  • Add support for one-hot encoded labels
  • Add label preserving option
  • Add support for mixing within the same class
    • Usually doesn't perform as well as mixing within all classes
    • May still have some utility e.g. unbalanced data sets
  • Generalise to regression problems

Alternatives

Other implementations:

See Also

Discussion:

Closely related research:

Loosely related research:

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

GPL-2