NALU

Neural Arithmetic Logic Units

Implementation of Neural Arithmetic Logic Units as discussed in https://arxiv.org/abs/1808.00508

This implementation

The implementation here deviates from the paper when it comes to computing the gate variable g.
The paper enforces a dependence of g on the input x with the equation g = σ(Gx).
However, for most purposes the gating function depends only on the task and not on the input,
and can be learnt independently of the input.
This implementation instead uses g = σ(G), where G is a learnt scalar.

For recurrent tasks, however, it does make sense to condition the gate value on the input.
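
A minimal sketch of a single NALU cell with this scalar gate, written in PyTorch (the notebooks in this repo may use a different framework; the class name, shapes, and initialisation here are illustrative assumptions, not necessarily what the repo uses):

```python
import torch
import torch.nn as nn

class NALUCell(nn.Module):
    """Single NALU cell with an input-independent scalar gate (a sketch)."""

    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.eps = eps
        # W = tanh(W_hat) * sigmoid(M_hat) biases entries towards {-1, 0, 1},
        # which is what makes exact add/subtract (and extrapolation) possible.
        self.W_hat = nn.Parameter(nn.init.xavier_uniform_(torch.empty(out_dim, in_dim)))
        self.M_hat = nn.Parameter(nn.init.xavier_uniform_(torch.empty(out_dim, in_dim)))
        # Scalar gate: g = sigmoid(G), independent of the input x
        # (the paper uses g = sigmoid(Gx) with a matrix G instead).
        self.G = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        a = x @ W.t()                                               # add/subtract path
        m = torch.exp(torch.log(torch.abs(x) + self.eps) @ W.t())  # mult/div path
        g = torch.sigmoid(self.G)
        return g * a + (1 - g) * m
```

Because g is a single learnt scalar, the layer commits to either the add/subtract path or the mult/div path for the task as a whole, which leads directly to the first limitation below.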

Limitations of a single-cell NALU

  • Can handle either add/subtract or mult/div operations, but not a combination of both.
  • For mult/div operations, it cannot handle negative targets: the mult/div path's output
    is the result of an exponentiation, which always yields positive results (see the
    equation below).
  • Power operations are only possible when the exponent is in the range [0, 1].
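
To make the second limitation concrete: following the paper's formulation, the multiplicative path computes

```latex
m = \exp\bigl(W \log(|x| + \epsilon)\bigr)
```

and since the exponential is strictly positive, every element of m is positive regardless of what W learns, so negative targets can never be produced.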

Advantages of using NALU

  • The careful design of the mathematics ensures that the learnt weights allow for both
    interpolation and extrapolation, as the sketch below illustrates.
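
As an illustration of the extrapolation claim, this toy run (hyperparameters and ranges are arbitrary choices, not taken from this repo) trains the NALUCell sketched above on sums of small inputs and evaluates it on inputs two orders of magnitude larger:

```python
import torch

torch.manual_seed(0)
model = NALUCell(in_dim=2, out_dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Train on y = x1 + x2 with inputs drawn from [0, 10) ...
for step in range(5000):
    x = torch.rand(256, 2) * 10
    y = x.sum(dim=1, keepdim=True)
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# ... then evaluate far outside the training range.
x_test = torch.rand(256, 2) * 1000
y_test = x_test.sum(dim=1, keepdim=True)
print(((model(x_test) - y_test) ** 2).mean().item())  # stays small if W converged to ~[1, 1]
```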

Note

Power operations with exponents above the range [0, 1] need two NALUs stacked on top of each other, as sketched below.
The hidden dimensionality of the stacked NALU network has to be greater than or equal to the exponent.
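
A sketch of such a stack, reusing the hypothetical NALUCell from above:

```python
import torch.nn as nn

# To learn x^4: the first cell can drive each of its four hidden units
# towards x^1, and the second cell multiplies them: x * x * x * x = x^4.
# The hidden width (4 here) must therefore be >= the target exponent.
stacked = nn.Sequential(
    NALUCell(in_dim=1, out_dim=4),
    NALUCell(in_dim=4, out_dim=1),
)
```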