Implementation of Neural Arithmetic Logic Units (NALU) as discussed in https://arxiv.org/abs/1808.00508
The implementation here deviates from the paper when it comes to computing the gate variable g.
The paper enforces a dependence of g on the input x with the equation g = sigmoid(Gx),
where G is a learnt matrix.
However, for most purposes the gating function depends only on the task, not on the input,
and can be learnt independently of the input.
This implementation instead uses g = sigmoid(G), where G is a learnt scalar.
For recurrent tasks, however, it does make sense to condition the gate value on the input.
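A minimal sketch of this variant is shown below, assuming a PyTorch-style module. The class name `NALU` and the parameter names `W_hat`, `M_hat`, `G`, and `eps` are illustrative, not necessarily this repository's actual identifiers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NALU(nn.Module):
    """Sketch of a NALU with an input-independent gate g = sigmoid(G)."""

    def __init__(self, in_dim, out_dim, eps=1e-10):
        super().__init__()
        self.eps = eps
        # NAC-style weight construction: W = tanh(W_hat) * sigmoid(M_hat),
        # which biases the effective weights towards {-1, 0, 1}
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        # Learnt scalar gate (deviation from the paper's g = sigmoid(Gx))
        self.G = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        # Additive path: plain linear accumulation (add/subtract)
        a = F.linear(x, W)
        # Multiplicative path: linear in log-space (mult/div); the exp(...)
        # is always positive, hence no negative targets on this path
        m = torch.exp(F.linear(torch.log(torch.abs(x) + self.eps), W))
        g = torch.sigmoid(self.G)  # scalar gate, independent of x
        return g * a + (1 - g) * m
```

Because g is a single scalar, the add/subtract-versus-mult/div decision is made once per layer for the whole task rather than per input.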
- Can handle either add/subtract or mult/div operations, but not a combination of both.
- For mult/div operations, it cannot handle negative targets, as the mult/div gate output
  is the result of an exponentiation operation, which always yields positive results.
- Power operations are only possible when the exponent is in the range [0, 1].
- The careful design of the mathematics ensures that the learnt weights allow for both
  interpolation and extrapolation.
Power operations with exponents above the range [0, 1] would need two NALUs stacked on top of each other, as sketched below.
The hidden dimensionality of the stacked NALU network will have to be greater than or equal to the exponent.
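A hypothetical sketch of such a stack, reusing the `NALU` class from the sketch above; the hidden width of 4 is just an example, chosen to cover exponents up to 4.

```python
# Two stacked NALUs for exponents above the range [0, 1]; the hidden
# width must be at least as large as the target exponent (here, up to 4).
stacked_nalu = nn.Sequential(
    NALU(in_dim=1, out_dim=4),
    NALU(in_dim=4, out_dim=1),
)

x = torch.rand(8, 1) + 1.0  # batch of 8 positive inputs
y = stacked_nalu(x)         # could be trained to approximate e.g. x ** 3
```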