NALU

Implementation of Neural Arithmetic Logic Units as discussed in https://arxiv.org/abs/1808.00508

This implementation

The implementation here deviates from the paper when it comes to computing the gate variable g
The paper enforces a dependence of g on the input x with the equation:
However for most purposes the gating function is only dependant upon the task and not the input
and can be learnt independantly of the input.
This implementation uses where G is a learnt scalar.

For recurrent tasks, however, it does make sense to condition the gate value on the input.

Limitations of a single NALU

Can handle either add/subtract or mult/div operations but not a combination of both.
For mult/div operations, it cannot handle negative targets as the mult/div gate output
is the result of an exponentiation operation which always yeilds positive results.
Power operations are only possible when the exponent is in the range of [0, 1].

Advantages of using NALU

The careful design of the mathematics ensure the learnt weights allow for both
interpolation and extrapolation.

MacOS/NALU-2

NALU

This implementation

Limitations of a single NALU

Advantages of using NALU