Implementation of Neural Arithmetic Logic Units (NALU) as discussed in https://arxiv.org/abs/1808.00508
The implementation here deviates from the paper when it comes to computing the gate variable g.
The paper enforces a dependence of g on the input x with the equation g = sigmoid(Gx),
where G is a learnt matrix.
However, for most purposes the gating function depends only on the task, not on the input,
and can be learnt independently of the input.
This implementation instead uses g = sigmoid(G), where G is a learnt scalar.
For recurrent tasks, however, it does make sense to condition the gate value on the input.
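A minimal sketch of this variant is shown below, assuming a PyTorch-style module. The class name `NALU` and the parameter names `W_hat`, `M_hat`, `G`, and `eps` are illustrative, not necessarily this repository's actual identifiers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NALU(nn.Module):
    """Sketch of a NALU with an input-independent gate g = sigmoid(G)."""

    def __init__(self, in_dim, out_dim, eps=1e-10):
        super().__init__()
        self.eps = eps
        # NAC-style weight construction: W = tanh(W_hat) * sigmoid(M_hat),
        # which biases the effective weights towards {-1, 0, 1}
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        # Learnt scalar gate (deviation from the paper's g = sigmoid(Gx))
        self.G = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        # Additive path: plain linear accumulation (add/subtract)
        a = F.linear(x, W)
        # Multiplicative path: linear in log-space (mult/div); the exp(...)
        # is always positive, hence no negative targets on this path
        m = torch.exp(F.linear(torch.log(torch.abs(x) + self.eps), W))
        g = torch.sigmoid(self.G)  # scalar gate, independent of x
        return g * a + (1 - g) * m
```

Because g is a single scalar, the add/subtract-versus-mult/div decision is made once per layer for the whole task rather than per input.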
- Can handle either add/subtract or mult/div operations, but not a combination of both.
- For mult/div operations, it cannot handle negative targets, as the mult/div gate output
  is the result of an exponentiation operation, which always yields positive results.
- Power operations are only possible when the exponent is in the range [0, 1].
- The careful design of the mathematics ensures that the learnt weights allow for both
  interpolation and extrapolation.
Power operations with exponents above the range [0, 1] would need two NALUs stacked on top of each other, as sketched below.
The hidden dimensionality of the stacked NALU network will have to be greater than or equal to the exponent.
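A hypothetical sketch of such a stack, reusing the `NALU` class from the sketch above; the hidden width of 4 is just an example, chosen to cover exponents up to 4.

```python
# Two stacked NALUs for exponents above the range [0, 1]; the hidden
# width must be at least as large as the target exponent (here, up to 4).
stacked_nalu = nn.Sequential(
    NALU(in_dim=1, out_dim=4),
    NALU(in_dim=4, out_dim=1),
)

x = torch.rand(8, 1) + 1.0  # batch of 8 positive inputs
y = stacked_nalu(x)         # could be trained to approximate e.g. x ** 3
```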