An implementation of Mixture of Layers (MoL) in PyTorch. We propose a method for neural networks to route information dynamically through their layers in an arbitrary order, allowing for in-context parameter tying.
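To make the routing idea concrete, here is a minimal sketch rather than the code in this repository. It assumes a pool of same-width linear layers and a linear scorer that picks, per sample and per step, which pooled layer runs next. The names `ToyLayerRouting`, `num_steps`, and the argmax selection rule are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn


class ToyLayerRouting(nn.Module):
    """Hypothetical illustration: a router picks which pooled layer runs next."""

    def __init__(self, dim: int, num_layers: int, num_steps: int):
        super().__init__()
        # Pool of candidate layers; any of them may run at any step.
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        # Assumed router: a linear scorer over the pool (not the repo's LayerRouter).
        self.router = nn.Linear(dim, num_layers)
        self.num_steps = num_steps

    def forward(self, x: torch.Tensor):
        path = []  # layer index chosen for each sample at each step
        for _ in range(self.num_steps):
            # Hard routing: each sample goes to its highest-scoring layer.
            choice = self.router(x).argmax(dim=-1)  # shape: (batch,)
            out = torch.zeros_like(x)
            for i, layer in enumerate(self.layers):
                mask = choice == i
                if mask.any():
                    out[mask] = torch.relu(layer(x[mask]))
            x = out
            path.append(choice)
        return x, torch.stack(path, dim=1)  # path shape: (batch, num_steps)


torch.manual_seed(0)
model = ToyLayerRouting(dim=8, num_layers=3, num_steps=5)
y, path = model(torch.randn(4, 8))
print(y.shape, path.shape)  # output keeps the input shape; path is (batch, num_steps)
```

Because this sketch runs more steps than there are layers, every sample must reuse at least one layer, which is the sense in which the same parameters get tied across depth depending on the input. The argmax used here is not differentiable, so this toy router receives no gradient; a trainable router would need a soft or relaxed selection rule.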
The core of MoL is LayerRouter, a module that decides which layer should receive the preceding layer's activations. Formally, LayerRouter is a function
where