A Deep Equilibrium (DEQ) model is a kind of implicit-layer model. Implicit layers have shown impressive results on NLP and vision tasks. One advantage of an implicit layer is memory efficiency, which comes from implicit differentiation: the gradient is computed directly at the fixed point, so the intermediate solver iterations do not have to be stored for backpropagation.
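To make the idea of implicit differentiation concrete, here is a minimal sketch on a scalar toy layer. It is not the code of this project: the layer `f(z, w, x) = tanh(w*z + x)`, the helper names, and the tolerances are all assumptions chosen for illustration. At a fixed point `z* = f(z*, w, x)`, the implicit function theorem gives `dz*/dw = (df/dw) / (1 - df/dz)`, so the gradient needs only the final `z*` and none of the solver's intermediate iterates.

```python
import math


def f(z, w, x):
    # Hypothetical scalar "layer" used only for this illustration.
    return math.tanh(w * z + x)


def solve_fixed_point(w, x, tol=1e-12, max_iter=1000):
    # Plain forward iteration; no iterates are kept in memory.
    z = 0.0
    for _ in range(max_iter):
        z_next = f(z, w, x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_grad_w(w, x):
    # Gradient of the fixed point z* with respect to w, evaluated at z* only.
    z = solve_fixed_point(w, x)
    s = 1.0 - z * z      # tanh'(w*z + x) at the fixed point, since tanh(w*z + x) = z
    df_dz = s * w        # partial derivative of f with respect to z
    df_dw = s * z        # partial derivative of f with respect to w
    return df_dw / (1.0 - df_dz)


def finite_difference_grad_w(w, x, h=1e-6):
    # Central finite difference through the solver, used as a sanity check.
    return (solve_fixed_point(w + h, x) - solve_fixed_point(w - h, x)) / (2.0 * h)
```

Comparing `implicit_grad_w(0.5, 0.3)` with `finite_difference_grad_w(0.5, 0.3)` is a quick way to check that the two agree. In a real DEQ the division becomes a linear solve with the layer's Jacobian, but the memory argument is the same.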
This graph shows the memory used by our DEQ model without implicit differentiation:
This graph shows the memory used with implicit differentiation:
Since these models are based on implicit layers, we need a fixed-point solver to find the point z* that satisfies the equilibrium condition z* = f(z*, x). I used two fixed-point solvers: Anderson acceleration and a forward solver, which simply applies the layer repeatedly in the forward pass until the convergence condition is met.
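The two solvers can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the implementation used in this project: the toy `layer`, the function names, the tolerances, and the choice of Anderson memory depth 1 (practical implementations usually keep several past iterates and solve a small least-squares problem) are all hypothetical.

```python
import math


def layer(z):
    # Hypothetical 2-D contraction standing in for a DEQ layer with fixed input.
    return [math.tanh(0.5 * z[1] + 1.0), math.tanh(0.5 * z[0] - 0.3)]


def forward_solver(f, z0, tol=1e-10, max_iter=500):
    # Plain fixed-point iteration: z <- f(z) until the update is below tol.
    # Returns the solution and the number of evaluations of f.
    z = list(z0)
    for k in range(1, max_iter + 1):
        z_next = f(z)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next, k
        z = z_next
    return z, max_iter


def anderson_m1(f, z0, tol=1e-10, max_iter=500):
    # Anderson acceleration with memory depth 1 (a simplification).
    # Each step mixes the two latest outputs of f so that the combined
    # residual g = f(z) - z is as small as possible in the least-squares sense.
    fz_prev = f(list(z0))
    g_prev = [a - b for a, b in zip(fz_prev, z0)]
    z = fz_prev  # the first step is a plain forward step
    for k in range(2, max_iter + 1):
        fz = f(z)
        g = [a - b for a, b in zip(fz, z)]
        if max(abs(r) for r in g) < tol:
            return fz, k
        dg = [a - b for a, b in zip(g, g_prev)]
        denom = sum(d * d for d in dg)
        # gamma minimizes || g - gamma * dg ||; fall back to a plain step if dg = 0.
        gamma = sum(a * d for a, d in zip(g, dg)) / denom if denom > 0.0 else 0.0
        z_next = [a - gamma * (a - b) for a, b in zip(fz, fz_prev)]
        fz_prev, g_prev, z = fz, g, z_next
    return z, max_iter


z_fwd, n_fwd = forward_solver(layer, [0.0, 0.0])
z_and, n_and = anderson_m1(layer, [0.0, 0.0])
print("forward :", z_fwd, "evaluations:", n_fwd)
print("anderson:", z_and, "evaluations:", n_and)
```

Both functions return the approximate fixed point together with the number of layer evaluations, which makes it easy to compare how quickly each solver converges on a given layer.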