upskyy/Squeezeformer

The scaling parameter is not learned?

Opened this issue · 0 comments

The paper claims the scaling parameter is learned, but your implementation is a fixed scalar.