PointsCoder/VOTR

Ablation studies on layer normalization and the additional linear projection layer?

Closed this issue · 2 comments

Hi PointsCoder, thanks for open-sourcing the project. As you say, the main differences between your transformer and the standard one are batch normalization and the linear projection layer, so I'm wondering how much improvement your transformer brings compared with the original transformer. I didn't see ablation studies on that.

@AndyYuan96 LayerNorm performs comparably to BatchNorm.
An additional projection layer is necessary if you want to increase the channel count as the network goes deeper.
Our main contribution is the efficient computation of attention on sparse voxels, which vanilla transformers cannot do efficiently (it can even be computationally infeasible). We didn't make many modifications to other parts.
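
For illustration, here is a minimal PyTorch-style sketch of the block structure described above: BatchNorm in place of LayerNorm, plus a linear projection that is added only when the channel count increases. The class and parameter names are hypothetical and are not taken from the VOTR codebase; the real model gathers non-empty voxels via sparse indexing rather than using the dense tensor shown here.

```python
import torch
import torch.nn as nn


class SparseAttentionBlock(nn.Module):
    """Hypothetical sketch (not the actual VOTR implementation):
    BatchNorm1d replaces LayerNorm, and a linear projection is added
    only when the output channel count differs from the input."""

    def __init__(self, in_channels, out_channels, num_heads=4):
        super().__init__()
        # Project to the wider channel count before attention when the net goes deeper.
        self.proj = (
            nn.Linear(in_channels, out_channels)
            if in_channels != out_channels
            else nn.Identity()
        )
        self.attn = nn.MultiheadAttention(out_channels, num_heads, batch_first=True)
        # BatchNorm over channels; performs comparably to LayerNorm in this setting.
        self.norm1 = nn.BatchNorm1d(out_channels)
        self.norm2 = nn.BatchNorm1d(out_channels)
        self.ffn = nn.Sequential(
            nn.Linear(out_channels, out_channels * 4),
            nn.ReLU(inplace=True),
            nn.Linear(out_channels * 4, out_channels),
        )

    def forward(self, voxel_feats):
        # voxel_feats: (batch, num_voxels, in_channels), a dense placeholder for
        # the sparse voxel features used in the real model.
        x = self.proj(voxel_feats)
        attn_out, _ = self.attn(x, x, x)
        x = x + attn_out
        x = self.norm1(x.transpose(1, 2)).transpose(1, 2)  # BatchNorm1d expects (B, C, N)
        x = x + self.ffn(x)
        x = self.norm2(x.transpose(1, 2)).transpose(1, 2)
        return x


# Usage: expand channels from 64 to 128 in a deeper stage.
block = SparseAttentionBlock(in_channels=64, out_channels=128)
feats = torch.randn(2, 500, 64)
out = block(feats)  # (2, 500, 128)
```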

Got it, thank you.