DefTruth/CUDA-Learn-Notes

layer norm implementation

zbt78 opened this issue · 3 comments

Isn't the layer norm implementation in the README actually batch norm?

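Layer norm is applied per token (each token is normalized over its own features), whereas batch norm is applied per channel (each channel is normalized across the batch).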
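To make the per token vs. per channel distinction concrete, here is a minimal sketch in plain Python. It is not the repo's CUDA kernel: the function names `layer_norm` and `batch_norm`, the helper `_normalize`, and the sample matrix are all hypothetical. It assumes an input of shape [N tokens, K channels] and leaves out the learnable scale and shift.

```python
import math

def _normalize(values, eps=1e-5):
    """Shift a list of floats to zero mean and scale it to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in values]

def layer_norm(x, eps=1e-5):
    """Per token: normalize each row (one token) over its K channels."""
    return [_normalize(row, eps) for row in x]

def batch_norm(x, eps=1e-5):
    """Per channel: normalize each column (one channel) over the N tokens."""
    cols = [_normalize(list(col), eps) for col in zip(*x)]
    # Transpose back to the original [N, K] layout.
    return [list(row) for row in zip(*cols)]

# Hypothetical input: N = 2 tokens, K = 3 channels.
x = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 60.0]]

ln = layer_norm(x)  # statistics computed along each row
bn = batch_norm(x)  # statistics computed along each column
```

With layer norm every row of the result has zero mean. With batch norm every column does. A kernel that reduces over the K values of a single row is therefore computing layer norm statistics, even if it processes many rows at once.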

Ah, got it.

This issue is stale because it has been open for 30 days with no activity.