thegregyang/NTK4A
Code for the paper: "Tensor Programs II: Neural Tangent Kernel for Any Architecture"
Jupyter Notebook
Issues
- 0
About transformer attention scaling
#3 opened by chenwydj - 2
Code for the paper: "Tensor Programs II: Neural Tangent Kernel for Any Architecture"
Jupyter Notebook