microsoft/mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
Jupyter Notebook · MIT
Issues
How to use mu-transfer for LLaMA2?
#4 opened by brando90
bert256.bsh not in repo
#1 opened by nightsnack