Share weights of embedding layer with output layer in `CausalLanguageModel`

Question

krasserm opened this issue 2 years ago · 0 comments