/gpt-variations

Code for the paper "Towards smaller, faster decoder-only transformers: Architectural variants and their implications"

Primary language: Python
License: Apache License 2.0 (Apache-2.0)
