MiniMA

Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models".

Primary language: Python
License: Apache License 2.0 (Apache-2.0)
