Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models".
Primary language: Python. License: Apache License 2.0 (Apache-2.0).