
ShortGPT

Unofficial implementations of:

  • ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853)
  • The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887)
  • SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks (arXiv:2402.09025)

To Use

  • Follow the Llama 2 setup found here.
  • See short_gpt/short_llama.ipynb for the necessary function calls.
  • For HuggingFace models, see this branch.

Details

  • Use a wrapper around Llama to collect hidden states and compute BI (block influence): one minus the average cosine similarity between the hidden states entering and leaving each layer.
    • The BI implementation may be subject to change or improvement if others find issues, thanks in advance!
  • Accumulate each layer's importance score while running inference on pg19 (see the first sketch after this list).
    • The dataset can be slow to load from HuggingFace, so you may want to use an alternative.
  • Sort the layer-wise importance values to determine which layers are least important and therefore candidates for removal.
  • Demonstrate model healing with Mistral-7B-v0.1, as described in "The Unreasonable Ineffectiveness of the Deeper Layers": finetuning with LoRA after layer removal can recover downstream model performance (see the second sketch below).
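The core loop looks roughly like the following. This is a minimal sketch of the BI computation, not the repo's actual wrapper: the names block_influence, layer_importances, model, and pg19_batches are hypothetical, and it assumes a HuggingFace-style causal LM that exposes per-layer hidden states via output_hidden_states.

```python
import torch

@torch.no_grad()
def block_influence(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> torch.Tensor:
    # BI = 1 - cosine similarity between the hidden states entering and
    # leaving a layer, averaged over every token position in the batch.
    sim = torch.nn.functional.cosine_similarity(hidden_in, hidden_out, dim=-1)
    return 1.0 - sim.mean()

@torch.no_grad()
def layer_importances(model, input_ids: torch.Tensor) -> list[float]:
    # With output_hidden_states=True a HuggingFace causal LM returns
    # len(layers) + 1 tensors: the embedding output followed by the output
    # of each decoder layer, so adjacent pairs bracket exactly one layer.
    hs = model(input_ids, output_hidden_states=True).hidden_states
    return [block_influence(hs[i], hs[i + 1]).item() for i in range(len(hs) - 1)]

# Accumulate per-layer importances over a stream of pg19 batches
# (pg19_batches is a hypothetical iterable of tokenized input_ids).
importances = None
for batch in pg19_batches:
    scores = layer_importances(model, batch)
    importances = scores if importances is None else [a + b for a, b in zip(importances, scores)]

# Least important first: these are the removal candidates.
removal_order = sorted(range(len(importances)), key=lambda i: importances[i])
```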
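And a sketch of the removal-plus-healing step, assuming a Llama/Mistral-style HuggingFace model (decoder layers at model.model.layers) and the peft library; remove_layers is a hypothetical helper, and removal_order carries over from the sketch above.

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

def remove_layers(model, layer_indices: list[int]):
    # Delete the given decoder layers and keep the config consistent.
    drop = set(layer_indices)
    kept = [layer for i, layer in enumerate(model.model.layers) if i not in drop]
    model.model.layers = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)
    return model

model = remove_layers(model, removal_order[:9])  # e.g. the 9 least important

# Heal with a short LoRA finetuning pass on the attention projections.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
# ...then finetune briefly (e.g. with transformers.Trainer) to recover
# downstream performance lost to pruning.
```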

Results

Comparison of ShortGPT layers removed on Llama-2-7B (9 least important layers):

Paper:               [27, 26, 25, 28, 24, 29, 23, 21, 22]
This implementation: [25, 27, 24, 26, 28, 29, 23, 22, 21]

Both select the same set of layers, just in a different order.

TODO:

  • Is order significant? -> The authors mention that layer order varies between datasets, but the relative ordering suggests "similar levels of importance" (link).
  • Add more models and metrics -> experimental support for HF models is on this branch.
    • Add the angular distance metric (see the sketch after this list).
    • Demonstrate model healing using a HuggingFace model here.
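For reference, the angular distance used by Gromov et al. is just the arccosine of the same cosine similarity, rescaled to [0, 1]; in their paper it compares the hidden states at layers l and l+n to score a block of n consecutive layers. A minimal sketch (angular_distance is a hypothetical name, not code from this repo):

```python
import math
import torch

@torch.no_grad()
def angular_distance(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> torch.Tensor:
    # d = (1/pi) * arccos(cosine similarity), averaged over token positions.
    # Unlike 1 - cos_sim, this is a proper distance on the hypersphere.
    sim = torch.nn.functional.cosine_similarity(hidden_in, hidden_out, dim=-1)
    return torch.arccos(sim.clamp(-1.0, 1.0)).mean() / math.pi
```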

Citations

@misc{men2024shortgpt,
    title={ShortGPT: Layers in Large Language Models are More Redundant Than You Expect}, 
    author={Xin Men and Mingyu Xu and Qingyu Zhang and Bingning Wang and Hongyu Lin and Yaojie Lu and Xianpei Han and Weipeng Chen},
    year={2024},
    eprint={2403.03853},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@misc{gromov2024unreasonable,
    title={The Unreasonable Ineffectiveness of the Deeper Layers}, 
    author={Andrey Gromov and Kushal Tirumala and Hassan Shapourian and Paolo Glorioso and Daniel A. Roberts},
    year={2024},
    eprint={2403.17887},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@misc{song2024sleb,
    title={SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks}, 
    author={Jiwon Song and Kyungseok Oh and Taesu Kim and Hyungjun Kim and Yulhwa Kim and Jae-Joon Kim},
    year={2024},
    eprint={2402.09025},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@article{raecompressive2019,
    title={Compressive Transformers for Long-Range Sequence Modelling},
    author={Jack W. Rae and Anna Potapenko and Siddhant M. Jayakumar and Chloe Hillier and Timothy P. Lillicrap},
    journal={arXiv preprint},
    url={https://arxiv.org/abs/1911.05507},
    year={2019}
}