ShortGPT

Unofficial implementations of:

  • ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  • The Unreasonable Ineffectiveness of the Deeper Layers

To Use

  • Follow Llama 2 setup found here.
  • Reference short_gpt/short_llama.ipynb for necessary function calls.
  • For HuggingFace models, reference this branch.

Details

  • Use a wrapper around Llama to collect hidden states and compute BI (block influence); a minimal sketch follows this list.
    • The BI implementation may change or improve if others find issues, thanks in advance!
  • Sum importance values across layers while running inference on pg19.
    • The dataset can be slow to load from Hugging Face, so you may want to use an alternative.
  • Use the sorted layer-wise importance values to determine which layers are least important and can be removed.
  • Demonstrate the model healing described in "The Unreasonable Ineffectiveness of the Deeper Layers" with Mistral-7B-v0.1, where finetuning with LoRA after layer removal can recover downstream performance.
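
For reference, here is a minimal sketch (assumed, not the exact notebook code) of how BI can be computed and accumulated; model and loader are placeholders for a Llama wrapper that returns all hidden states and a tokenized pg19 dataloader:

import torch

def block_influence(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> torch.Tensor:
    # BI as defined in the ShortGPT paper: 1 minus the per-token cosine
    # similarity between a block's input and output hidden states, averaged.
    cos = torch.nn.functional.cosine_similarity(hidden_in, hidden_out, dim=-1)
    return (1.0 - cos).mean()

# Accumulate per-layer BI over pg19 batches.
importances = None
for input_ids in loader:
    with torch.no_grad():
        hidden_states = model(input_ids, output_hidden_states=True).hidden_states
    scores = torch.stack([
        block_influence(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ])
    importances = scores if importances is None else importances + scores

# Layers with the smallest accumulated BI are the removal candidates.
least_important = torch.argsort(importances)[:9].tolist()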

Results

Comparison of ShortGPT layers removed on Llama-2-7B (9 least important layers):

Paper: [27, 26, 25, 28, 24, 29, 23, 21, 22]
This Implementation: [25, 27, 24, 26, 28, 29, 23, 22, 21]

Same layers but different order.
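
Once the least important layers are identified, dropping them from a HuggingFace checkpoint is mechanical. A hedged sketch (model name and indices assumed from the result above, not the exact notebook code):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16)
to_remove = {21, 22, 23, 24, 25, 26, 27, 28, 29}  # layers found least important above

# LlamaForCausalLM keeps its decoder blocks in `model.model.layers` (a ModuleList);
# rebuild it without the pruned indices and update the config to match.
model.model.layers = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i not in to_remove
)
model.config.num_hidden_layers = len(model.model.layers)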

TODO:

  • Is the order significant? -> The authors mention that layer order varies between datasets, but the relative ordering suggests "similar levels of importance" (link).
  • Add more models and metrics -> Add experimental support for HF models on this branch.
    • Add an angular distance metric (a possible sketch follows this list).
    • Demonstrate model healing using a HuggingFace model here.
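
For the angular distance item, a possible sketch of the metric used in "The Unreasonable Ineffectiveness of the Deeper Layers" (arccos of the cosine similarity, normalised by pi), applied here to a block's input/output hidden states:

import torch

def angular_distance(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> torch.Tensor:
    # Angular distance between hidden states: arccos of the cosine similarity,
    # normalised to [0, 1] by dividing by pi; clamp for numerical safety.
    cos = torch.nn.functional.cosine_similarity(hidden_in, hidden_out, dim=-1)
    return (torch.arccos(cos.clamp(-1.0, 1.0)) / torch.pi).mean()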

Citations

@misc{men2024shortgpt,
    title={ShortGPT: Layers in Large Language Models are More Redundant Than You Expect}, 
    author={Xin Men and Mingyu Xu and Qingyu Zhang and Bingning Wang and Hongyu Lin and Yaojie Lu and Xianpei Han and Weipeng Chen},
    year={2024},
    eprint={2403.03853},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@misc{gromov2024unreasonable,
    title={The Unreasonable Ineffectiveness of the Deeper Layers}, 
    author={Andrey Gromov and Kushal Tirumala and Hassan Shapourian and Paolo Glorioso and Daniel A. Roberts},
    year={2024},
    eprint={2403.17887},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@misc{song2024sleb,
    title={SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks}, 
    author={Jiwon Song and Kyungseok Oh and Taesu Kim and Hyungjun Kim and Yulhwa Kim and Jae-Joon Kim},
    year={2024},
    eprint={2402.09025},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@article{raecompressive2019,
    author = {Rae, Jack W and Potapenko, Anna and Jayakumar, Siddhant M and Hillier, Chloe and Lillicrap, Timothy P},
    title = {Compressive Transformers for Long-Range Sequence Modelling},
    journal = {arXiv preprint},
    url = {https://arxiv.org/abs/1911.05507},
    year = {2019},
}