Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
Abstract
Large pre-trained models have achieved outstanding results in sequence modeling. The Transformer block and its attention mechanism have been the main drivers of the success of these models. Recently, alternative architectures, such as Selective Structured State Space Models (SSMs), have been proposed to address the inefficiencies of Transformers. This paper explores the compression of SSM-based models, particularly Mamba and its hybrids. We study the sensitivity of these models to the removal of selected components at different granularities to reduce the model size and computational overhead, thus improving their efficiency while maintaining accuracy. The proposed solutions, collectively referred to as Mamba-Shedder, achieve a speedup of up to 1.4x during inference, demonstrating that model efficiency can be improved by eliminating several redundancies with minimal impact on the overall model performance. The code is available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
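To make the component-removal idea concrete, below is a minimal, hypothetical sketch of block-level sensitivity analysis on a toy residual model. This is not the Mamba-Shedder implementation: the `ToyModel`, `ResidualBlock`, and `removal_error` names, the output-deviation proxy metric, and the choice to drop the two least sensitive blocks are illustrative assumptions only.

```python
# Hypothetical sketch (not the authors' Mamba-Shedder code): estimate the importance of
# each residual block by removing it and measuring how much the model's output changes,
# then drop the least sensitive blocks.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Stand-in for a Mamba/SSM block; the real mixer would be a selective SSM."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mixer = nn.Linear(dim, dim)  # placeholder for the SSM token mixer

    def forward(self, x):
        # Residual connection: skipping this block reduces it to the identity map.
        return x + self.mixer(self.norm(x))

class ToyModel(nn.Module):
    def __init__(self, dim=64, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList([ResidualBlock(dim) for _ in range(depth)])

    def forward(self, x, skip=frozenset()):
        for i, block in enumerate(self.blocks):
            if i not in skip:  # removed blocks are simply bypassed
                x = block(x)
        return x

@torch.no_grad()
def removal_error(model, data, skip):
    """Proxy sensitivity score: output deviation caused by skipping the given blocks."""
    full = model(data)
    pruned = model(data, skip=skip)
    return torch.mean((full - pruned) ** 2).item()

model = ToyModel().eval()
data = torch.randn(16, 32, 64)  # (batch, sequence, hidden) dummy activations

# Score each block by the damage its individual removal causes, then drop the
# two blocks whose removal changes the output the least.
scores = {i: removal_error(model, data, {i}) for i in range(len(model.blocks))}
to_drop = sorted(scores, key=scores.get)[:2]
print("dropping blocks", to_drop, "-> joint removal error:",
      removal_error(model, data, set(to_drop)))
```

The abstract describes removal at several granularities and evaluation on downstream accuracy; the loop above only illustrates the coarsest granularity (whole blocks) with a stand-in metric.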
Community
The following similar papers were recommended by the Semantic Scholar API:
- On Pruning State-Space LLMs (2025)
- Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing (2025)
- MultiPruner: Balanced Structure Removal in Foundation Models (2025)
- DeltaLLM: Compress LLMs with Low-Rank Deltas between Shared Weights (2025)
- DReSS: Data-driven Regularized Structured Streamlining for Large Language Models (2025)
- TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba (2025)
- Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation (2025)