FineGates: LLMs Finetuning with Compression using Stochastic Gates
Abstract
Large Language Models (LLMs), with billions of parameters, present significant challenges for full finetuning due to their high computational and memory demands, which make it impractical in many real-world applications. When computational resources are limited or datasets are small, updating all model parameters can also lead to overfitting. To address this, lightweight finetuning techniques have been proposed, such as learning low-rank adapter layers. These methods train only a small number of additional parameters on top of a frozen base model, reducing resource usage and mitigating the risk of overfitting. In this work, we propose an adapter model based on stochastic gates that simultaneously sparsifies the frozen base model and performs task-specific adaptation. Our method introduces only a small number of trainable parameters and speeds up base-model inference while maintaining competitive accuracy. We also evaluate variants equipped with additional low-rank trainable parameters and compare them to several recent baselines. Our results show that the proposed method improves the accuracy of the finetuned model compared to several baselines and allows the removal of up to 20-40% of the base model's parameters without significant accuracy loss.
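The abstract does not give implementation details, but the general stochastic-gates mechanism it builds on can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a Gaussian-relaxed gate per output channel over a frozen linear layer, and the names `GatedLinear`, `sigma`, and `expected_open_gates` are hypothetical.

```python
# Sketch only: per-channel stochastic gates over a frozen linear layer,
# using the common Gaussian relaxation of Bernoulli gates for L0-style sparsity.
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Wraps a frozen nn.Linear and learns one stochastic gate per output channel."""
    def __init__(self, base: nn.Linear, sigma: float = 0.5):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # the base layer stays frozen
            p.requires_grad_(False)
        # mu is the only trainable tensor: one gate parameter per output channel
        self.mu = nn.Parameter(torch.full((base.out_features,), 0.5))
        self.sigma = sigma

    def gates(self) -> torch.Tensor:
        # z = clip(mu + eps, 0, 1) with eps ~ N(0, sigma^2) during training;
        # at inference the noise is dropped, so gates stuck at 0 can be pruned away.
        noise = torch.randn_like(self.mu) * self.sigma if self.training else 0.0
        return torch.clamp(self.mu + noise, 0.0, 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) * self.gates()

def expected_open_gates(mu: torch.Tensor, sigma: float = 0.5) -> torch.Tensor:
    # Expected number of open gates: sum over channels of P(gate > 0) = Phi(mu / sigma).
    return torch.distributions.Normal(0.0, 1.0).cdf(mu / sigma).sum()
```

In such a setup, the task loss would be augmented with a weighted `expected_open_gates(layer.mu)` term so that uninformative channels are driven to zero and can be removed from the frozen base layer after finetuning, which is the source of the reported inference speedup.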
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection (2024)
- QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models (2024)
- RandLoRA: Full-rank parameter-efficient fine-tuning of large models (2025)
- KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models (2024)
- You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning (2025)
- All-in-One Tuning and Structural Pruning for Domain-Specific LLMs (2024)
- TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs (2024)