license: mit
datasets:
- glue
language:
- en
metrics:
- accuracy
- f1
- spearmanr
- pearsonr
- matthews_correlation
base_model: google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- adapter
- low-rank
- fine-tuning
- LoRA
- DiffLoRA
eval_results: Refer to GLUE experiments in the examples folder
view_doc: https://huggingface.co/nozomuteruyo14/Diff_LoRA
Model Card for DiffLoRA
DiffLoRA is an adapter architecture that extends conventional low-rank adaptation (LoRA) by representing the weight update with differential low-rank matrices. Instead of updating all model parameters, DiffLoRA trains only a small set of low-rank matrices, which keeps fine-tuning efficient and the number of trainable parameters low.
Model Details
Model Description
DiffLoRA is an original method developed by the author, inspired by conceptual ideas from the Differential Transformer paper (https://arxiv.org/abs/2410.05258). It decomposes the weight update into two components, a positive and a negative contribution, enabling a more fine-grained adjustment than traditional LoRA. The output of a single layer is computed as:

$$ y = W_0 x + \Delta W(x) $$

where:
- $x$ is the input vector (or each sample in a batch).
- $W_0$ is the fixed pre-trained weight matrix.
- $\Delta W(x)$ is the differential update computed as:

$$ \Delta W(x) = \frac{\alpha}{r} \left( B_{\mathrm{pos}} A_{\mathrm{pos}} \tilde{x} - \lambda \, B_{\mathrm{neg}} A_{\mathrm{neg}} \tilde{x} \right) $$

with:
- $\tilde{x}$ being the input after dropout (or another regularization).
- $B_{\mathrm{pos}} A_{\mathrm{pos}}$ capturing the positive contribution.
- $B_{\mathrm{neg}} A_{\mathrm{neg}}$ capturing the negative contribution.
- $\lambda$ is a learnable scalar that balances the two contributions.
- $\alpha$ is a scaling factor.
- $r$ is the chosen rank.

For computational efficiency, the two low-rank components are fused via concatenation:

$$ A = \begin{bmatrix} A_{\mathrm{pos}} \\ A_{\mathrm{neg}} \end{bmatrix}, \qquad B = \begin{bmatrix} B_{\mathrm{pos}} & -\lambda B_{\mathrm{neg}} \end{bmatrix} $$

The update is then calculated as:

$$ \Delta W(x) = \frac{\alpha}{r} \, B A \tilde{x} $$

resulting in the final output:

$$ y = W_0 x + \frac{\alpha}{r} \, B A \tilde{x} $$
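As a concrete illustration, the following PyTorch sketch implements this forward pass in its unfused form. It is a minimal example written for this card, not the repository's actual implementation; the class name, initialization choices, and default hyperparameters are all illustrative.

```python
import torch
import torch.nn as nn


class DiffLoRALinear(nn.Module):
    """Minimal sketch of a DiffLoRA-adapted linear layer (illustrative, not the repo's API)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0, dropout: float = 0.1):
        super().__init__()
        self.base = base                                # holds the frozen W_0 (and bias)
        for p in self.base.parameters():
            p.requires_grad = False
        in_f, out_f = base.in_features, base.out_features
        # Positive and negative low-rank factors (B set to zero so training starts from W_0)
        self.A_pos = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B_pos = nn.Parameter(torch.zeros(out_f, r))
        self.A_neg = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B_neg = nn.Parameter(torch.zeros(out_f, r))
        self.lam = nn.Parameter(torch.tensor(0.5))      # learnable balance between the two terms
        self.scaling = alpha / r                        # alpha / r scaling factor
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_tilde = self.dropout(x)                            # input after dropout
        pos = x_tilde @ self.A_pos.T @ self.B_pos.T          # positive contribution
        neg = x_tilde @ self.A_neg.T @ self.B_neg.T          # negative contribution
        return self.base(x) + self.scaling * (pos - self.lam * neg)
```

In practice the two branches can be fused by concatenating the factors as shown above so that the update becomes a single matrix product; the unfused form is kept here for clarity.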
- Developed by: Nozomu Fujisawa in Kondo Lab
- Model type: Differential Low-Rank Adapter (DiffLoRA)
- Language(s) (NLP): en
- License: MIT
- Finetuned from model: bert-base-uncased
Model Sources
- Repository: https://huggingface.co/nozomuteruyo14/Diff_LoRA
- Paper: DiffLoRA is inspired by ideas from the Differential Transformer (https://arxiv.org/abs/2410.05258), but it is an original method developed by the author.
Uses
Direct Use
DiffLoRA is intended to be integrated as an adapter module into pre-trained transformer models. It allows efficient fine-tuning by updating only a small number of low-rank parameters, making it ideal for scenarios where computational resources are limited.
Out-of-Scope Use
DiffLoRA is not designed for training models from scratch, nor is it recommended for tasks where full parameter updates are necessary. It is optimized for transformer-based NLP tasks and may not generalize well to non-NLP domains. In addition, only a limited number of base models are currently supported.
Bias, Risks, and Limitations
While DiffLoRA offers a parameter-efficient fine-tuning approach, it inherits limitations from its base models (e.g., BERT, MiniLM). It may not capture all domain-specific nuances when only a limited number of parameters are updated. Users should carefully evaluate performance and consider potential biases in their applications.
Recommendations
Users should:
- Experiment with different rank (r) and scaling factor (α) values.
- Compare DiffLoRA with other adapter techniques.
- Be cautious about over-relying on the adapter when full model adaptation might be necessary.
How to Get Started with the Model
To integrate DiffLoRA into your fine-tuning workflow, check the example script at examples/run_glue_experiment.py.
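For a rough picture of what such a script does before running it, the sketch below wires the illustrative `DiffLoRALinear` module from the Model Description section into bert-base-uncased. The attribute names (`bert.encoder.layer`, `attention.self.query`/`value`) follow the Hugging Face BERT implementation; the target modules, rank, and scaling in the actual script may differ.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "google-bert/bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Freeze the entire pre-trained backbone; the classification head stays trainable.
for p in model.bert.parameters():
    p.requires_grad = False

# Inject DiffLoRA adapters into the query/value projections of every attention block.
for layer in model.bert.encoder.layer:
    attn = layer.attention.self
    attn.query = DiffLoRALinear(attn.query, r=8, alpha=16.0)
    attn.value = DiffLoRALinear(attn.value, r=8, alpha=16.0)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")
```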
Training Details
Training Data
This implementation has been demonstrated on GLUE tasks using the Hugging Face Datasets library.
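For example, a single GLUE task can be loaded with the `datasets` library (MRPC is used here purely as an illustration of a sentence-pair task):

```python
from datasets import load_dataset

# Each GLUE task defines its own text column(s) and label column.
raw = load_dataset("glue", "mrpc")
print(raw["train"][0])  # {'sentence1': ..., 'sentence2': ..., 'label': ..., 'idx': ...}
```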
Training Procedure
DiffLoRA is applied by freezing the base model weights and updating only the low-rank adapter parameters. The procedure (sketched in code after this list) involves:
- Preprocessing text inputs (concatenating multiple text columns if necessary).
- Injecting DiffLoRA adapters into target linear layers.
- Fine-tuning on a downstream task while the base model remains frozen.
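A condensed sketch of these steps, reusing the `model`, `tokenizer`, and `raw` objects from the earlier snippets; the preprocessing function is illustrative, and single-sentence tasks would pass only one column to the tokenizer.

```python
# Sentence-pair tasks such as MRPC pass both columns; the tokenizer joins them
# into a single input sequence.
def preprocess(examples):
    return tokenizer(examples["sentence1"], examples["sentence2"],
                     truncation=True, max_length=128)

encoded = raw.map(preprocess, batched=True)

# With the backbone frozen, only the DiffLoRA factors (A_pos, B_pos, A_neg, B_neg,
# the balance scalar) and the classification head receive gradients.
trainable_names = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable_names), "trainable tensors, e.g.", trainable_names[:3])
```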
Training Hyperparameters
- Training regime: Fine-tuning with frozen base weights; only adapter parameters are updated (the full configuration is sketched after this list).
- Learning rate: 2e-5 (example)
- Batch size: 32 per device
- Epochs: 3 (example)
- Optimizer: AdamW with weight decay
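With the Hugging Face `Trainer`, the example values above translate into a configuration roughly like the following. This is a sketch: the output directory and weight-decay value are placeholders, and the actual settings live in the example script.

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="difflora-glue",      # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,               # Trainer uses AdamW by default
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```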
Evaluation
Testing Data, Factors & Metrics
Testing Data
GLUE validation sets are used for evaluation.
Factors
Evaluations are performed across multiple GLUE tasks to ensure comprehensive performance analysis.
Metrics
Evaluation metrics include accuracy, F1 score, Pearson correlation, and Spearman correlation, depending on the task.
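These can be computed with the `evaluate` library, which bundles the appropriate metric combination for each GLUE task. Below is a sketch of a `compute_metrics` function that could be passed to the `Trainer`; the task name is illustrative.

```python
import numpy as np
import evaluate

metric = evaluate.load("glue", "mrpc")   # accuracy + F1 for MRPC; other tasks differ

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)   # for STS-B (a regression task), squeeze the logits instead
    return metric.compute(predictions=preds, references=labels)
```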
Results
For detailed evaluation results, please refer to the GLUE experiment script in the examples directory.
Summary
DiffLoRA achieves faster convergence and competitive performance on GLUE tasks compared to other parameter-efficient fine-tuning methods.
Citation
Paper: in preparation.
Model Card Contact
For any questions regarding this model card, please contact: [[email protected]]