Breaking the Low-Rank Dilemma of Linear Attention: RAVLT Model Card

This model card describes the Rank-Augmented Vision Linear Transformer (RAVLT), introduced in the paper "Breaking the Low-Rank Dilemma of Linear Attention". RAVLT achieves state-of-the-art performance on ImageNet-1k classification while maintaining linear complexity.

Key Features:

  • High accuracy: Achieves 84.4% Top-1 accuracy on ImageNet-1k (RAVLT-S).
  • Parameter efficiency: Uses only 26M parameters (RAVLT-S).
  • Computational efficiency: Achieves 4.6G FLOPs (RAVLT-S).
  • Linear complexity: attention cost scales linearly with the number of tokens, rather than quadratically as in standard softmax attention.

RAVLT is based on Rank-Augmented Linear Attention (RALA), a novel attention mechanism that addresses the low-rank limitations of standard linear attention.
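For context, below is a minimal sketch of plain kernelized linear attention, the baseline mechanism whose rank limitation RALA is designed to overcome. It is written in PyTorch with the common elu(x)+1 feature map; it is a conceptual illustration, not the released RALA implementation, whose exact formulation may differ.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention: O(N * d^2) cost instead of O(N^2 * d).

    q, k, v: (batch, heads, seq_len, head_dim)
    Uses elu(x) + 1 as the positive feature map (a common choice; RALA's
    own formulation is not reproduced here).
    """
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    # Aggregate keys and values once: (batch, heads, head_dim, head_dim).
    # Every output token lies in the span of this d x d matrix, so the
    # output rank is capped by head_dim -- the "low-rank dilemma".
    kv = torch.einsum('bhnd,bhne->bhde', k, v)
    # Per-query normalizer, analogous to the softmax denominator.
    z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + eps)
    return torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)

x = torch.randn(1, 8, 196, 64)   # e.g. 14x14 patch tokens, 8 heads
out = linear_attention(x, x, x)
print(out.shape)                 # torch.Size([1, 8, 196, 64])
```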

Model Variants

Several RAVLT variants were trained, offering different tradeoffs between accuracy, parameters, and FLOPs:

| Model   | Params (M) | FLOPs (G) | Checkpoint |
|---------|------------|-----------|------------|
| RAVLT-T | 15         | 2.4       | RAVLT-T    |
| RAVLT-S | 26         | 4.6       | RAVLT-S    |
| RAVLT-B | 48         | 9.9       | RAVLT-B    |
| RAVLT-L | 95         | 16.0      | RAVLT-L    |

How to use (Placeholder - Awaiting Code Release)

Usage instructions will be provided once the code is released at https://github.com/qhfan/RALA.
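Until then, the snippet below is only a speculative sketch of a typical PyTorch loading pattern; the `ravlt` module, the `ravlt_s` factory, and the checkpoint filename are placeholder names, not published APIs.

```python
import torch

# Hypothetical import: the real entry point will come from the (not yet
# released) https://github.com/qhfan/RALA repository.
from ravlt import ravlt_s  # placeholder name

model = ravlt_s(num_classes=1000)
model.load_state_dict(torch.load('ravlt_s.pth', map_location='cpu'))  # placeholder checkpoint
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # standard ImageNet-1k input size
print(logits.argmax(dim=-1))
```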

Citation

@inproceedings{fan2024breakinglowrank,
      title={Breaking the Low-Rank Dilemma of Linear Attention},
      author={Qihang Fan and Huaibo Huang and Ran He},
      year={2025},
      booktitle={CVPR},
}