Forgetting Transformer: Softmax Attention with a Forget Gate • Paper • 2503.02130 • Published 7 days ago • 15