mishig's Collections
fuck quadratic attention
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 33
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 140
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 44
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 106
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 65
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Paper • 2006.16236 • Published • 3
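The core idea behind this entry (and much of the collection) is replacing softmax attention's O(n²) cost with a kernel feature map, so attention can be computed in O(n) by reassociating the matrix products. A minimal non-causal sketch in NumPy, assuming the paper's elu(x)+1 feature map (function names here are illustrative, not from any library):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(n * d^2) attention via the kernel trick, instead of O(n^2 * d)."""
    # Feature map phi(x) = elu(x) + 1, which keeps all entries positive.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    # Associativity: phi(Q) @ (phi(K)^T @ V) avoids forming the n x n matrix.
    KV = Kp.T @ V                    # (d, d_v), computed once
    Z = Qp @ Kp.sum(axis=0)          # (n,) per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)
```

The output matches the quadratic form `(phi(Q) phi(K)^T / rowsum) V` exactly; only the order of multiplications changes, which is what makes the cost linear in sequence length.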
Scaling Transformer to 1M tokens and beyond with RMT
Paper • 2304.11062 • Published • 2
CoLT5: Faster Long-Range Transformers with Conditional Computation
Paper • 2303.09752 • Published • 2
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Paper • 2402.04347 • Published • 14
The Illusion of State in State-Space Models
Paper • 2404.08819 • Published • 1