Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 19 days ago • 66
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Paper • 2502.12574 • Published 20 days ago • 11
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published about 1 month ago • 122