Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts Paper • 2503.05066 • Published 6 days ago • 3
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 345
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7, 2024 • 18
LLM-Drop Collection Model weights of paper "What Matters in Transformers? Not All Attention is Needed" (https://arxiv.org/abs/2406.15786) • 14 items • Updated Oct 23, 2024 • 4
What Matters in Transformers? Not All Attention is Needed Paper • 2406.15786 • Published Jun 22, 2024 • 31