AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models Paper • 2406.13233 • Published Jun 19, 2024
SIFT: Grounding LLM Reasoning in Contexts via Stickers Paper • 2502.14922 • Published 19 days ago
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection Paper • 2410.14731 • Published Oct 16, 2024
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation Paper • 2502.05415 • Published about 1 month ago