Identifying Sensitive Weights via Post-quantization Integral Paper • 2503.01901 • Published 10 days ago
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published 5 days ago
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 4 days ago
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 4 days ago
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids Paper • 2502.20396 • Published 11 days ago
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published 12 days ago
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 19 days ago
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation Paper • 2502.16707 • Published 15 days ago
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published 14 days ago
Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations Paper • 2501.19066 • Published Jan 31
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4