The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input • Paper • 2501.03200 • Published 6 days ago • 1
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection • Paper • 2501.04575 • Published 5 days ago • 21
Agent Laboratory: Using LLM Agents as Research Assistants • Paper • 2501.04227 • Published 5 days ago • 68
Cosmos World Foundation Model Platform for Physical AI • Paper • 2501.03575 • Published 6 days ago • 55
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution • Paper • 2501.02976 • Published 7 days ago • 46
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction • Paper • 2501.01957 • Published 9 days ago • 35
MLLM-as-a-Judge for Image Safety without Human Labeling • Paper • 2501.00192 • Published 13 days ago • 23
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings • Paper • 2501.01257 • Published 11 days ago • 45
Training Software Engineering Agents and Verifiers with SWE-Gym • Paper • 2412.21139 • Published 13 days ago • 20
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization • Paper • 2412.21037 • Published 14 days ago • 23
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization • Paper • 2412.18525 • Published 20 days ago • 66
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models • Paper • 2412.18605 • Published 19 days ago • 20
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models • Paper • 2412.18609 • Published 19 days ago • 15
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search • Paper • 2412.18319 • Published 20 days ago • 35
Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature? • Paper • 2412.18409 • Published 20 days ago • 1
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks • Paper • 2412.15204 • Published 24 days ago • 33