HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs • Paper • 2412.18925 • Published 19 days ago • 89
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis • Paper • 2412.19723 • Published 17 days ago • 78
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control • Paper • 2501.01427 • Published 11 days ago • 46
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining • Paper • 2501.00958 • Published 11 days ago • 92
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching • Paper • 2412.17153 • Published 21 days ago • 34
Deliberation in Latent Space via Differentiable Cache Augmentation • Paper • 2412.17747 • Published 21 days ago • 29
Diving into Self-Evolving Training for Multimodal Reasoning • Paper • 2412.17451 • Published 21 days ago • 42
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners • Paper • 2412.17256 • Published 21 days ago • 45
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response • Paper • 2412.14922 • Published 25 days ago • 85
In Case You Missed It: ARC 'Challenge' Is Not That Challenging • Paper • 2412.17758 • Published 21 days ago • 16
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding • Paper • 2412.18450 • Published 20 days ago • 32
Large Action Models: From Inception to Implementation • Paper • 2412.10047 • Published about 1 month ago • 32
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published about 1 month ago • 137
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published Dec 13, 2024 • 86