Submitted by zhoutianyi 38 CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing · 4 authors 7
Submitted by sinwang 28 World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning · 7 authors 5
Submitted by agwmon 28 Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models · 5 authors 1
Submitted by Owen777 25 CoRe^2: Collect, Reflect and Refine to Generate Better and Faster · 7 authors 3
Submitted by LucasFang 21 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing · 12 authors 1
Submitted by wondervictor 15 GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding · 10 authors 1
Submitted by Weiyun1025 15 VisualPRM: An Effective Process Reward Model for Multimodal Reasoning · 15 authors 1
Submitted by ChenyangLyu 14 New Trends for Modern Machine Translation with Large Reasoning Models · 6 authors 1
Submitted by wenhu 10 VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search · 7 authors 1
Submitted by yyf86 8 DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation · 9 authors 1
Submitted by EthanTaylor 8 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models · 8 authors 1
Submitted by VityaVitalich 8 Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark · 6 authors 1
Submitted by akhaliq 8 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k · 32 authors 1
Submitted by akhaliq 7 Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond · 14 authors 1
Submitted by sayakpaul 7 SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation · 9 authors 2
Submitted by yeates 7 OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting · 4 authors 1
Submitted by akhaliq 5 R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization · 12 authors 1
Submitted by BestWishYsh 5 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance · 10 authors 1
Submitted by allisonandreyev 5 Quantization for OpenAI's Whisper Models: A Comparative Analysis · 1 author 1
Submitted by hp-l33 4 Autoregressive Image Generation with Randomized Parallel Decoding · 4 authors 1
Submitted by hkchengrex 3 The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation · 2 authors 1
Submitted by imranraad 2 "Silent Is Not Actually Silent": An Investigation of Toxicity on Bug Report Discussion · 2 authors 1
Submitted by chenblin26 1 ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer · 6 authors 1
Submitted by gabrielchua 1 MinorBench: A hand-built benchmark for content-based risks for children · 3 authors 1
Submitted by AhmadMustafa 1 On the Limitations of Vision-Language Models in Understanding Image Transforms · 3 authors 1
Submitted by Nikolai10 1 PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling · 6 authors 1
Submitted by jhao - TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention · 9 authors 1