Unified Reward Model for Multimodal Understanding and Generation • Paper • 2503.05236 • Published 6 days ago • 103
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching • Paper • 2503.05179 • Published 6 days ago • 42
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning • Paper • 2503.05592 • Published 6 days ago • 24
Forgetting Transformer: Softmax Attention with a Forget Gate • Paper • 2503.02130 • Published 10 days ago • 26
SafeArena: Evaluating the Safety of Autonomous Web Agents • Paper • 2503.04957 • Published 7 days ago • 18
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control • Paper • 2503.05639 • Published 6 days ago • 19
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning • Paper • 2503.05379 • Published 6 days ago • 24
Learning from Failures in Multi-Attempt Reinforcement Learning • Paper • 2503.04808 • Published 9 days ago • 16
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation • Paper • 2503.04872 • Published 7 days ago • 14
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models • Paper • 2503.05638 • Published 6 days ago • 16
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities • Paper • 2503.05652 • Published 6 days ago • 10
An Empirical Study on Eliciting and Improving R1-like Reasoning Models • Paper • 2503.04548 • Published 7 days ago • 8
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts • Paper • 2503.05447 • Published 6 days ago • 7
LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding • Paper • 2503.04359 • Published 7 days ago • 6
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test • Paper • 2503.01840 • Published 10 days ago • 4
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles • Paper • 2502.18968 • Published 15 days ago • 3
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM • Paper • 2503.04504 • Published 7 days ago • 2
YuE: Scaling Open Foundation Models for Long-Form Music Generation • Paper • 2503.08638 • Published 2 days ago • 52
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories • Paper • 2503.08625 • Published 2 days ago • 22
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model • Paper • 2503.07703 • Published 3 days ago • 28
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models • Paper • 2503.09573 • Published about 20 hours ago • 18
Multimodal Language Modeling for High-Accuracy Single Cell Transcriptomics Analysis and Generation • Paper • 2503.09427 • Published about 23 hours ago • 1
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization • Paper • 2503.08619 • Published 2 days ago • 16
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models • Paper • 2503.08686 • Published 2 days ago • 14
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval • Paper • 2503.08644 • Published 2 days ago • 15
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru • Paper • 2503.07587 • Published 3 days ago • 9
^RFLAV: Rolling Flow matching for infinite Audio Video generation • Paper • 2503.08307 • Published 2 days ago • 8
BiasEdit: Debiasing Stereotyped Language Models via Model Editing • Paper • 2503.08588 • Published 2 days ago • 6
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models • Paper • 2503.08417 • Published 2 days ago • 6
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol • Paper • 2503.05860 • Published 6 days ago • 6
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents • Paper • 2503.08684 • Published 2 days ago • 5