VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published 4 days ago • 8 • 4
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published 1 day ago • 51
Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models Paper • 2503.01763 • Published 10 days ago • 4 • 2
FLAME: A Federated Learning Benchmark for Robotic Manipulation Paper • 2503.01729 • Published 10 days ago • 4 • 2
Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection Paper • 2503.01449 • Published 10 days ago • 4 • 2
CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs Paper • 2503.01378 • Published 10 days ago • 3 • 2
SwiLTra-Bench: The Swiss Legal Translation Benchmark Paper • 2503.01372 • Published 10 days ago • 3 • 2
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published 21 days ago • 12 • 2
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation Paper • 2502.08168 • Published 29 days ago • 11
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published about 1 month ago • 126 • 4
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published about 1 month ago • 60 • 6