Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 2 days ago • 78
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published Nov 29, 2024 • 12
Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text Paper • 2211.11300 • Published Nov 21, 2022 • 1
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning Paper • 2503.07002 • Published 3 days ago • 36
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Paper • 2502.06788 • Published about 1 month ago • 12
Taming Teacher Forcing for Masked Autoregressive Video Generation Paper • 2501.12389 • Published Jan 21 • 10
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario Paper • 2501.10132 • Published Jan 17 • 19
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 55
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 55
GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation Paper • 2311.17971 • Published Nov 29, 2023
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Paper • 2409.14485 • Published Sep 22, 2024
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 55
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale Paper • 2412.06699 • Published Dec 9, 2024 • 13
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models Paper • 2412.04146 • Published Dec 5, 2024 • 23
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14, 2024 • 68