VLMs - a hg2wzh Collection

Updated 1 day ago

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 76

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published Dec 11, 2024 • 44

AIDC-AI/Ovis2-2B

Image-Text-to-Text • Updated 14 days ago • 5.91k • 52

DAMO-NLP-SG/VideoLLaMA3-2B

Visual Question Answering • Updated about 24 hours ago • 5.67k • 10

AIDC-AI/Ovis2-16B

Image-Text-to-Text • Updated 14 days ago • 3.18k • 82

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • Updated about 12 hours ago • 441k • 1.12k

StarJiaxing/R1-Omni-0.5B

Updated 3 days ago • 70 • 30