CapsFusion: Rethinking Image-Text Data at Scale • Paper • 2310.20550 • Published Oct 31, 2023
Generative Multimodal Models are In-Context Learners • Paper • 2312.13286 • Published Dec 20, 2023
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters • Paper • 2402.04252 • Published Feb 6, 2024
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception • Paper • 2407.08303 • Published Jul 11, 2024