Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 5 days ago • 38
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 3 days ago • 60
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 5 days ago • 194
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation Paper • 2204.12484 • Published Apr 26, 2022 • 2
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published 7 days ago • 20
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published 6 days ago • 18