TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space • Paper • 2501.12224 • Published 6 days ago • 46
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise • Paper • 2501.08331 • Published 13 days ago • 17
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos • Paper • 2501.12375 • Published 6 days ago • 18
UI-TARS: Pioneering Automated GUI Interaction with Native Agents • Paper • 2501.12326 • Published 6 days ago • 45
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments • Paper • 2501.10893 • Published 9 days ago • 22
GameFactory: Creating New Games with Generative Interactive Videos • Paper • 2501.08325 • Published 13 days ago • 59
CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation • Paper • 2501.09433 • Published 11 days ago • 17
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces • Paper • 2501.09756 • Published 11 days ago • 18
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors • Paper • 2501.08225 • Published 13 days ago • 18
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos • Paper • 2501.04001 • Published 20 days ago • 42
The GAN is dead; long live the GAN! A Modern GAN Baseline • Paper • 2501.05441 • Published 18 days ago • 85
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking • Paper • 2501.04519 • Published 19 days ago • 249
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation • Paper • 2204.12484 • Published Apr 26, 2022 • 2
TransPixar: Advancing Text-to-Video Generation with Transparency • Paper • 2501.03006 • Published 21 days ago • 23
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control • Paper • 2501.03847 • Published 20 days ago • 23
PERSE: Personalized 3D Generative Avatars from A Single Portrait • Paper • 2412.21206 • Published 28 days ago • 17
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization • Paper • 2412.21037 • Published 28 days ago • 23
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis • Paper • 2412.15322 • Published Dec 19, 2024 • 18