TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video Paper • 2411.18671 • Published Nov 27, 2024 • 20
LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever. • 10 items • Updated 5 days ago • 51
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images Paper • 2411.05738 • Published Nov 8, 2024 • 14
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024 • 61
Seeing Faces in Things: A Model and Dataset for Pareidolia Paper • 2409.16143 • Published Sep 24, 2024 • 17