CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection Paper • 2311.00453 • Published Nov 1, 2023
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models Paper • 2410.16236 • Published Oct 21, 2024
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network Paper • 2411.15941 • Published Nov 24, 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection Paper • 2404.06564 • Published Apr 9, 2024
TASAR: Transfer-based Attack on Skeletal Action Recognition Paper • 2409.02483 • Published Sep 4, 2024
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Paper • 2501.05131 • Published Jan 9, 2025
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1, 2025
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing Paper • 2402.13185 • Published Feb 20, 2024
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published Dec 10, 2024
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow Paper • 2306.07209 • Published Jun 12, 2023
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives Paper • 2401.02009 • Published Jan 4, 2024
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9, 2024