- GLaMM: Pixel Grounding Large Multimodal Model
  Paper • 2311.03356 • Published • 33
- SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
  Paper • 2311.07575 • Published • 13
- CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
  Paper • 2311.03354 • Published • 4
- Language-Informed Visual Concept Learning
  Paper • 2312.03587 • Published • 5
Collections
Collections including paper arxiv:2401.02957
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
  Paper • 2312.04557 • Published • 12
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
  Paper • 2312.04410 • Published • 14
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
  Paper • 2312.04461 • Published • 61
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
  Paper • 2401.02955 • Published • 21

- Learning Vision from Models Rivals Learning Vision from Data
  Paper • 2312.17742 • Published • 15
- Unsupervised Universal Image Segmentation
  Paper • 2312.17243 • Published • 19
- Perspectives on the State and Future of Deep Learning -- 2023
  Paper • 2312.09323 • Published • 5
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 11

- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 41
- Interfacing Foundation Models' Embeddings
  Paper • 2312.07532 • Published • 10
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 17
- TheBloke/quantum-v0.01-GPTQ
  Text Generation • Updated • 18 • 2

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 117
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 188k • 2.8k
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 50
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  Paper • 2311.12454 • Published • 30

- Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
  Paper • 2308.16582 • Published • 10
- DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
  Paper • 2310.13119 • Published • 11
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
  Paper • 2310.16818 • Published • 30
- Text-to-3D with classifier score distillation
  Paper • 2310.19415 • Published • 4