Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2401.02957

Localize Viusal Understanding

GLaMM: Pixel Grounding Large Multimodal Model

Paper • 2311.03356 • Published Nov 6, 2023 • 33
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Paper • 2311.07575 • Published Nov 13, 2023 • 13
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

Paper • 2311.03354 • Published Nov 6, 2023 • 4
Language-Informed Visual Concept Learning

Paper • 2312.03587 • Published Dec 6, 2023 • 5

Vision Transformer

Denoising Vision Transformers

Paper • 2401.02957 • Published Jan 5, 2024 • 28

Denoising Vision Transformers

Paper • 2401.02957 • Published Jan 5, 2024 • 28

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Paper • 2312.04557 • Published Dec 7, 2023 • 12
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Paper • 2312.04410 • Published Dec 7, 2023 • 14
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 61
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Paper • 2401.02955 • Published Jan 5, 2024 • 21

Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4, 2024 • 62
Denoising Vision Transformers

Paper • 2401.02957 • Published Jan 5, 2024 • 28
H2O-Danube-1.8B Technical Report

Paper • 2401.16818 • Published Jan 30, 2024 • 17

Saved trending papers

Learning Vision from Models Rivals Learning Vision from Data

Paper • 2312.17742 • Published Dec 28, 2023 • 15
Unsupervised Universal Image Segmentation

Paper • 2312.17243 • Published Dec 28, 2023 • 19
Perspectives on the State and Future of Deep Learning -- 2023

Paper • 2312.09323 • Published Dec 7, 2023 • 5
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 11

Computer vision

Unsupervised Universal Image Segmentation

Paper • 2312.17243 • Published Dec 28, 2023 • 19
Denoising Vision Transformers

Paper • 2401.02957 • Published Jan 5, 2024 • 28
timm/ViT-B-16-SigLIP

Zero-Shot Image Classification • Updated Oct 25, 2023 • 31.2k • 29
Running on Zero

19

🌖

Slimsam

Small yet powerful mask generation application ⚡️

Transformers & MoE

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Paper • 2312.07987 • Published Dec 13, 2023 • 41
Interfacing Foundation Models' Embeddings

Paper • 2312.07532 • Published Dec 12, 2023 • 10
Point Transformer V3: Simpler, Faster, Stronger

Paper • 2312.10035 • Published Dec 15, 2023 • 17
TheBloke/quantum-v0.01-GPTQ

Text Generation • Updated Dec 18, 2023 • 18 • 2

Language Models

Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 117
stabilityai/stable-video-diffusion-img2vid-xt

Image-to-Video • Updated Jul 10, 2024 • 188k • 2.8k
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 50
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

Paper • 2311.12454 • Published Nov 21, 2023 • 30

any size diffusion

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Paper • 2308.16582 • Published Aug 31, 2023 • 10
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

Paper • 2310.13119 • Published Oct 19, 2023 • 11
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 30
Text-to-3D with classifier score distillation

Paper • 2310.19415 • Published Oct 30, 2023 • 4

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs