-
  - Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
    Paper • 2412.15213 • Published • 25
  - No More Adam: Learning Rate Scaling at Initialization is All You Need
    Paper • 2412.11768 • Published • 41
  - Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
    Paper • 2412.13663 • Published • 121
  - Autoregressive Video Generation without Vector Quantization
    Paper • 2412.14169 • Published • 14
Collections including paper arxiv:2405.21048
-
  - Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
    Paper • 2406.06525 • Published • 66
  - Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
    Paper • 2405.21048 • Published • 14
  - Scalable Autoregressive Image Generation with Mamba
    Paper • 2408.12245 • Published • 26
  - DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
    Paper • 2410.08159 • Published • 25
-
  - Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
    Paper • 2405.21048 • Published • 14
  - Block Transformer: Global-to-Local Language Modeling for Fast Inference
    Paper • 2406.02657 • Published • 37
  - Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
    Paper • 2406.06525 • Published • 66
-
  - KAN: Kolmogorov-Arnold Networks
    Paper • 2404.19756 • Published • 108
  - Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
    Paper • 2404.18796 • Published • 68
  - Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
    Paper • 2405.21048 • Published • 14
-
  - No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
    Paper • 2404.04125 • Published • 27
  - CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
    Paper • 2404.03653 • Published • 33
  - Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
    Paper • 2404.02747 • Published • 11
  - 3D Congealing: 3D-Aware Image Alignment in the Wild
    Paper • 2404.02125 • Published • 7
-
  - Instruct-Imagen: Image Generation with Multi-modal Instruction
    Paper • 2401.01952 • Published • 31
  - ODIN: A Single Model for 2D and 3D Perception
    Paper • 2401.02416 • Published • 11
  - Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
    Paper • 2404.01367 • Published • 21
  - Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
    Paper • 2404.02747 • Published • 11
-
  - Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models
    Paper • 2312.09608 • Published • 13
  - CodeFusion: A Pre-trained Diffusion Model for Code Generation
    Paper • 2310.17680 • Published • 70
  - ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
    Paper • 2310.17994 • Published • 8
  - Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss
    Paper • 2401.02677 • Published • 22