Collections
Discover the best community collections!
Collections including paper arxiv:2410.05258
-
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 26 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 41 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 135 -
Autoregressive Video Generation without Vector Quantization
Paper • 2412.14169 • Published • 14
-
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Paper • 2410.23090 • Published • 54 -
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Paper • 2410.13276 • Published • 27 -
Personalized Visual Instruction Tuning
Paper • 2410.07113 • Published • 70 -
Differential Transformer
Paper • 2410.05258 • Published • 171
-
Differential Transformer
Paper • 2410.05258 • Published • 171 -
Baichuan-Omni Technical Report
Paper • 2410.08565 • Published • 85 -
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 90 -
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
Paper • 2410.16271 • Published • 81
-
Differential Transformer
Paper • 2410.05258 • Published • 171 -
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Paper • 2410.20672 • Published • 6 -
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper • 2410.23168 • Published • 24
-
Differential Transformer
Paper • 2410.05258 • Published • 171 -
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 129 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 107 -
o1-Coder: an o1 Replication for Coding
Paper • 2412.00154 • Published • 44