Collections

Collections including paper arxiv:2410.05258

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

Transformers_PMK

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171
Large Language Diffusion Models

Paper • 2502.09992 • Published 27 days ago • 103

Paper - Multimodal

Papers related to multimodal models. Research toward a modular, multimodal, multi-stream, mixture-of-experts universal transformer with Matryoshka embeddings

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published Dec 19, 2024 • 26
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 41
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 135
Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published Dec 18, 2024 • 14

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 352
Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque

Paper • 2412.13922 • Published Dec 18, 2024

CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

Paper • 2410.23090 • Published Oct 30, 2024 • 54
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Paper • 2410.13276 • Published Oct 17, 2024 • 27
Personalized Visual Instruction Tuning

Paper • 2410.07113 • Published Oct 9, 2024 • 70
Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

📑 Trending Papers - October 🔟

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171
Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11, 2024 • 85
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 90
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

Paper • 2410.16271 • Published Oct 21, 2024 • 81

Model Architecture

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Paper • 2410.20672 • Published Oct 28, 2024 • 6
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Paper • 2410.23168 • Published Oct 30, 2024 • 24

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 129
VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 107
o1-Coder: an o1 Replication for Coding

Paper • 2412.00154 • Published Nov 29, 2024 • 44

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 100
