- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 608
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 343
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 255
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258
Collections including paper arxiv:2307.09288
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 1.51M • 8.26k
- openai/whisper-large-v3-turbo
  Automatic Speech Recognition • Updated • 2.72M • 1.83k
- meta-llama/Llama-3.2-11B-Vision-Instruct
  Image-Text-to-Text • Updated • 2.59M • 1.26k
- deepseek-ai/DeepSeek-V2.5
  Text Generation • Updated • 4.78k • 685
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 27
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 12
- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 608
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 255
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 244
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258