- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 2
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 11
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9

Collections
Collections including paper arxiv:2203.15556

- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Perspectives on the State and Future of Deep Learning -- 2023
  Paper • 2312.09323 • Published • 8
- MobileSAMv2: Faster Segment Anything to Everything
  Paper • 2312.09579 • Published • 24
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 20

- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 40
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 6
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- Attention Is All You Need
  Paper • 1706.03762 • Published • 55

- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- Universal Language Model Fine-tuning for Text Classification
  Paper • 1801.06146 • Published • 6
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 13

- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Mistral 7B
  Paper • 2310.06825 • Published • 46

- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  Paper • 2309.03883 • Published • 35
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 35
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 48