- SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
  Paper • 2410.09008 • Published • 17
- answerdotai/ModernBERT-base
  Fill-Mask • Updated • 5.42M • 784
- answerdotai/ModernBERT-large
  Fill-Mask • Updated • 269k • 360
- microsoft/phi-4
  Text Generation • Updated • 535k • 1.88k
Collections including paper arxiv:2309.14402
- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 35
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 65
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 42
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 39

- Physics of Language Models: Part 1, Context-Free Grammar
  Paper • 2305.13673 • Published • 7
- Physics of Language Models: Part 3.2, Knowledge Manipulation
  Paper • 2309.14402 • Published • 7
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
  Paper • 2404.05405 • Published • 10
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
  Paper • 2309.14316 • Published • 8

- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 39
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
  Paper • 2403.14624 • Published • 52
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13

- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 10
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 35
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 13