- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
  Paper • 2210.14986 • Published • 5
- Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
  Paper • 2311.10702 • Published • 19
- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 75
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 32

Collections
Collections including paper arxiv:2403.07816

- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 40
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 27
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 50
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1

- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 48
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 64
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 40
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 14

- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 40
- microsoft/phi-1_5
  Text Generation • Updated • 110k • 1.32k
- Language models scale reliably with over-training and on downstream tasks
  Paper • 2403.08540 • Published • 15
- Akashpb13/Swahili_xlsr
  Automatic Speech Recognition • Updated • 97 • 8

- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 140
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 62
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62

- Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
  Paper • 2306.04845 • Published • 4
- Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
  Paper • 2306.04073 • Published • 2
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 40
- Unified Scaling Laws for Routed Language Models
  Paper • 2202.01169 • Published • 2