arxiv:2309.11235

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Published on Sep 20, 2023

Abstract

Nowadays, open-source large language models like LLaMA have emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data with mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose the C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat.
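The abstract describes treating the data source as a coarse reward label and training a class-conditioned policy with plain supervised learning. One way to picture that idea is a source-tagged prompt combined with a per-source weight on the usual negative log-likelihood. The sketch below is only an illustration under stated assumptions: the prefix strings, the reward values, the normalization by total weight, and the function names are hypothetical and not taken from the OpenChat code.

```python
import math

# Hypothetical coarse-grained rewards per data source (assumed values,
# not the ones used in the paper).
SOURCE_REWARD = {"expert": 1.0, "suboptimal": 0.1}

# Hypothetical condition prefixes that tell the model which source
# a training example came from.
SOURCE_PREFIX = {"expert": "[expert] ", "suboptimal": "[suboptimal] "}


def condition_prompt(prompt, source):
    """Prepend the source-specific prefix so the policy is class-conditioned."""
    return SOURCE_PREFIX[source] + prompt


def weighted_sft_loss(examples):
    """Reward-weighted negative log-likelihood over a batch.

    `examples` is a list of (source, token_log_probs) pairs, where
    token_log_probs are the model's log-probabilities of the target tokens.
    Normalizing by the total weight is an assumption made for this sketch.
    """
    total = 0.0
    weight_sum = 0.0
    for source, token_log_probs in examples:
        weight = SOURCE_REWARD[source]
        nll = -sum(token_log_probs)  # sequence-level negative log-likelihood
        total += weight * nll
        weight_sum += weight
    return total / weight_sum


if __name__ == "__main__":
    batch = [
        ("expert", [math.log(0.5), math.log(0.5)]),
        ("suboptimal", [math.log(0.25)]),
    ]
    print(condition_prompt("Explain SFT.", "expert"))
    print(weighted_sft_loss(batch))
```

With weights like these, expert examples dominate the gradient while sub-optimal examples still contribute, and no pairwise preference labels are needed. At inference time one would presumably condition on the expert prefix.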

Community

What do you see?

Great work, guys!

A quick note, however.

While OpenChat is genuinely open source (Apache 2.0 license), LLaMA is not and should not be referred to as such, because of the restrictions in its license.
It's important not to reinforce Meta's misinformation, especially in actual open source projects such as OpenChat.

Models citing this paper 141

Datasets citing this paper 0

Spaces citing this paper 258

Collections including this paper 11