This repository contains the models released with our paper, "Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model".
- yyqoni/Phi-3-mini-4k-instruct-segment-rm-700k (Text Classification)
- yyqoni/Phi-3-mini-4k-instruct-token-rm-700k (Text Classification)
- yyqoni/Phi-3-mini-4k-instruct-bandit-rm-700k (Text Classification)
- yyqoni/rlhflow-llama-3-sft-8b-v2-segment-rm-700k (Text Classification)