Question on tokenizer chat template and readout

#4
by dereklim - opened

Here is what I get for the chat template when running print(tokenizer.apply_chat_template([{"role": "user", "content": "what is 1+1?"}, {"role": "assistant", "content": "It is 2."}], tokenize=False, add_generation_prompt=False)):

<extra_id_0>System
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

<extra_id_1>User
what is 1+1?
<extra_id_1>Assistant
It is 2.
<extra_id_2>

From what I understand, the reward is read out from the model's output at the very last token. Here, the last token is just > because <extra_id_2> is not a special token in the tokenizer, so it is split into several ordinary tokens. I would like to double-check that this is intended behavior and, if so, ask why your team chose not to use a special token as the last token from which the reward is read out. Thanks!
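For anyone who wants to reproduce the check, here is a minimal sketch of how the last token can be inspected. It assumes a tokenizer object that exposes `tokenize` and `all_special_tokens`, as Hugging Face tokenizers do. `ToyTokenizer` and `describe_last_token` are hypothetical names made up for illustration; the toy class is only a stand-in so the snippet runs without downloading anything, and its splitting rule is not the real model's tokenization.

```python
import re


class ToyTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer (NOT the real one)."""

    def __init__(self, special_tokens=()):
        self.all_special_tokens = list(special_tokens)

    def tokenize(self, text):
        # Registered special tokens are kept whole; everything else is split
        # into runs of word characters or single punctuation characters.
        specials = "|".join(re.escape(t) for t in self.all_special_tokens)
        pattern = (specials + "|" if specials else "") + r"\w+|[^\w\s]"
        return re.findall(pattern, text)


def describe_last_token(tokenizer, text):
    """Return the final token of `text` and whether it is a special token."""
    tokens = tokenizer.tokenize(text)
    last = tokens[-1]
    return last, last in tokenizer.all_special_tokens


text = "It is 2.\n<extra_id_2>"

# <extra_id_2> is not registered, so it is split and the last token is ">".
print(describe_last_token(ToyTokenizer(), text))  # ('>', False)

# If <extra_id_2> were registered as special, it would stay in one piece.
print(describe_last_token(ToyTokenizer(["<extra_id_2>"]), text))  # ('<extra_id_2>', True)
```

With a real tokenizer, the same `describe_last_token` call could be applied to the string returned by `apply_chat_template(..., tokenize=False)`.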

NVIDIA org

Yes, this is the intended behavior. From a correctness perspective, we think this approach is fine because we read the reward from the last token in both training and evaluation, so it is not really any different from using a special token there. The primary reason is that our model training and evaluation pipeline was initially prepared for models like https://huggingface.co/nvidia/Nemotron-4-340B-Reward, which do have this special token in the tokenizer. We were curious about how a Llama 3.1 model would perform (after it was first released in late July 2024) and wanted to train it without changing too many things in our pipeline. Hence, we stuck to this template.
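To illustrate what reading the reward from the last token means, here is a minimal sketch in plain Python. It assumes the model produces one scalar score per token position and that an attention mask marks real tokens with 1 and padding with 0. The function name `last_token_reward` and the numbers are hypothetical; this is not the actual training or evaluation pipeline.

```python
def last_token_reward(per_token_scores, attention_mask):
    """Pick the score at the last non-padding position of each sequence.

    Assumes every sequence contains at least one real (mask == 1) token.
    """
    rewards = []
    for scores, mask in zip(per_token_scores, attention_mask):
        # Index of the last real token; works for left or right padding.
        last = max(i for i, m in enumerate(mask) if m == 1)
        rewards.append(scores[last])
    return rewards


# Two sequences: the first has one padding position at the end.
scores = [[0.1, 0.2, 0.3, 0.0], [0.5, 0.6, 0.7, 0.8]]
mask = [[1, 1, 1, 0], [1, 1, 1, 1]]
print(last_token_reward(scores, mask))  # [0.3, 0.8]
```

Because the same position is used in training and evaluation, it does not matter to the readout whether that final token is a special token or an ordinary one such as ">".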
