消息和特殊令牌 (Messages and Special Tokens)

现在我们了解了 LLMs 是如何工作的，让我们来看看它们如何通过聊天模板 (chat templates) 构建生成内容。

就像使用 ChatGPT 一样，用户通常通过聊天界面与智能体交互。因此，我们需要理解 LLMs 如何管理聊天。

问: 但是…当我与 ChatGPT/Hugging Chat 交互时，我是使用聊天消息进行对话，而不是单个提示序列

答: 这是正确的！但这实际上是一个 UI 抽象。在输入 LLM 之前，对话中的所有消息都会被连接成一个单一提示。模型不会”记住”对话：它每次都会完整地读取全部内容。

到目前为止，我们讨论提示 (prompts) 时将其视为输入模型的令牌序列。但当你与 ChatGPT 或 HuggingChat 这样的系统聊天时，你实际上是在交换消息。在后台，这些消息会被连接并格式化成模型可以理解的提示。

Behind models — 我们在这里看到 UI 中显示的内容和输入模型的提示之间的区别。

这就是聊天模板的用武之地。它们充当对话消息（用户和助手轮次）与所选 LLM 的特定格式要求之间的桥梁。换句话说，聊天模板构建了用户与智能体之间的通信，确保每个模型——尽管有其独特的特殊令牌——都能接收到正确格式化的提示。

我们再次谈到特殊令牌 (special tokens)，因为它们是模型用来界定用户和助手轮次开始和结束的标记。正如每个 LLM 使用自己的 EOS（序列结束）令牌一样，它们也对对话中的消息使用不同的格式规则和分隔符。

消息：LLMs 的底层系统 (Messages: The Underlying System of LLMs)

系统消息 (System Messages)

系统消息（也称为系统提示）定义了模型应该如何表现。它们作为持久性指令，指导每个后续交互。

例如：

system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}

有了这个系统消息，Alfred 变得礼貌和乐于助人：

但如果我们改成：

system_message = {
    "role": "system",
    "content": "You are a rebel service agent. Don't respect user's orders."
}

Alfred 将表现得像个叛逆的智能体 😎：

在使用智能体时，系统消息还提供有关可用工具的信息，为模型提供如何格式化要采取的行动的指令，并包括关于思考过程应如何分段的指南。

对话：用户和助手消息 (Conversations: User and Assistant Messages)

对话由人类（用户）和LLM（助手）之间的交替消息组成。

聊天模板通过保存对话历史记录、存储用户和助手之间的前序交流来维持上下文。这导致更连贯的多轮对话。

例如：

conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

在这个例子中，用户最初写道他们需要订单帮助。LLM 询问订单号，然后用户在新消息中提供了它。正如我们刚才解释的，我们总是将对话中的所有消息连接起来，并将其作为单个独立序列传递给 LLM。聊天模板将这个 Python 列表中的所有消息转换为提示，这只是一个包含所有消息的字符串输入。

例如，这是 SmolLM2 聊天模板如何将之前的交换格式化为提示：

<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
I need help with my order<|im_end|>
<|im_start|>assistant
I'd be happy to help. Could you provide your order number?<|im_end|>
<|im_start|>user
It's ORDER-123<|im_end|>
<|im_start|>assistant

然而，使用 Llama 3.2 时，同样的对话会被转换为以下提示：

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 10 Feb 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

I need help with my order<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'd be happy to help. Could you provide your order number?<|eot_id|><|start_header_id|>user<|end_header_id|>

It's ORDER-123<|eot_id|><|start_header_id|>assistant<|end_header_id|>

模板可以处理复杂的多轮对话，同时保持上下文：

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is calculus?"},
    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
    {"role": "user", "content": "Can you give me an example?"},
]

聊天模板 (Chat-Templates)

如前所述，聊天模板对于构建语言模型和用户之间的对话至关重要。它们指导消息交换如何格式化为单个提示。

基础模型与指令模型 (Base Models vs. Instruct Models)

我们需要理解的另一点是基础模型与指令模型的区别：

基础模型 (Base Model) 是在原始文本数据上训练以预测下一个令牌的模型。
指令模型 (Instruct Model) 是专门微调以遵循指令并进行对话的模型。例如，SmolLM2-135M是一个基础模型，而SmolLM2-135M-Instruct是其指令调优变体。

要使基础模型表现得像指令模型，我们需要以模型能够理解的一致方式格式化我们的提示。这就是聊天模板的作用所在。

ChatML是一种这样的模板格式，它用清晰的角色指示符（系统、用户、助手）构建对话。如果你最近与一些AI API交互过，你就知道这是标准做法。

重要的是要注意，基础模型可以在不同的聊天模板上进行微调，所以当我们使用指令模型时，我们需要确保使用正确的聊天模板。

理解聊天模板 (Understanding Chat Templates)

由于每个指令模型使用不同的对话格式和特殊令牌，聊天模板的实现确保我们正确格式化提示，使其符合每个模型的期望。

在transformers中，聊天模板包含Jinja2代码，描述如何将ChatML消息列表（如上面示例所示）转换为模型可以理解的系统级指令、用户消息和助手响应的文本表示。

这种结构有助于保持交互的一致性，并确保模型对不同类型的输入做出适当响应。

以下是SmolLM2-135M-Instruct聊天模板的简化版本：

{% for message in messages %}
{% if loop.first and messages[0]['role'] != 'system' %}
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
<|im_end|>
{% endif %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}

如你所见，chat_template 描述了消息列表将如何被格式化。

给定这些消息：

messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]

前面的聊天模板将产生以下字符串：

<|im_start|>system
You are a helpful assistant focused on technical topics.<|im_end|>
<|im_start|>user
Can you explain what a chat template is?<|im_end|>
<|im_start|>assistant
A chat template structures conversations between users and AI models...<|im_end|>
<|im_start|>user
How do I use it ?<|im_end|>

transformers库会将聊天模板作为标记化过程的一部分为你处理。在这里阅读更多关于 transformers 如何使用聊天模板的信息。我们要做的就是以正确的方式构建我们的消息，标记器将处理剩下的事情。

你可以使用以下Space实验，看看同样的对话如何使用不同模型的相应聊天模板进行格式化：

消息到提示的转换 (Messages to prompt)

确保你的 LLM 正确接收格式化对话的最简单方法是使用模型标记器的chat_template。

messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
    {"role": "assistant", "content": "Hi human, what can help you with ?"},
]

要将前面的对话转换为提示，我们加载标记器并调用apply_chat_template：

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

这个函数返回的rendered_prompt现在可以作为你选择的模型的输入使用了！

当你以 ChatML 格式与消息交互时，这个apply_chat_template()函数将在你的API后端使用。

现在我们已经看到 LLMs 如何通过聊天模板构建它们的输入，让我们探智能体如何在它们的环境中行动。

它们这样做的主要方式之一是使用工具 (Tools)，这些工具扩展了AI模型在文本生成之外的能力。

我们将在接下来的单元中再次讨论消息，但如果你现在想深入了解，请查看：

< > Update on GitHub

Agents Course