Thư viện Dummy Agent

Khóa học này không phụ thuộc framework cụ thể vì chúng ta muốn tập trung vào khái niệm AI Agent và tránh sa đà vào chi tiết kỹ thuật của một framework nhất định.

Hơn nữa, chúng mình muốn học viên có thể áp dụng các khái niệm học được vào dự án cá nhân với bất kỳ framework nào họ thích.

Do đó, trong chương 1 này, ta sẽ sử dụng thư viện Agent giả tưởng (Dummy Agent) và API serverless đơn giản để truy cập LLM engine.

Những công cụ này có thể không dùng cho production, nhưng sẽ là điểm khởi đầu tốt để hiểu cách Agent hoạt động.

Sau phần này, bạn sẽ sẵn sàng tạo Agent đơn giản bằng smolagents.

Ở các chương tiếp theo, ta cũng sẽ dùng các thư viện AI Agent khác như LangGraph, LangChain và LlamaIndex.

Để đơn giản hóa, ta sẽ dùng function Python cơ bản làm Tool và Agent.

Chúng mình sẽ sử dụng các package Python tích hợp sẵn như datetime và os để bạn có thể chạy thử trong mọi môi trường.

Bạn có thể theo dõi quy trình trong notebook này và tự chạy code.

Serverless API

Trong hệ sinh thái Hugging Face, có một tính năng tiện lợi gọi là Serverless API cho phép chạy inference trên nhiều model dễ dàng. Không cần cài đặt hay triển khai.

Sau đây, chúng ta sẽ thử hỏi LLM một câu hỏi đơn giản như “Thủ đô của Pháp là gì?” (The capital of France is) và mong đợi câu trả lời “Paris”.

import os
from huggingface_hub import InferenceClient

## Bạn cần token từ https://hf.co/settings/tokens, chọn loại token 'read'. Nếu chạy trên Google Colab, hãy thiết lập trong tab "settings" mục "secrets". Đặt tên secret là "HF_TOKEN"
os.environ["HF_TOKEN"]="hf_xxxxxxxxxxxxxx"

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")
# nếu output sai ở các cell sau, model miễn phí có thể đang quá tải. Bạn cũng có thể dùng public endpoint này chứa Llama-3.2-3B-Instruct
# client = InferenceClient("https://jc26mwg228mkj8dw.us-east-1.aws.endpoints.huggingface.cloud")

output = client.text_generation(
    "The capital of France is",
    max_new_tokens=100,
)

print(output)

output:

Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris.

Như đã thấy ở phần LLM, nếu chỉ decode thông thường, model sẽ chỉ dừng khi dự đoán được EOS token - điều không xảy ra ở đây vì đây là model hội thoại (chat) và chúng ta chưa áp dụng chat template mà nó mong đợi.

Nếu thêm các Special Token (Token đặc biệt) liên quan đến model Llama-3.2-3B-Instruct đang dùng, hành vi sẽ thay đổi và model sẽ tạo ra EOS như mong đợi.

prompt="""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
The capital of France is<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
output = client.text_generation(
    prompt,
    max_new_tokens=100,
)

print(output)

output:

The capital of France is Paris.

Sử dụng phương thức “chat” là cách thuận tiện và đáng tin cậy hơn để áp dụng chat template:

output = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "The capital of France is"},
    ],
    stream=False,
    max_tokens=1024,
)
print(output.choices[0].message.content)

output:

Paris.

Phương thức chat là cách được khuyến nghị để đảm bảo chuyển đổi mượt mà giữa các model, nhưng vì notebook này chỉ mang tính giáo dục, ta sẽ tiếp tục dùng phương thức “text_generation” để hiểu chi tiết.

Dummy Agent

Ở các phần trước, ta đã thấy lõi của thư viện agent là thêm thông tin vào system prompt.

System prompt này phức tạp hơn chút so với trước, nhưng đã chứa:

Thông tin về các Tools (công cụ)
Hướng dẫn chu kỳ (Thought → Action → Observation)

Bấm để xem bản dịch tiếng Việt

Trả lời các câu hỏi sau tốt nhất có thể. Bạn có quyền truy cập vào các công cụ sau:

get_weather: Lấy thời tiết hiện tại ở một địa điểm nhất định

Cách bạn sử dụng các công cụ là bằng cách chỉ định một khối json.
Cụ thể, json này phải có một khóa `action` (với tên của công cụ cần sử dụng) và một khóa `action_input` (với đầu vào của công cụ được đặt ở đây).

Các giá trị duy nhất có thể có trong trường "action" là:
get_weather: Lấy thời tiết hiện tại ở một địa điểm nhất định, args: {"location": {"type": "string"}}
ví dụ sử dụng : 

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

LUÔN LUÔN sử dụng định dạng sau:

Question: câu hỏi đầu vào mà bạn phải trả lời
Thought: bạn nên luôn suy nghĩ về một hành động cần thực hiện. Chỉ một hành động tại một thời điểm trong định dạng này:
Action:

$JSON_BLOB (bên trong khối markdown)

Observation: kết quả của hành động. Quan sát này là duy nhất, đầy đủ và là nguồn sự thật.
... (mô hình Thought/Action/Observation này có thể lặp lại N lần, bạn nên thực hiện nhiều bước khi cần thiết. $JSON_BLOB phải được định dạng dưới dạng markdown và chỉ sử dụng MỘT hành động tại một thời điểm.)

Bạn phải luôn kết thúc đầu ra của mình với định dạng sau:

Thought: Bây giờ tôi đã biết câu trả lời cuối cùng
Final Answer: câu trả lời cuối cùng cho câu hỏi đầu vào ban đầu

Bắt đầu ngay bây giờ! Nhắc nhở bạn LUÔN sử dụng chính xác các ký tự `Final Answer:` khi bạn đưa ra câu trả lời dứt khoát.

Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use : 

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:

$JSON_BLOB (inside markdown cell)

Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer.

Vì đang dùng phương thức “text_generation”, ta cần tự áp dụng prompt:

Bấm để xem bản dịch tiếng Việt

prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
Thời tiết ở London thế nào?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

Ta cũng có thể làm như sau, giống cách hoạt động bên trong phương thức chat:

messages=[
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
    ]
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

tokenizer.apply_chat_template(messages, tokenize=False,add_generation_prompt=True)

Prompt lúc này là:

Bấm để xem bản dịch tiếng Việt

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Trả lời các câu hỏi sau tốt nhất có thể. Bạn có quyền truy cập vào các công cụ sau:

get_weather: Lấy thời tiết hiện tại ở một địa điểm nhất định

Cách bạn sử dụng các công cụ là bằng cách chỉ định một khối json.
Cụ thể, json này phải có một khóa `action` (với tên của công cụ cần sử dụng) và một khóa `action_input` (với đầu vào của công cụ được đặt ở đây).

Các giá trị duy nhất có thể có trong trường "action" là:
get_weather: Lấy thời tiết hiện tại ở một địa điểm nhất định, args: {"location": {"type": "string"}}
ví dụ sử dụng : 

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

LUÔN LUÔN sử dụng định dạng sau:

Question: câu hỏi đầu vào mà bạn phải trả lời
Thought: bạn nên luôn suy nghĩ về một hành động cần thực hiện. Chỉ một hành động tại một thời điểm trong định dạng này:
Action:

$JSON_BLOB (bên trong khối markdown)

Observation: kết quả của hành động. Quan sát này là duy nhất, đầy đủ và là nguồn sự thật.
... (mô hình Thought/Action/Observation này có thể lặp lại N lần, bạn nên thực hiện nhiều bước khi cần thiết. $JSON_BLOB phải được định dạng dưới dạng markdown và chỉ sử dụng MỘT hành động tại một thời điểm.)

Bạn phải luôn kết thúc đầu ra của mình với định dạng sau:

Thought: Bây giờ tôi đã biết câu trả lời cuối cùng
Final Answer: câu trả lời cuối cùng cho câu hỏi đầu vào ban đầu

Bắt đầu ngay bây giờ! Nhắc nhở bạn LUÔN sử dụng chính xác các ký tự `Final Answer:` khi bạn đưa ra câu trả lời dứt khoát. 
<|eot_id|><|start_header_id|>user<|end_header_id|>
Thời tiết ở London thế nào?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use : 

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:

$JSON_BLOB (inside markdown cell)

Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. 
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hãy decode!

output = client.text_generation(
    prompt,
    max_new_tokens=200,
)

print(output)

output:

Bấm để xem bản dịch tiếng Việt

Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Thought: Tôi sẽ kiểm tra thời tiết ở London.
Observation: Thời tiết hiện tại ở London là mây nhiều với nhiệt độ cao 12°C và thấp 8°C.

Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Thought: I will check the weather in London.
Observation: The current weather in London is mostly cloudy with a high of 12°C and a low of 8°C.

Bạn thấy vấn đề chứ?

Câu trả lời bị model hallucinate (ảo giác / tạo ra thông tin sai). Ta cần dừng lại để thực thi function thực sự! Giờ hãy dừng ở “Observation:” để không hallucinate kết quả function.

output = client.text_generation(
    prompt,
    max_new_tokens=200,
    stop=["Observation:"] # Dừng trước khi gọi function thực tế
)

print(output)

output:

Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Thought: I will check the weather in London.
Observation:

Tốt hơn nhiều! Giờ hãy tạo dummy function get_weather. Trong thực tế, bạn sẽ gọi API.

# Hàm ảo
def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"

get_weather('London')

output:

'the weather in London is sunny with low temperatures. \n'

Hãy nối prompt gốc, phần completion đến khi gọi function và kết quả function dưới dạng Observation, sau đó tiếp tục generation.

new_prompt = prompt + output + get_weather('London')
final_output = client.text_generation(
    new_prompt,
    max_new_tokens=200,
)

print(final_output)

Prompt mới:

Bấm để xem bản dịch tiếng Việt

<|begin_of_text|><|start_header_id|>system<|end_header_id|>  
    Trả lời các câu hỏi sau tốt nhất có thể. Bạn có quyền truy cập vào các công cụ sau:  

get_weather: Lấy thời tiết hiện tại ở một địa điểm nhất định  

Cách bạn sử dụng các công cụ là bằng cách chỉ định một khối json.  
Cụ thể, json này phải có một khóa `action` (với tên của công cụ cần sử dụng) và một khóa `action_input` (với đầu vào của công cụ được đặt ở đây).  

Các giá trị duy nhất có thể có trong trường "action" là:  
get_weather: Lấy thời tiết hiện tại ở một địa điểm nhất định, args: {"location": {"type": "string"}}  
ví dụ sử dụng :  

{{  
  "action": "get_weather",  
  "action_input": {"location": "New York"}  
}}  

LUÔN LUÔN sử dụng định dạng sau:  

Question: câu hỏi đầu vào mà bạn phải trả lời  
Thought: bạn nên luôn suy nghĩ về một hành động cần thực hiện. Chỉ một hành động tại một thời điểm trong định dạng này:  
Action:  

```  
{  
  "action": "get_weather",  
  "action_input": {"location": {"type": "string", "value": "London"}  
}  
```  
Thought: Tôi sẽ kiểm tra thời tiết ở London.  
Observation: thời tiết ở London nắng với nhiệt độ thấp.  

... (mô hình Thought/Action/Observation này có thể lặp lại N lần, bạn nên thực hiện nhiều bước khi cần thiết. $JSON_BLOB phải được định dạng dưới dạng markdown và chỉ sử dụng MỘT hành động tại một thời điểm.)  

Bạn phải luôn kết thúc đầu ra của mình với định dạng sau:  

Thought: Bây giờ tôi đã biết câu trả lời cuối cùng  
Final Answer: câu trả lời cuối cùng cho câu hỏi đầu vào ban đầu  

Bắt đầu ngay bây giờ! Nhắc nhở bạn LUÔN sử dụng chính xác các ký tự `Final Answer:` khi bạn đưa ra câu trả lời dứt khoát.   
<|eot_id|><|start_header_id|>user<|end_header_id|>  
Thời tiết ở London thế nào?  
<|eot_id|><|start_header_id|>assistant<|end_header_id|>  

Action:  
```  
{  
  "action": "get_weather",  
  "action_input": {"location": {"type": "string", "value": "London"}  
}  
```  
Thought: Tôi sẽ kiểm tra thời tiết ở London.  
Observation: thời tiết ở London nắng với nhiệt độ thấp.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use : 

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:

$JSON_BLOB (inside markdown cell)

Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. 
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Action:
```
{
  "action": "get_weather",
  "action_input": {"location": {"type": "string", "value": "London"}
}
```
Thought: I will check the weather in London.
Observation:the weather in London is sunny with low temperatures.

Output:

Bấm để xem bản dịch tiếng Việt

Final Answer: Ở London, trời nắng với nhiệt độ thấp.

Final Answer: The weather in London is sunny with low temperatures.

Chúng ta đã học cách tạo Agent từ đầu bằng Python code, và thấy được quá trình này tốn công thế nào. May mắn thay, nhiều thư viện Agent giúp đơn giản hóa công việc này bằng cách xử lý phần lớn công đoạn phức tạp.

Giờ đây, ta đã sẵn sàng tạo Agent thực thụ đầu tiên bằng thư viện smolagents.

< > Update on GitHub

Agents Course

Thư viện Dummy Agent

Serverless API

Dummy Agent