netease-youdao/Confucius-o1-14B

@@ -11,7 +11,7 @@ library_name: transformers
 # Confucius-o1-14B
 ## Introduction
-**Confucius-o1-14B** is an o1-like reasoning model developed by the NETEASE Youdao Team that can be easily deployed on a single GPU without quantization. It is based on Qwen2.5-14B-Instruct and adopts a two-stage learning strategy that gives the lightweight 14B model o1-like thinking abilities. What sets it apart is that, after generating its chain of thought, it can summarize a step-by-step problem-solving process from that chain of thought on its own. This keeps users from getting bogged down in a complex chain of thought and lets them easily obtain the correct problem-solving ideas and answers.
 However, there are some limitations that must be stated in advance:
 1. **Scenario Limitations**: Our optimization was carried out only on K12 mathematics data, and its effectiveness has been verified only on math-related benchmarks. The model's performance in non-mathematical scenarios has not been tested, so we cannot guarantee its quality or effectiveness in other fields.
@@ -64,10 +64,38 @@ USER_PROMPT_TEMPLATE = """现在，让我们开始吧！
 Then you can create your `messages` as follows and use them to request model results. You only need to put your problem in the `question` variable.
 ```python
 messages = [
     {'role': 'system', 'content': SYSTEM_PROMPT_TEMPLATE},
     {'role': 'user', 'content': USER_PROMPT_TEMPLATE.format(question=question)},
 ]
 ```
 After obtaining the model results, you can parse out the "thinking" and "summary" parts as follows.
@@ -86,7 +114,7 @@ def parse_result_nostep(result):
     summary = summary_list[0].strip()
     return thinking, summary
-thinking, summary = parse_result_nostep(result)
 ```
 ## Citation
@@ -94,7 +122,7 @@ thinking, summary = parse_result_nostep(result)
 If you find our work helpful, please consider citing it.
 ```
 @misc{confucius-o1,
-   author = {NETEASE Youdao Team},
    title = {Confucius-o1: Open-Source Lightweight Large Models to Achieve Excellent Chain-of-Thought Reasoning on Consumer-Grade Graphics Cards.},
    url = {},
    month = {January},

 # Confucius-o1-14B
 ## Introduction
+**Confucius-o1-14B** is an o1-like reasoning model developed by the NetEase Youdao Team that can be easily deployed on a single GPU without quantization. It is based on Qwen2.5-14B-Instruct and adopts a two-stage learning strategy that gives the lightweight 14B model o1-like thinking abilities. What sets it apart is that, after generating its chain of thought, it can summarize a step-by-step problem-solving process from that chain of thought on its own. This keeps users from getting bogged down in a complex chain of thought and lets them easily obtain the correct problem-solving ideas and answers.
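As a rough sanity check on the single-GPU claim, here is a minimal sketch that estimates the memory needed just to hold the weights. It rests on three assumptions:

- The model has about 14 billion parameters. This is the nominal size in the model name; the exact count may differ.
- Each parameter takes 2 bytes in bf16/fp16.
- The KV cache and activations are ignored, although both grow with long chains of thought.

```python
# Rough estimate of the GPU memory needed only for model weights.
# Assumptions: ~14e9 parameters (nominal size from the model name) and the
# usual bytes-per-parameter for each dtype. KV cache and activations are ignored.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the weight footprint in GiB for the given parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("fp32", "bf16"):
    print(f"{dtype}: ~{weight_memory_gib(14e9, dtype):.1f} GiB")
# fp32: ~52.2 GiB
# bf16: ~26.1 GiB
```

Under these assumptions the unquantized bf16 weights alone take roughly 26 GiB. The GPU therefore needs headroom beyond that for the KV cache during long generations.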
 However, there are some limitations that must be stated in advance:
 1. **Scenario Limitations**: Our optimization was carried out only on K12 mathematics data, and its effectiveness has been verified only on math-related benchmarks. The model's performance in non-mathematical scenarios has not been tested, so we cannot guarantee its quality or effectiveness in other fields.
 Then you can load the model, create your `messages` as follows, and generate a response. You only need to put your problem in the `question` variable.
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "netease-youdao/Confucius-o1-14B"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
 messages = [
     {'role': 'system', 'content': SYSTEM_PROMPT_TEMPLATE},
     {'role': 'user', 'content': USER_PROMPT_TEMPLATE.format(question=question)},
 ]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=16384
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
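The snippet above slices `generated_ids` after generation. For decoder-only models, `model.generate()` returns the prompt tokens followed by the newly generated tokens, so each output row is cut at its prompt length before decoding. Below is a minimal stand-alone illustration. It uses made-up token IDs and loads no model, and `strip_prompt` is a hypothetical helper name.

```python
# Illustrates the prompt-stripping step: generate() output = prompt + new tokens,
# so dropping the first len(input_ids) tokens leaves only the completion.
# Token IDs below are made up for illustration.


def strip_prompt(input_ids_batch, output_ids_batch):
    """Keep only the newly generated tokens of each sequence in the batch."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


prompt = [[101, 7, 8, 9]]                # hypothetical prompt token IDs
generated = [[101, 7, 8, 9, 42, 43, 2]]  # prompt followed by a hypothetical completion
print(strip_prompt(prompt, generated))   # [[42, 43, 2]]
```

In a padded batch, every row of `model_inputs.input_ids` has the same length, so one slice works for all rows. This assumes left padding, which is the usual choice for decoder-only generation.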
 After obtaining the model results, you can parse out the "thinking" and "summary" parts as follows.
     summary = summary_list[0].strip()
     return thinking, summary
+thinking, summary = parse_result_nostep(response)
 ```
 ## Citation
 If you find our work helpful, please consider citing it.
 ```
 @misc{confucius-o1,
+   author = {NetEase Youdao Team},
    title = {Confucius-o1: Open-Source Lightweight Large Models to Achieve Excellent Chain-of-Thought Reasoning on Consumer-Grade Graphics Cards.},
    url = {},
    month = {January},