tags:
- DeepSeek
- CoT
- finetune
---

A fine-tuned variant of **Qwen 2.5 3B Instruct**, designed for improved **toggleable reasoning** and **instruction-following capabilities**. The model was built by engineers at [xioserv.com](https://xioserv.com) and incorporates specialized modifications to enhance performance on structured reasoning tasks.

---

## Overview

The **AaryanK/Qwen_2.5_3B_Instruct_GRPO_Reasoning_XIOSERV** model is a refined version of the Qwen 2.5 3B Instruct model, optimized to produce responses in a structured format. This makes it particularly useful for tasks that require a clear separation between reasoning steps and the final answer.

### Toggleable Reasoning Mode

- If you include the **system prompt**, the model **explicitly separates its reasoning from the final answer**.
- If you **omit the system prompt**, the model **responds naturally**, without structured reasoning.

This makes the model highly **versatile**: users can choose between structured reasoning and direct responses depending on their use case.
---

## System Prompt

To enable structured reasoning, use the following system prompt:

```
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
```

If you do not include this prompt, the model responds in a **standard, conversational** manner, without explicitly separating reasoning from the final answer.
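Downstream code can then recover the two sections from the tags. A small illustrative parser (the function name and fallback behavior are our own choices, not part of the model):

```python
import re

def parse_structured(text: str) -> tuple[str, str]:
    """Extract (reasoning, answer) from a structured response.
    Falls back to ("", text) when the tags are absent, i.e. when the
    model was run without the system prompt."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if answer is None:
        return "", text.strip()
    return (reasoning.group(1).strip() if reasoning else "",
            answer.group(1).strip())

reply = ("<reasoning>\n17 * 23 = 17 * 20 + 17 * 3 = 340 + 51\n</reasoning>\n"
         "<answer>\n391\n</answer>")
print(parse_structured(reply)[1])  # prints "391"
```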
---

## Methodology

To replicate the "aha moment," we employed **Group Relative Policy Optimization (GRPO)**, a variant of **Proximal Policy Optimization (PPO)** that enhances reasoning capabilities while reducing memory usage. This approach follows the techniques outlined in the **DeepSeekMath** paper, where GRPO was instrumental in advancing reasoning in language models. Trained with GRPO, the model autonomously refines its problem-solving strategies, mirroring the **self-reflective behavior** observed in **DeepSeek's R1**.
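The group-relative idea at the core of GRPO can be sketched in a few lines: each completion's reward is normalized against the other completions sampled for the same prompt, replacing PPO's learned value baseline. This is an illustrative simplification, not the training code used for this model:

```python
def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against the group sampled for
    the same prompt: advantage_i = (r_i - mean) / (std + eps).
    Dropping PPO's learned value network is what saves memory."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, scored by a reward function
# (e.g. 1.0 for a correct, well-formatted answer, 0.0 otherwise):
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average get positive advantages and are reinforced; the policy update itself then proceeds as in PPO.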
---
|
54 |
|
|
|
58 |
|
59 |
To run the model with **llama.cpp**, follow the instructions in the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).
|
60 |
|
61 |
+
Ensure that you include the system prompt in your input **if you want structured reasoning output**. Otherwise, the model will function like a standard instruct model.
|
|
|
62 |
|
63 |
---

## Acknowledgements

- **xioserv.com** – For the engineering efforts in fine-tuning this model.
- **Hugging Face** – For providing an accessible platform to share and deploy models.

For any questions or contributions, please open an issue or submit a pull request on our [GitHub repository](https://github.com/AaryanK/Qwen_2.5_3B_Instruct_GRPO_Reasoning_XIOSERV).

---

Happy coding!