AaryanK committed 6119f69 (verified, parent: 3636e66) · Update README.md
- DeepSeek
- CoT
- finetune
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e1a459ff3fd4fd8eedb456/9-Siwp9GLyx5F9SNzThAi.png)

A fine-tuned variant of **Qwen 2.5 3B Instruct**, designed for improved **toggleable reasoning** and **instruction following**. This model was built by engineers at [xioserv.com](https://xioserv.com) and incorporates specialized modifications to enhance performance on structured reasoning tasks.

---

## Overview

The **AaryanK/Qwen_2.5_3B_Instruct_GRPO_Reasoning_XIOSERV** model is a refined version of Qwen 2.5 3B Instruct. It is optimized to produce responses in a structured format, making it particularly useful for tasks that require a clear separation between reasoning steps and the final answer.

### Toggleable Reasoning Mode

- If you include the **system prompt**, the model **explicitly separates its reasoning from the final answer**.
- If you **omit the system prompt**, the model **responds naturally**, without structured reasoning.

This makes the model versatile: you can choose between structured reasoning and direct responses based on your use case.

---

## System Prompt

To enable structured reasoning, use the following system prompt:

```
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
```

If you do not include this prompt, the model responds in a standard, conversational manner, without explicitly separating its reasoning from the final answer.
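Because the two sections are emitted inside literal tags, downstream code can split a structured response with a small parser. Here is a minimal sketch; the function name and the fallback behavior for untagged (conversational) replies are our own illustration, not part of the model:

```python
import re

def parse_structured_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a response.

    If no tags are present (the model was run without the system
    prompt), treat the whole text as the answer.
    """
    reasoning = re.search(r"<reasoning>\s*(.*?)\s*</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    return {
        "reasoning": reasoning.group(1) if reasoning else None,
        "answer": answer.group(1) if answer else text.strip(),
    }

reply = """<reasoning>
2 + 2 groups the units: 2 apples plus 2 apples is 4 apples.
</reasoning>
<answer>
4
</answer>"""
print(parse_structured_response(reply))  # → {'reasoning': '2 + 2 ...', 'answer': '4'}
```

The non-greedy `(.*?)` with `re.DOTALL` keeps the match inside one tag pair even when the reasoning spans multiple lines.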

---

## Methodology

To replicate the "aha moment," we employed **Group Relative Policy Optimization (GRPO)**, a variant of **Proximal Policy Optimization (PPO)** that enhances reasoning capability while reducing memory usage. The approach aligns with the techniques outlined in the **DeepSeekMath** paper, where GRPO was instrumental in advancing reasoning in language models. Through GRPO-based reinforcement learning, the model autonomously refines its problem-solving strategies, mirroring the **self-reflective behavior** observed in **DeepSeek-R1**.
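The group-relative step at the heart of GRPO can be sketched in a few lines: several completions are sampled for the same prompt, scored by a reward function, and each reward is normalized against the group's own mean and standard deviation instead of a learned value baseline. This is an illustrative sketch of that step only, not the training code used for this model:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled completion's reward against its group.

    GRPO uses (reward - group mean) / group std as the advantage,
    avoiding the separate value network that PPO requires.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt, scored e.g. +1 for a
# correct answer in the expected <reasoning>/<answer> format.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Completions that beat their own group get positive advantage and are reinforced; the rest are suppressed, which is what pushes the model toward the structured, self-checking answers.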

---

## GGUF Files

We have provided **GGUF files** that can be run with **llama.cpp** for efficient inference.

To run the model with **llama.cpp**, follow the instructions in the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).

Ensure that you include the system prompt in your input **if you want structured reasoning output**. Otherwise, the model functions like a standard instruct model.
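Whether reasoning is toggled on is purely a function of the request you build. A minimal sketch targeting an OpenAI-style chat API, such as the `/v1/chat/completions` endpoint exposed by llama.cpp's `llama-server`; the helper name is our own:

```python
import json

SYSTEM_PROMPT = """Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>"""

def build_chat_payload(user_message: str, structured: bool = True) -> dict:
    """Build an OpenAI-style chat payload; prepend the system prompt
    only when structured reasoning output is wanted."""
    messages = []
    if structured:
        messages.append({"role": "system", "content": SYSTEM_PROMPT})
    messages.append({"role": "user", "content": user_message})
    return {"messages": messages}

# Structured reasoning on vs. off:
print(json.dumps(build_chat_payload("What is 17 * 23?"), indent=2))
print(json.dumps(build_chat_payload("What is 17 * 23?", structured=False), indent=2))
```

The same toggle applies to any client: include the system message for tagged reasoning, omit it for a plain conversational reply.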
 

---

## Acknowledgements

- **xioserv.com** – For the engineering efforts in fine-tuning this model.
- **Hugging Face** – For providing an accessible platform to share and deploy models.

For any questions or contributions, please open an issue or submit a pull request on our [GitHub repository](https://github.com/AaryanK/Qwen_2.5_3B_Instruct_GRPO_Reasoning_XIOSERV).

---

Happy coding!