Update README.md
README.md
CHANGED
@@ -41,7 +41,7 @@ Think about it: most AI models blindly generate reasoning even when unnecessary,
 - Reasoning & Self-Reflection: The model first decides if reasoning is necessary and then either provides step-by-step logic or directly answers the question.
 - Structured Output: Responses follow a strict format with `<think>`, `<reflection>`, and `<answer>` sections, ensuring clarity and interpretability.
 - Optimized Training: Trained using GRPO (Guided Reward Policy Optimization) to enforce structured responses and improve decision-making.
-- Efficient Inference: Fine-tuned with Unsloth & Hugging Face
+- Efficient Inference: Fine-tuned with Unsloth & Hugging Face's TRL, ensuring faster inference speeds and optimized resource utilization.
 
 ## Prompt Structure
 
@@ -97,7 +97,7 @@ This model is released under the Apache-2.0 license.
 
 ## Acknowledgments
 
-Special thanks to the Unsloth team for optimizing the fine-tuning pipeline and to Hugging Face
+Special thanks to the Unsloth team for optimizing the fine-tuning pipeline and to Hugging Face's TRL for enabling advanced fine-tuning techniques.
 
 ## Security & Format Considerations
 
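
The "Structured Output" context line in this diff names three response sections: `<think>`, `<reflection>`, and `<answer>`. As a rough illustration of how a consumer might split such a response, here is a minimal Python sketch. It assumes each section is closed by a matching `</...>` tag, which the README excerpt shown here does not confirm, and the function name `parse_structured_response` is hypothetical rather than part of the model's tooling.

```python
import re

# Tag names are taken from the README text in the diff.
# Assumption: each section is wrapped as <tag>...</tag>.
SECTION_TAGS = ("think", "reflection", "answer")


def parse_structured_response(text: str) -> dict:
    """Return the stripped content of each known section, or None if it is absent."""
    sections = {}
    for tag in SECTION_TAGS:
        # DOTALL lets a section span multiple lines; the non-greedy match
        # stops at the first closing tag for this section.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections
```

Returning `None` for a missing section, instead of raising, fits the README's claim that the model may skip reasoning and answer directly, so a response containing only an `<answer>` section still parses.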