alpha-ai/Reason-With-Choice-3B-GGUF

alphaaico committed 25 days ago

Commit d4b0b57 · verified · 1 Parent(s): ab7a928

Update README.md

Files changed (1)

README.md +2 -2

@@ -41,7 +41,7 @@ Think about it: most AI models blindly generate reasoning even when unnecessary,
 - Reasoning & Self-Reflection: The model first decides if reasoning is necessary and then either provides step-by-step logic or directly answers the question.
 - Structured Output: Responses follow a strict format with `<think>`, `<reflection>`, and `<answer>` sections, ensuring clarity and interpretability.
 - Optimized Training: Trained using GRPO (Guided Reward Policy Optimization) to enforce structured responses and improve decision-making.
-- Efficient Inference: Fine-tuned with Unsloth & Hugging Face’s TRL, ensuring faster inference speeds and optimized resource utilization.
+- Efficient Inference: Fine-tuned with Unsloth & Hugging Face's TRL, ensuring faster inference speeds and optimized resource utilization.
 ## Prompt Structure
@@ -97,7 +97,7 @@ This model is released under the Apache-2.0 license.
 ## Acknowledgments
-Special thanks to the Unsloth team for optimizing the fine-tuning pipeline and to Hugging Face’s TRL for enabling advanced fine-tuning techniques.
+Special thanks to the Unsloth team for optimizing the fine-tuning pipeline and to Hugging Face's TRL for enabling advanced fine-tuning techniques.
 ## Security & Format Considerations