lemonilia committed (verified)
Commit db83dba · Parent(s): 1aae992

Update README.md

Files changed (1):
  1. README.md +5 -3

README.md CHANGED
@@ -17,7 +17,7 @@ Prepend the assistant response with `<think>` to make the model engage in a chain of thought.
 
 Make sure that the model's output length is long enough; be prepared to make it continue its response if it stops prematurely.
 
- Low-depth instructions (perhaps at depth-0, just before the assistant's rsponse) can be beneficial in steering how the model should think. An additional `[SYSTEM_PROMPT]` could be used there.
+ Low-depth instructions (perhaps at depth-0, just before the assistant's response) can be beneficial in steering how the model should think. An additional `[SYSTEM_PROMPT]` could be used there.
 
 From tests it seems beneficial to keep at least one chain-of-thought in the context in addition to the one being generated. More experimentation required here.
 
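A minimal sketch of the prompting scheme the hunk above describes, assuming Mistral-Small-3's chat-template tokens (`[SYSTEM_PROMPT]…[/SYSTEM_PROMPT]`, `[INST]…[/INST]`) and a local llama.cpp server; the endpoint, messages, and parameter values are illustrative, not taken from this repository:

```python
# Sketch only: builds a raw prompt following the advice above. The special-token
# syntax assumes Mistral-Small-3's tokenizer; verify against the bundled config.
import requests

system = "You are a helpful assistant."
steering = "Think carefully, but keep the reasoning concise."  # depth-0 instruction
user_msg = "How many prime numbers are there below 100?"

prompt = (
    f"<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"
    f"[INST]{user_msg}[/INST]"
    # An additional [SYSTEM_PROMPT] at depth 0, just before the assistant's
    # response, to steer how the model should think.
    f"[SYSTEM_PROMPT]{steering}[/SYSTEM_PROMPT]"
    "<think>"  # prefill so the model engages in a chain of thought
)

# Generous output budget; if generation stops prematurely, append the partial
# output to the prompt and submit again to make the model continue.
resp = requests.post("http://localhost:8080/completion",
                     json={"prompt": prompt, "n_predict": 4096})
print(resp.json()["content"])
```
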
@@ -30,7 +30,8 @@ Chain of thought.
 Model response.</s>
 ```
 
- ## Known quirks and issues
+ ## Observed quirks and issues
+ - Assistant responses may lack the final punctuation mark as a probable result of how the source training data was formatted.
 - Not really a true issue, but information in the system prompt that contradicts that in the chat history or that is meant to be only temporarily valid may cause coherency issues with this model, since it will follow instructions very precisely.
 - Without user control on chain-of-thought length, the model can ramble for several thousand tokens.
 - Besides multi-turn capabilities, other non-reasoning capabilities of the original `Mistral-Small-24B-Instruct-2501` model might have degraded.
@@ -39,7 +40,8 @@ Model response.</s>
 # What's in this repository
 - Checkpoints for epochs 1~5
 - LoRA adapter for the final model
- - Some static GGUF quantizations
+ - Static GGUF quantizations
+ - HF FP16 weights
 
 ## Dataset
 Almost the entirety of the [s1K dataset](https://huggingface.co/datasets/simplescaling/s1K) was used, with minimal modifications to make it work properly with Mistral-Small-3-Instruct, except for 4 rows that didn't fit within the training sequence length of 8192 tokens and the 16 shortest rows, which were used as the test set instead. No samples were clipped and no system prompt was added. All were single-turn.
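For concreteness, a hedged sketch of the dataset split described in the hunk above; the s1K column names and the way sequence length is measured are assumptions about the dataset schema, not this repository's actual training code:

```python
# Sketch of the s1K filtering described above. Column names ("question",
# "thinking_trajectories", "attempt") are assumed; adjust to the real schema.
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("simplescaling/s1K", split="train")
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Instruct-2501")

def n_tokens(row):
    text = row["question"] + row["thinking_trajectories"][0] + row["attempt"]
    return len(tok(text).input_ids)

lengths = [n_tokens(r) for r in ds]
keep = [i for i, n in enumerate(lengths) if n <= 8192]       # drops the 4 oversized rows
test_idx = set(sorted(keep, key=lambda i: lengths[i])[:16])  # 16 shortest -> test set
train_set = ds.select([i for i in keep if i not in test_idx])
test_set = ds.select(sorted(test_idx))
```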
 