lemonilia committed (verified)
Commit db83dba · Parent(s): 1aae992

Update README.md

Files changed (1):
  1. README.md +5 -3

README.md CHANGED
@@ -17,7 +17,7 @@ Prepend the assistant response with `<think>` to make the model engage in a chain of thought.
 
 Make sure that the model's output length is long enough; be prepared to make it continue its response if it stops prematurely.
 
- Low-depth instructions (perhaps at depth-0, just before the assistant's rsponse) can be beneficial in steering how the model should think. An additional `[SYSTEM_PROMPT]` could be used there.
+ Low-depth instructions (perhaps at depth-0, just before the assistant's response) can be beneficial in steering how the model should think. An additional `[SYSTEM_PROMPT]` could be used there.
 
 From tests it seems beneficial to keep at least one chain-of-thought in the context in addition to the one being generated. More experimentation required here.
 
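A minimal sketch of the prompting scheme the hunk above describes, assuming Mistral-Small-3's chat-template tokens (`[SYSTEM_PROMPT]…[/SYSTEM_PROMPT]`, `[INST]…[/INST]`) and a local llama.cpp server; the endpoint, messages, and parameter values are illustrative, not taken from this repository:

```python
# Sketch only: builds a raw prompt following the advice above. The special-token
# syntax assumes Mistral-Small-3's tokenizer; verify against the bundled config.
import requests

system = "You are a helpful assistant."
steering = "Think carefully, but keep the reasoning concise."  # depth-0 instruction
user_msg = "How many prime numbers are there below 100?"

prompt = (
    f"<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"
    f"[INST]{user_msg}[/INST]"
    # An additional [SYSTEM_PROMPT] at depth 0, just before the assistant's
    # response, to steer how the model should think.
    f"[SYSTEM_PROMPT]{steering}[/SYSTEM_PROMPT]"
    "<think>"  # prefill so the model engages in a chain of thought
)

# Generous output budget; if generation stops prematurely, append the partial
# output to the prompt and submit again to make the model continue.
resp = requests.post("http://localhost:8080/completion",
                     json={"prompt": prompt, "n_predict": 4096})
print(resp.json()["content"])
```
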
@@ -30,7 +30,8 @@ Chain of thought.
 Model response.</s>
 ```
 
- ## Known quirks and issues
+ ## Observed quirks and issues
+ - Assistant responses may lack the final punctuation mark as a probable result of how the source training data was formatted.
 - Not really a true issue, but information in the system prompt that contradicts that in the chat history or that is meant to be only temporarily valid may cause coherency issues with this model, since it will follow instructions very precisely.
 - Without user control on chain-of-thought length, the model can ramble for several thousand tokens.
 - Besides multi-turn capabilities, other non-reasoning capabilities of the original `Mistral-Small-24B-Instruct-2501` model might have degraded.
@@ -39,7 +40,8 @@ Model response.</s>
 # What's in this repository
 - Checkpoints for epochs 1~5
 - LoRA adapter for the final model
- - Some static GGUF quantizations
+ - Static GGUF quantizations
+ - HF FP16 weights
 
 ## Dataset
 Almost the entirety of the [s1K dataset](https://huggingface.co/datasets/simplescaling/s1K) was used, with minimal modifications to make it work properly with Mistral-Small-3-Instruct, except for 4 rows that didn't fit within the training sequence length of 8192 tokens and the 16 shortest rows, which were used as the test set instead. No samples were clipped and no system prompt was added. All were single-turn.
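For concreteness, a hedged sketch of the dataset split described in the hunk above; the s1K column names and the way sequence length is measured are assumptions about the dataset schema, not this repository's actual training code:

```python
# Sketch of the s1K filtering described above. Column names ("question",
# "thinking_trajectories", "attempt") are assumed; adjust to the real schema.
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("simplescaling/s1K", split="train")
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Instruct-2501")

def n_tokens(row):
    text = row["question"] + row["thinking_trajectories"][0] + row["attempt"]
    return len(tok(text).input_ids)

lengths = [n_tokens(r) for r in ds]
keep = [i for i, n in enumerate(lengths) if n <= 8192]       # drops the 4 oversized rows
test_idx = set(sorted(keep, key=lambda i: lengths[i])[:16])  # 16 shortest -> test set
train_set = ds.select([i for i in keep if i not in test_idx])
test_set = ds.select(sorted(test_idx))
```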
 