Update README.md
README.md CHANGED
@@ -10,6 +10,10 @@ quantized_by: bartowski
 
 <b>Yes, this is with the fix to the tokenizer!</b>
 
+If you want to make sure it's using the thought and output tokens, be sure to enable rendering of special tokens (in llama.cpp this is the `--special` flag).
+
+It is able to use them without rendering them, much like chat tokens; this will just let you *see* them as they're getting used by the model.
+
 Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3658">b3658</a> for quantization.
 
 Original model: https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B
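
As a concrete sketch, enabling special-token rendering with the `llama-cli` binary from a llama.cpp build might look like the following (the GGUF filename and prompt are placeholders, not part of the original README; substitute whichever quant file you actually downloaded):

```bash
# Render special tokens in the output so the model's thought/output
# markers are visible as they stream by, instead of being consumed
# silently. Model path and prompt below are illustrative only.
./llama-cli \
  -m Reflection-Llama-3.1-70B-Q4_K_M.gguf \
  -p "How many r's are in the word strawberry?" \
  --special
```

Without `--special` the model still uses these tokens internally to structure its reasoning; the flag only changes whether they are printed.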