asiansoul committed on
Commit 23a9e4c · verified · 1 Parent(s): 9e40547

Update README.md

Files changed (1)
  1. README.md +15 -2
README.md CHANGED
@@ -306,8 +306,21 @@ Third Eye Blind remains a beloved rock band with a dedicated fan base. Their mus
  | [GGUF](https://huggingface.co/mradermacher/Llama-3.1-SISaAI-Ko-merge-8B-Instruct-GGUF/resolve/main/Llama-3.1-SISaAI-Ko-merge-8B-Instruct.Q8_0.gguf) | Q8_0 | 8.6 | fast, best quality |
  | [GGUF](https://huggingface.co/mradermacher/Llama-3.1-SISaAI-Ko-merge-8B-Instruct-GGUF/resolve/main/Llama-3.1-SISaAI-Ko-merge-8B-Instruct.f16.gguf) | f16 | 16.2 | 16 bpw, overkill |

- Here is a handy graph by ikawrakow comparing some lower-quality quant
- types (lower is better):
+ This graph compares the performance of various quantization methods, focusing on the lower-quality quant types:
+
+ X-axis (bpw): bits per weight. Lower values mean higher compression.
+
+ Y-axis (PPL(Q)/PPL(fp16)-1): relative perplexity degradation of the quantized model versus fp16. Lower values mean less degradation.
+
+ Methods compared:
+
+ Pre-imatrix k-quants: older k-quants, produced without an importance matrix.
+
+ Pre-imatrix legacy quants: the traditional quant types, also produced without an importance matrix.
+
+ imatrix i- and k-quants: newer quants calibrated with an importance matrix (imatrix).
+
+ Key insight: imatrix-based quants (i- and k-quants) show less performance degradation, especially at low bpw (high compression), making them more efficient than the pre-imatrix methods. The graph helps you pick a quant type for the balance you want between compression and quality.

  ![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)

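The two axes described in the added text reduce to simple formulas: bits per weight is total file bits divided by parameter count, and the Y-axis is PPL(Q)/PPL(fp16) - 1. Below is a minimal Python sketch of both; the file size and perplexity numbers are made-up placeholders for illustration, not values from this model card.

```python
# Minimal sketch of the two quantities plotted in the quant graph.
# All concrete numbers below are hypothetical placeholders, not measurements.

def bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
    """X-axis: rough bpw estimate, i.e. total bits in the GGUF file per weight."""
    return file_size_gb * 8.0 / n_params_billion  # the two 1e9 factors cancel

def quant_degradation(ppl_quant: float, ppl_fp16: float) -> float:
    """Y-axis: relative perplexity increase over fp16, PPL(Q)/PPL(fp16) - 1."""
    return ppl_quant / ppl_fp16 - 1.0

# Hypothetical 8B model: a ~4.9 GB quant with perplexity 6.28 vs 6.20 for fp16.
print(f"bpw:         {bits_per_weight(4.9, 8.0):.2f}")      # -> 4.90
print(f"degradation: {quant_degradation(6.28, 6.20):.4f}")  # -> 0.0129 (~1.3% worse)
```

Lower bpw means a smaller file; the point of the graph is that, at the same bpw, imatrix quants sit lower on the degradation axis than the pre-imatrix ones.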