asiansoul committed on
Commit 23a9e4c · verified · 1 Parent(s): 9e40547

Update README.md

Files changed (1)
  1. README.md +15 -2
README.md CHANGED
@@ -306,8 +306,21 @@ Third Eye Blind remains a beloved rock band with a dedicated fan base. Their mus
  | [GGUF](https://huggingface.co/mradermacher/Llama-3.1-SISaAI-Ko-merge-8B-Instruct-GGUF/resolve/main/Llama-3.1-SISaAI-Ko-merge-8B-Instruct.Q8_0.gguf) | Q8_0 | 8.6 | fast, best quality |
  | [GGUF](https://huggingface.co/mradermacher/Llama-3.1-SISaAI-Ko-merge-8B-Instruct-GGUF/resolve/main/Llama-3.1-SISaAI-Ko-merge-8B-Instruct.f16.gguf) | f16 | 16.2 | 16 bpw, overkill |

- Here is a handy graph by ikawrakow comparing some lower-quality quant
- types (lower is better):
+ This graph compares the performance of various quantization methods, focusing on the lower-quality quant types:
+
+ X-axis (bpw): bits per weight. Lower values mean higher compression.
+
+ Y-axis (PPL(Q)/PPL(fp16)-1): relative perplexity degradation of the quantized model versus fp16. Lower values mean less degradation.
+
+ Methods compared:
+
+ Pre-imatrix k-quants: older k-quants, produced without an importance matrix.
+
+ Pre-imatrix legacy quants: the traditional quant types, also produced without an importance matrix.
+
+ imatrix i- and k-quants: newer quants calibrated with an importance matrix (imatrix).
+
+ Key insight: imatrix-based quants (i- and k-quants) show less performance degradation, especially at low bpw (high compression), making them more efficient than the pre-imatrix methods. The graph helps you pick a quant type for the balance you want between compression and quality.

  ![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)

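The two axes described in the added text reduce to simple formulas: bits per weight is total file bits divided by parameter count, and the Y-axis is PPL(Q)/PPL(fp16) - 1. Below is a minimal Python sketch of both; the file size and perplexity numbers are made-up placeholders for illustration, not values from this model card.

```python
# Minimal sketch of the two quantities plotted in the quant graph.
# All concrete numbers below are hypothetical placeholders, not measurements.

def bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
    """X-axis: rough bpw estimate, i.e. total bits in the GGUF file per weight."""
    return file_size_gb * 8.0 / n_params_billion  # the two 1e9 factors cancel

def quant_degradation(ppl_quant: float, ppl_fp16: float) -> float:
    """Y-axis: relative perplexity increase over fp16, PPL(Q)/PPL(fp16) - 1."""
    return ppl_quant / ppl_fp16 - 1.0

# Hypothetical 8B model: a ~4.9 GB quant with perplexity 6.28 vs 6.20 for fp16.
print(f"bpw:         {bits_per_weight(4.9, 8.0):.2f}")      # -> 4.90
print(f"degradation: {quant_degradation(6.28, 6.20):.4f}")  # -> 0.0129 (~1.3% worse)
```

Lower bpw means a smaller file; the point of the graph is that, at the same bpw, imatrix quants sit lower on the degradation axis than the pre-imatrix ones.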