eaddario committed · Commit b3920d4 · unverified · 1 Parent(s): 343d83f

Update README

Files changed (1): README.md (+3 −3)
README.md CHANGED
@@ -26,10 +26,10 @@ At its core, an Importance Matrix (imatrix) is a table or, more broadly, a struc
  The process to produce the quantized [GGUF](https://huggingface.co/docs/hub/en/gguf) models is roughly as follows:

  1. Convert the original model's safetensors into GGUF F16*
- 2. Estimate the Perplexity score for the F16 model (base) using [wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1), and record the [logits](./logits/)
- 3. Generate the [imatrix](./imatrix/) for each calibration dataset
+ 2. Estimate the Perplexity score for the F16 model (base) using [wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1), and record the [logits](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/logits)
+ 3. Generate the [imatrix](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/imatrix) for each calibration dataset
  4. Create quantized versions of the base model using each imatrix per quant type
- 5. Calculate the Perplexity and KL Divergence scores for each quantized model [(logs)](./scores/)
+ 5. Calculate the Perplexity and KL Divergence scores for each quantized model [(logs)](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/scores)
  6. For each quant type, keep the version with the best (usually the lowest) scores

  *[BF16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) would be preferred, but Apple's GPUs don't support it yet, so any operations are executed on the CPU, making it unacceptably slow. This is expected to change in the near term but, until then, if you are using Apple kit, avoid any models tagged BF16
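For reference, the steps in the README map roughly onto llama.cpp's standard command-line tools as sketched below. This is a minimal, illustrative example under stated assumptions, not the author's exact pipeline: the model path, output file names, calibration file, and the Q4_K_M quant type are all placeholders.

```bash
# Hypothetical end-to-end run with llama.cpp tooling; file names and the
# Q4_K_M quant type are illustrative, not taken from this repository.

# 1. Convert the original safetensors model to GGUF F16
python convert_hf_to_gguf.py ./Watt-Tool-8B --outtype f16 --outfile watt-tool-8b-F16.gguf

# 2. Score the F16 base on wikitext-2-raw-v1 and save its logits
#    for the later KL Divergence comparisons
./llama-perplexity -m watt-tool-8b-F16.gguf -f wikitext-2-raw/wiki.test.raw \
  --kl-divergence-base watt-tool-8b-F16.logits

# 3. Generate an importance matrix from a calibration dataset
./llama-imatrix -m watt-tool-8b-F16.gguf -f calibration_dataset.txt -o imatrix.dat

# 4. Quantize the base model using the imatrix (repeated per imatrix and quant type)
./llama-quantize --imatrix imatrix.dat watt-tool-8b-F16.gguf watt-tool-8b-Q4_K_M.gguf Q4_K_M

# 5. Compute Perplexity and KL Divergence for the quantized model
#    against the saved base logits
./llama-perplexity -m watt-tool-8b-Q4_K_M.gguf \
  --kl-divergence-base watt-tool-8b-F16.logits --kl-divergence
```

In practice steps 3 and 4 would be repeated once per calibration dataset and quant type, and step 5 once per resulting quantized model, keeping only the best-scoring variant of each quant type (step 6).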