The process to produce the quantized [GGUF](https://huggingface.co/docs/hub/en/gguf) models is roughly as follows (sketched in the shell example below):

1. Convert the original model's safetensors into GGUF F16*
2. Estimate the Perplexity score for the F16 model (base) using [wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1), and record the [logits](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/logits)
3. Generate the [imatrix](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/imatrix) for each calibration dataset
4. Create quantized versions of the base model using each imatrix per quant type
5. Calculate the Perplexity and KL Divergence scores for each quantized model [(logs)](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/scores)
6. For each quant type, keep the version with the best (usually the lowest) scores

*[BF16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) would be preferred, but Apple's GPUs don't support it yet, and therefore any operations are executed on the CPU, making it unacceptably slow. This is expected to change in the near term, but until then, if you are using Apple kit, avoid models tagged BF16.
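For reference, here is a minimal sketch of the steps above using llama.cpp's stock tools (`convert_hf_to_gguf.py`, `llama-imatrix`, `llama-quantize`, `llama-perplexity`). It assumes llama.cpp has been built and its binaries are on the PATH; the model directory, file names, calibration file, and the single Q4_K_M quant are illustrative placeholders, since the real process iterates over multiple calibration datasets and quant types.

```bash
# 1. Convert the original safetensors to GGUF F16 (script ships with llama.cpp)
python convert_hf_to_gguf.py ./Watt-Tool-8B --outtype f16 --outfile Watt-Tool-8B-F16.gguf

# 2. Score the F16 base on wikitext-2-raw-v1 and save its logits
#    for the later KL Divergence comparisons
llama-perplexity -m Watt-Tool-8B-F16.gguf -f wiki.test.raw \
  --kl-divergence-base Watt-Tool-8B-F16.logits

# 3. Generate an imatrix (repeat once per calibration dataset)
llama-imatrix -m Watt-Tool-8B-F16.gguf -f calibration.txt -o Watt-Tool-8B.imatrix

# 4. Quantize the base model with that imatrix (repeat per quant type)
llama-quantize --imatrix Watt-Tool-8B.imatrix \
  Watt-Tool-8B-F16.gguf Watt-Tool-8B-Q4_K_M.gguf Q4_K_M

# 5. Compute Perplexity and KL Divergence for the quantized model
#    against the saved base logits
llama-perplexity -m Watt-Tool-8B-Q4_K_M.gguf \
  --kl-divergence-base Watt-Tool-8B-F16.logits --kl-divergence

# 6. Compare the scores across runs and keep the best version per quant type
```

Saving the base logits once (step 2) is what makes step 5 cheap to repeat: every quantized variant is scored against the same reference distribution, so the KL Divergence numbers are directly comparable across imatrices and quant types.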