bartowski/Mistral-Large-Instruct-2407-GGUF

bartowski committed on Aug 27, 2024

Commit f2d1ecc (verified) · 1 Parent(s): a8dfa34

Upload README.md with huggingface_hub

README.md +19 -23

@@ -1,28 +1,11 @@
 ---
-base_model: mistralai/Mistral-Large-Instruct-2407
-language:
-- en
-- fr
-- de
-- es
-- it
-- pt
-- zh
-- ja
-- ru
-- ko
-license: other
-license_name: mrl
-license_link: https://mistral.ai/licenses/MRL-0.1.md
-pipeline_tag: text-generation
 quantized_by: bartowski
-extra_gated_description: If you want to learn more about how we process your personal
-  data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
 ---
 ## Llamacpp imatrix Quantizations of Mistral-Large-Instruct-2407
-Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3460">b3460</a> for quantization.
 Original model: https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
@@ -33,9 +16,13 @@ Run them in [LM Studio](https://lmstudio.ai/)
 ## Prompt format
 ```
-<s>[INST] {prompt}[/INST] </s>
 ```
 ## Download a file (not the whole branch) from below:
 | Filename | Quant type | File Size | Split | Description |
@@ -43,10 +30,11 @@ Run them in [LM Studio](https://lmstudio.ai/)
 | [Mistral-Large-Instruct-2407-Q8_0.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q8_0) | Q8_0 | 130.28GB | true | Extremely high quality, generally unneeded but max available quant. |
 | [Mistral-Large-Instruct-2407-Q6_K.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q6_K) | Q6_K | 100.59GB | true | Very high quality, near perfect, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q5_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q5_K_M) | Q5_K_M | 86.49GB | true | High quality, *recommended*. |
-| [Mistral-Large-Instruct-2407-Q5_K_S.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q5_K_S) | Q5_K_S | 84.36GB | true | High quality, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q4_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_M) | Q4_K_M | 73.22GB | true | Good quality, default size for most use cases, *recommended*. |
-| [Mistral-Large-Instruct-2407-IQ4_XS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-IQ4_XS) | IQ4_XS | 65.43GB | true | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q4_K_S.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_S) | Q4_K_S | 69.57GB | true | Slightly lower quality with more space savings, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q3_K_XL.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_XL) | Q3_K_XL | 64.91GB | true | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
 | [Mistral-Large-Instruct-2407-Q3_K_L.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_L) | Q3_K_L | 64.55GB | true | Lower quality but usable, good for low RAM availability. |
 | [Mistral-Large-Instruct-2407-Q3_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_M) | Q3_K_M | 59.10GB | true | Low quality. |
@@ -60,6 +48,14 @@ Run them in [LM Studio](https://lmstudio.ai/)
 | [Mistral-Large-Instruct-2407-IQ2_XXS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ2_XXS.gguf) | IQ2_XXS | 32.43GB | false | Very low quality, uses SOTA techniques to be usable. |
 | [Mistral-Large-Instruct-2407-IQ1_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ1_M.gguf) | IQ1_M | 28.39GB | false | Extremely low quality, *not* recommended. |
 ## Credits
 Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset
@@ -83,7 +79,7 @@ huggingface-cli download bartowski/Mistral-Large-Instruct-2407-GGUF --include "M
 If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
 ```
-huggingface-cli download bartowski/Mistral-Large-Instruct-2407-GGUF --include "Mistral-Large-Instruct-2407-Q8_0.gguf/*" --local-dir Mistral-Large-Instruct-2407-Q8_0
 ```
 You can either specify a new local-dir (Mistral-Large-Instruct-2407-Q8_0) or download them all in place (./)

 ---
 quantized_by: bartowski
+pipeline_tag: text-generation
 ---
 ## Llamacpp imatrix Quantizations of Mistral-Large-Instruct-2407
+Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3634">b3634</a> for quantization.
 Original model: https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
 ## Prompt format
 ```
+<s>[INST] {prompt}[/INST]
 ```
+## What's new:
+Add chat template, some new sizes
 ## Download a file (not the whole branch) from below:
 | Filename | Quant type | File Size | Split | Description |
 | [Mistral-Large-Instruct-2407-Q8_0.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q8_0) | Q8_0 | 130.28GB | true | Extremely high quality, generally unneeded but max available quant. |
 | [Mistral-Large-Instruct-2407-Q6_K.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q6_K) | Q6_K | 100.59GB | true | Very high quality, near perfect, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q5_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q5_K_M) | Q5_K_M | 86.49GB | true | High quality, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q4_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_M) | Q4_K_M | 73.22GB | true | Good quality, default size for most use cases, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q4_K_S.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_S) | Q4_K_S | 69.57GB | true | Slightly lower quality with more space savings, *recommended*. |
+| [Mistral-Large-Instruct-2407-Q4_0.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_0) | Q4_0 | 69.32GB | true | Legacy format, generally not worth using over similarly sized formats |
+| [Mistral-Large-Instruct-2407-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_0_4_4) | Q4_0_4_4 | 69.08GB | true | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
+| [Mistral-Large-Instruct-2407-IQ4_XS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-IQ4_XS) | IQ4_XS | 65.43GB | true | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
 | [Mistral-Large-Instruct-2407-Q3_K_XL.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_XL) | Q3_K_XL | 64.91GB | true | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
 | [Mistral-Large-Instruct-2407-Q3_K_L.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_L) | Q3_K_L | 64.55GB | true | Lower quality but usable, good for low RAM availability. |
 | [Mistral-Large-Instruct-2407-Q3_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_M) | Q3_K_M | 59.10GB | true | Low quality. |
 | [Mistral-Large-Instruct-2407-IQ2_XXS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ2_XXS.gguf) | IQ2_XXS | 32.43GB | false | Very low quality, uses SOTA techniques to be usable. |
 | [Mistral-Large-Instruct-2407-IQ1_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ1_M.gguf) | IQ1_M | 28.39GB | false | Extremely low quality, *not* recommended. |
+## Embed/output weights
+Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embedding and output weights quantized to Q8_0 instead of their normal default.
+Some say that this improves quality; others notice no difference. If you use these models, please comment with your findings. I would like to know that these quants are actually used and useful, so that I don't keep uploading quants no one is using.
+Thanks!
 ## Credits
 Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset
 If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
 ```
+huggingface-cli download bartowski/Mistral-Large-Instruct-2407-GGUF --include "Mistral-Large-Instruct-2407-Q8_0/*" --local-dir ./
 ```
 You can either specify a new local-dir (Mistral-Large-Instruct-2407-Q8_0) or download them all in place (./)
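
For scripted downloads, the same split-file fetch can be sketched in Python with the `huggingface_hub` library. This is a minimal sketch, not part of the commit: the helper names `split_quant_pattern` and `download_split_quant` are hypothetical, and it assumes the split parts live in a folder named `<model>-<quant>/`, matching the `--include` pattern in the updated command.

```python
from pathlib import Path


def split_quant_pattern(model_name: str, quant: str) -> str:
    """Return a glob matching every part of a split quant, e.g. 'Model-Q8_0/*'."""
    return f"{model_name}-{quant}/*"


def download_split_quant(repo_id: str, quant: str, local_dir: str = "./") -> Path:
    """Download all parts of one split quant from the Hub into local_dir."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Assumption: the repo name is the model name plus a "-GGUF" suffix,
    # as in bartowski/Mistral-Large-Instruct-2407-GGUF.
    model_name = repo_id.split("/")[-1].removesuffix("-GGUF")
    path = snapshot_download(
        repo_id=repo_id,
        allow_patterns=[split_quant_pattern(model_name, quant)],
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_split_quant("bartowski/Mistral-Large-Instruct-2407-GGUF", "Q8_0")` would fetch only the Q8_0 parts (about 130GB according to the table) rather than the whole branch; pass a different `local_dir` to keep them out of the current directory.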