bartowski committed
Commit f2d1ecc · verified · 1 parent: a8dfa34

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +19 -23
README.md CHANGED
@@ -1,28 +1,11 @@
  ---
- base_model: mistralai/Mistral-Large-Instruct-2407
- language:
- - en
- - fr
- - de
- - es
- - it
- - pt
- - zh
- - ja
- - ru
- - ko
- license: other
- license_name: mrl
- license_link: https://mistral.ai/licenses/MRL-0.1.md
- pipeline_tag: text-generation
  quantized_by: bartowski
- extra_gated_description: If you want to learn more about how we process your personal
- data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
  ---
 
  ## Llamacpp imatrix Quantizations of Mistral-Large-Instruct-2407
 
- Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3460">b3460</a> for quantization.
 
  Original model: https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
 
@@ -33,9 +16,13 @@ Run them in [LM Studio](https://lmstudio.ai/)
  ## Prompt format
 
  ```
- <s>[INST] {prompt}[/INST] </s>
  ```
 
  ## Download a file (not the whole branch) from below:
 
  | Filename | Quant type | File Size | Split | Description |
@@ -43,10 +30,11 @@ Run them in [LM Studio](https://lmstudio.ai/)
  | [Mistral-Large-Instruct-2407-Q8_0.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q8_0) | Q8_0 | 130.28GB | true | Extremely high quality, generally unneeded but max available quant. |
  | [Mistral-Large-Instruct-2407-Q6_K.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q6_K) | Q6_K | 100.59GB | true | Very high quality, near perfect, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q5_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q5_K_M) | Q5_K_M | 86.49GB | true | High quality, *recommended*. |
- | [Mistral-Large-Instruct-2407-Q5_K_S.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q5_K_S) | Q5_K_S | 84.36GB | true | High quality, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q4_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_M) | Q4_K_M | 73.22GB | true | Good quality, default size for most use cases, *recommended*. |
- | [Mistral-Large-Instruct-2407-IQ4_XS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-IQ4_XS) | IQ4_XS | 65.43GB | true | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q4_K_S.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_S) | Q4_K_S | 69.57GB | true | Slightly lower quality with more space savings, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q3_K_XL.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_XL) | Q3_K_XL | 64.91GB | true | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
  | [Mistral-Large-Instruct-2407-Q3_K_L.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_L) | Q3_K_L | 64.55GB | true | Lower quality but usable, good for low RAM availability. |
  | [Mistral-Large-Instruct-2407-Q3_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_M) | Q3_K_M | 59.10GB | true | Low quality. |
@@ -60,6 +48,14 @@ Run them in [LM Studio](https://lmstudio.ai/)
  | [Mistral-Large-Instruct-2407-IQ2_XXS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ2_XXS.gguf) | IQ2_XXS | 32.43GB | false | Very low quality, uses SOTA techniques to be usable. |
  | [Mistral-Large-Instruct-2407-IQ1_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ1_M.gguf) | IQ1_M | 28.39GB | false | Extremely low quality, *not* recommended. |
 
  ## Credits
 
  Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset
@@ -83,7 +79,7 @@ huggingface-cli download bartowski/Mistral-Large-Instruct-2407-GGUF --include "M
  If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
 
  ```
- huggingface-cli download bartowski/Mistral-Large-Instruct-2407-GGUF --include "Mistral-Large-Instruct-2407-Q8_0.gguf/*" --local-dir Mistral-Large-Instruct-2407-Q8_0
  ```
 
  You can either specify a new local-dir (Mistral-Large-Instruct-2407-Q8_0) or download them all in place (./)
 
  ---
  quantized_by: bartowski
+ pipeline_tag: text-generation
  ---
 
  ## Llamacpp imatrix Quantizations of Mistral-Large-Instruct-2407
 
+ Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3634">b3634</a> for quantization.
 
  Original model: https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
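
As a rough sketch (assuming the standard llama.cpp workflow, not necessarily the exact commands used for this repo), an imatrix quant like these is produced in three steps; the calibration text and file names below are placeholders:

```
# 1. Convert the original HF checkpoint to a full-precision GGUF (paths are placeholders).
python convert_hf_to_gguf.py ./Mistral-Large-Instruct-2407 --outtype f16 --outfile Mistral-Large-Instruct-2407-f16.gguf

# 2. Compute the importance matrix from a calibration text file.
./llama-imatrix -m Mistral-Large-Instruct-2407-f16.gguf -f calibration_data.txt -o Mistral-Large-Instruct-2407.imatrix

# 3. Quantize using that imatrix (Q4_K_M shown as an example target).
./llama-quantize --imatrix Mistral-Large-Instruct-2407.imatrix \
  Mistral-Large-Instruct-2407-f16.gguf Mistral-Large-Instruct-2407-Q4_K_M.gguf Q4_K_M
```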
 
  ## Prompt format
 
  ```
+ <s>[INST] {prompt}[/INST]
  ```
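
As a quick usage sketch (an assumption, not part of the original card), the template can be passed straight to llama.cpp's llama-cli. The model file name is a placeholder, and the leading <s> is left out because llama-cli normally adds the BOS token on its own:

```
# Run a one-shot prompt in the Mistral [INST] format (model file name is a placeholder).
./llama-cli -m Mistral-Large-Instruct-2407-IQ2_XXS.gguf \
  -p "[INST] Write a haiku about quantization.[/INST]" -n 128
```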
 
+ ## What's new:
+
+ Added chat template and some new sizes.
+
  ## Download a file (not the whole branch) from below:
 
  | Filename | Quant type | File Size | Split | Description |
 
  | [Mistral-Large-Instruct-2407-Q8_0.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q8_0) | Q8_0 | 130.28GB | true | Extremely high quality, generally unneeded but max available quant. |
  | [Mistral-Large-Instruct-2407-Q6_K.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q6_K) | Q6_K | 100.59GB | true | Very high quality, near perfect, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q5_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q5_K_M) | Q5_K_M | 86.49GB | true | High quality, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q4_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_M) | Q4_K_M | 73.22GB | true | Good quality, default size for most use cases, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q4_K_S.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_K_S) | Q4_K_S | 69.57GB | true | Slightly lower quality with more space savings, *recommended*. |
+ | [Mistral-Large-Instruct-2407-Q4_0.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_0) | Q4_0 | 69.32GB | true | Legacy format, generally not worth using over similarly sized formats. |
+ | [Mistral-Large-Instruct-2407-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q4_0_4_4) | Q4_0_4_4 | 69.08GB | true | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
+ | [Mistral-Large-Instruct-2407-IQ4_XS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-IQ4_XS) | IQ4_XS | 65.43GB | true | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
  | [Mistral-Large-Instruct-2407-Q3_K_XL.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_XL) | Q3_K_XL | 64.91GB | true | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
  | [Mistral-Large-Instruct-2407-Q3_K_L.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_L) | Q3_K_L | 64.55GB | true | Lower quality but usable, good for low RAM availability. |
  | [Mistral-Large-Instruct-2407-Q3_K_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/tree/main/Mistral-Large-Instruct-2407-Q3_K_M) | Q3_K_M | 59.10GB | true | Low quality. |
 
  | [Mistral-Large-Instruct-2407-IQ2_XXS.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ2_XXS.gguf) | IQ2_XXS | 32.43GB | false | Very low quality, uses SOTA techniques to be usable. |
  | [Mistral-Large-Instruct-2407-IQ1_M.gguf](https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ1_M.gguf) | IQ1_M | 28.39GB | false | Extremely low quality, *not* recommended. |
 
+ ## Embed/output weights
+
+ Some of these quants (Q3_K_XL, Q4_K_L etc.) use the standard quantization method with the embeddings and output weights quantized to Q8_0 instead of what they would normally default to.
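
For reference, a sketch of how such a mix can be made with llama-quantize's tensor-type overrides; treat the exact flags and file names as assumptions rather than the precise command used for this repo:

```
# Quantize to Q3_K_L overall, but force token embeddings and the output tensor to Q8_0.
./llama-quantize --imatrix Mistral-Large-Instruct-2407.imatrix \
  --token-embedding-type q8_0 --output-tensor-type q8_0 \
  Mistral-Large-Instruct-2407-f16.gguf Mistral-Large-Instruct-2407-Q3_K_XL.gguf Q3_K_L
```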
+
+ Some say that this improves the quality, while others don't notice any difference. If you use these models, PLEASE COMMENT with your findings. I would like feedback that these are actually used and useful, so I don't keep uploading quants no one is using.
+
+ Thanks!
+
  ## Credits
 
  Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset
 
  If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
 
  ```
+ huggingface-cli download bartowski/Mistral-Large-Instruct-2407-GGUF --include "Mistral-Large-Instruct-2407-Q8_0/*" --local-dir ./
  ```
 
  You can either specify a new local-dir (Mistral-Large-Instruct-2407-Q8_0) or download them all in place (./)
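
Once downloaded, a split quant can be loaded by pointing llama.cpp at the first shard; the remaining parts in the same folder are picked up automatically (the shard count below is illustrative):

```
# Load the first part of the split GGUF; llama.cpp finds the other shards next to it.
./llama-cli -m ./Mistral-Large-Instruct-2407-Q8_0/Mistral-Large-Instruct-2407-Q8_0-00001-of-00004.gguf \
  -p "[INST] Hello![/INST]" -n 64
```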