remove arm quants for now
README.md CHANGED
@@ -52,9 +52,6 @@ You are a world-class AI system, capable of complex reasoning and reflection. Re
 | [Reflection-Llama-3.1-70B-Q4_K_M.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_K_M.gguf) | Q4_K_M | 42.52GB | false | Good quality, default size for most use cases, *recommended*. |
 | [Reflection-Llama-3.1-70B-Q4_K_S.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_K_S.gguf) | Q4_K_S | 40.35GB | false | Slightly lower quality with more space savings, *recommended*. |
 | [Reflection-Llama-3.1-70B-Q4_0.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0.gguf) | Q4_0 | 40.12GB | false | Legacy format, generally not worth using over similarly sized formats. |
-| [Reflection-Llama-3.1-70B-Q4_0_8_8.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_8_8.gguf) | Q4_0_8_8 | 39.97GB | false | Optimized for ARM inference. Requires 'sve' support (see link below). |
-| [Reflection-Llama-3.1-70B-Q4_0_4_8.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_4_8.gguf) | Q4_0_4_8 | 39.97GB | false | Optimized for ARM inference. Requires 'i8mm' support (see link below). |
-| [Reflection-Llama-3.1-70B-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_4_4.gguf) | Q4_0_4_4 | 39.97GB | false | Optimized for ARM inference. Should work well on all ARM chips, pick this if you're unsure. |
 | [Reflection-Llama-3.1-70B-Q3_K_XL.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q3_K_XL.gguf) | Q3_K_XL | 38.06GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
 | [Reflection-Llama-3.1-70B-IQ4_XS.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-IQ4_XS.gguf) | IQ4_XS | 37.90GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
 | [Reflection-Llama-3.1-70B-Q3_K_L.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q3_K_L.gguf) | Q3_K_L | 37.14GB | false | Lower quality but usable, good for low RAM availability. |
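The removed rows gate on the 'sve' and 'i8mm' CPU features. On a Linux AArch64 machine these are exposed as flags in `/proc/cpuinfo`, so a quick way to see which (if any) your chip supports is a minimal sketch like the following — assuming GNU grep on Linux; this command is illustrative and not part of the repo's own instructions:

```bash
# Print whichever of the two ARM feature flags this CPU exposes.
# 'i8mm' gated Q4_0_4_8 and 'sve' gated Q4_0_8_8; Q4_0_4_4 ran on any ARM chip.
grep -o -w -e 'sve' -e 'i8mm' /proc/cpuinfo | sort -u
```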
@@ -97,12 +94,6 @@ huggingface-cli download bartowski/Reflection-Llama-3.1-70B-GGUF --include "Refl
 
 You can either specify a new local-dir (Reflection-Llama-3.1-70B-Q8_0) or download them all in place (./)
 
-## Q4_0_X_X
-
-If you're using an ARM chip, the Q4_0_X_X quants will have a substantial speedup. Check out Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660)
-
-To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
-
 ## Which file should I choose?
 
 A great write-up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
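The hunk context above comes from the README's huggingface-cli instructions. A minimal sketch of the two download styles the surviving prose describes — the exact `--include` patterns here are illustrative assumptions, since the context line is truncated in the diff:

```bash
# Download a single quant in place (./):
huggingface-cli download bartowski/Reflection-Llama-3.1-70B-GGUF \
  --include "Reflection-Llama-3.1-70B-Q4_K_M.gguf" --local-dir ./

# Or give a quant its own new local-dir (Reflection-Llama-3.1-70B-Q8_0):
huggingface-cli download bartowski/Reflection-Llama-3.1-70B-GGUF \
  --include "Reflection-Llama-3.1-70B-Q8_0/*" --local-dir Reflection-Llama-3.1-70B-Q8_0
```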
Reflection-Llama-3.1-70B-Q4_0_4_4.gguf DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:11c35f5544ae4448f320537de4798a3933358cc068debc36739ad2be9cab951f
-size 39969801152
Reflection-Llama-3.1-70B-Q4_0_4_8.gguf DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:d8e8bebaa384f36ee9f09b1b35dd8b76271052b15184e11fab7e4ef187ed51c9
-size 39969801152
Reflection-Llama-3.1-70B-Q4_0_8_8.gguf DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:ac6852dbd35611e390d781569ac3a1bdebbb1cf83be970f11717af099964bf36
-size 39969801152
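Each deleted entry above is a Git LFS pointer: the `oid sha256:` line is the checksum of the full weights file and `size` is its byte count. Anyone who grabbed one of these quants before the removal can still verify a local copy against the pointer — a sketch assuming GNU coreutils on Linux (on macOS, `shasum -a 256` is the usual substitute):

```bash
# Hash the local file and compare against the oid recorded in the deleted pointer.
sha256sum Reflection-Llama-3.1-70B-Q4_0_4_4.gguf
# expected: 11c35f5544ae4448f320537de4798a3933358cc068debc36739ad2be9cab951f
```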