Update README.md

README.md
I did not create that model, only discovered it and wanted to try it for myself.

[main](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/main) has measurements for the default dataset and the one used for [goliath-120b-exl2-rpcal](https://huggingface.co/Panchovix/goliath-120b-exl2-rpcal)

[2.65bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/2.65bpw) using the default dataset
[3bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/3bpw) using the default dataset
[4.35bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw) using the default dataset
[4.35bpw-rpcal](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw-rpcal) using the PIPPA dataset
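
Each quant lives on its own branch of the repo, so you only need to download the branch you want. A minimal sketch using huggingface_hub (`snapshot_download`'s `revision` argument selects the branch; 4.35bpw here is just an example):

```python
from huggingface_hub import snapshot_download

# Download only the chosen quant branch; returns the local directory path.
model_dir = snapshot_download(
    repo_id="aikitoria/Goliath-longLORA-120b-rope8-32k-exl2",
    revision="4.35bpw",  # branch name, e.g. "2.65bpw", "3bpw", "4.35bpw-rpcal"
)
print(model_dir)
```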
|
### 2.65bpw
context 16k, cache 16: 46.9GiB (fits in 2x 3090)
context 32k, cache 8: 47GiB (fits in 2x 3090)
|
### 3bpw
context 8k, cache 16: 47.4GiB (fits in 2x 3090)
context 16k, cache 8: 47.4GiB (fits in 2x 3090)
|
### 4.35bpw
context 16k, cache 16: 70.1GiB (fits in 3x 3090)
context 32k, cache 8: 70.3GiB (fits in 3x 3090)
context 32k, cache 16: 78.7GiB (fits in A100 80GB)
|
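"cache 16" and "cache 8" above are the KV cache precision: the default FP16 cache versus exllamav2's 8-bit cache, which roughly halves cache memory and is why 32k at cache 8 costs about the same as 16k at cache 16. A minimal loading sketch, assuming exllamav2's Python API (the model path is illustrative):

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache,
    ExLlamaV2Cache_8bit,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "Goliath-longLORA-120b-rope8-32k-exl2"  # downloaded quant
config.prepare()
config.max_seq_len = 32768  # full 32k context

model = ExLlamaV2(config)

# "cache 8": 8-bit KV cache; use ExLlamaV2Cache(model, lazy=True) for "cache 16".
cache = ExLlamaV2Cache_8bit(model, lazy=True)
model.load_autosplit(cache)  # splits the weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
```
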
# Super epic scientific test results
- The 2.65bpw version suffered greatly; it's not completely broken, but it's no good either.
- The 3bpw version hasn't suffered as much; it's much more usable than the 2.65bpw one.
- The 4.35bpw version is a bit worse than the original goliath at its native 4k context, but better than goliath with RoPE scaling applied for 8k+ context.
- The version using the PIPPA dataset produces worse results than the one using the default dataset at any context length.
|
My current strategy is to use the original goliath until its context is full and then switch over to this one.
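
A sketch of that strategy as a routing rule (hypothetical helper; the repo ids are illustrative and a real implementation would count tokens with the model's tokenizer):

```python
GOLIATH_CTX = 4096  # the original goliath's native context window


def count_tokens(text: str) -> int:
    # Crude stand-in: real code would use the tokenizer's encode() length.
    return len(text) // 4


def pick_model(conversation: str) -> str:
    # Stay on the original model while the conversation still fits its
    # context; switch to the 32k rope8 model once it overflows.
    if count_tokens(conversation) < GOLIATH_CTX:
        return "alpindale/goliath-120b"
    return "aikitoria/Goliath-longLORA-120b-rope8-32k-exl2"
```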