aikitoria committed · Commit 479de1e · verified · 1 Parent(s): eeefc9b

Update README.md

Files changed (1): README.md +8 -2
README.md CHANGED
@@ -11,6 +11,7 @@ I did not create that model, only discovered it and wanted to try it for myself,
 [main](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/main) has measurements for default dataset and the one for [goliath-120b-exl2-rpcal](https://huggingface.co/Panchovix/goliath-120b-exl2-rpcal)
 
 [2.65bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/2.65bpw) using default dataset
+[3bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/3bpw) using default dataset
 [4.35bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw) using default dataset
 [4.35bpw-rpcal](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw-rpcal) using PIPPA dataset
 
@@ -21,14 +22,19 @@ I did not create that model, only discovered it and wanted to try it for myself,
 context 16k, cache 16: 46.9GiB (fits in 2x 3090)
 context 32k, cache 8: 47GiB (fits in 2x 3090)
 
+### 3bpw
+context 8k, cache 16: 47.4GiB (fits in 2x 3090)
+context 16k, cache 8: 47.4GiB (fits in 2x 3090)
+
 ### 4.35bpw
 context 16k, cache 16: 70.1GiB (fits in 3x 3090)
 context 32k, cache 8: 70.3GiB (fits in 3x 3090)
 context 32k, cache 16: 78.7GiB (fits in A100 80GB)
 
 # Super epic scientific test results
-- The 2.65bpw version suffered greatly, it's not completely broken, but it's not good either.
-- The 4.35bpw version is worse than normal 4k goliath but better than goliath with rope scale applied for 8k+ context.
+- The 2.65bpw version suffered greatly: it's not completely broken, but it's no good either.
+- The 3bpw version hasn't suffered as much; it's much more usable than the 2.65bpw one.
+- The 4.35bpw version is a bit worse than normal 4k goliath but better than goliath with rope scale applied for 8k+ context.
 - The version using the PIPPA dataset produces worse results than the one using the default dataset on any context length.
 
 My current strategy is to use the original goliath until its context is full and then switch over to this one.
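Each quant above lives on its own branch of the repo, so a specific bpw can be pulled by revision. A minimal sketch using `snapshot_download` from `huggingface_hub` (the `local_dir` name is just an example, not something from the README):

```python
# Sketch: fetch one quant by branch name using huggingface_hub.
# Repo and branch names come from the links above; local_dir is illustrative.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="aikitoria/Goliath-longLORA-120b-rope8-32k-exl2",
    revision="4.35bpw",  # each bpw variant is a branch, so it doubles as a revision
    local_dir="goliath-longlora-4.35bpw",
)
```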
 
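The "cache 8" / "cache 16" rows presumably refer to the KV cache precision in exllamav2 (8-bit vs FP16); halving the cache is what lets 32k context squeeze into 2x 3090. A rough loading sketch under that assumption, for a recent exllamav2 build (attribute names such as `scale_pos_emb` may differ between versions):

```python
# Sketch: load the quant at 32k context with an 8-bit KV cache in exllamav2.
# Assumes a recent exllamav2; attribute names may vary between releases.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache_8bit,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "goliath-longlora-4.35bpw"  # path from the download sketch above
config.prepare()
config.max_seq_len = 32768   # the LongLoRA finetune targets 32k
config.scale_pos_emb = 8.0   # linear rope scale 8, per the model name;
                             # may be redundant if config.json already records it

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # the "cache 8" rows above
model.load_autosplit(cache)                    # splits layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)
```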
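The switching strategy in the last line, spelled out: stay on the original 4k goliath while the prompt still fits, and move to this rope-8 32k version once it would overflow. A toy sketch of that routing; the 4096 native limit and the model names here are assumptions, not part of the README:

```python
# Toy sketch of the "switch when full" strategy. 4096 is original Goliath's
# native context; model names are illustrative placeholders.
def pick_model(prompt_tokens: int, reserve_for_output: int = 512) -> str:
    if prompt_tokens + reserve_for_output <= 4096:
        return "goliath-120b"  # better quality while the context still fits
    return "goliath-longlora-rope8-32k"  # degrades less beyond 4k
```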