Update README.md
Differences in the qlora scripts:

__I think there's a bug in gradient accumulation, so if you try this, maybe set gradient accumulation steps to 1__
*My first attempts used batch size 6 with gradient accumulation steps 16, but after three epochs the results with gradient accumulation were quite a bit worse than without.*
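Worth noting: turning accumulation off also shrinks the effective batch size, which may account for part of the difference independent of any bug. A quick sketch of the arithmetic, using the numbers above (single GPU assumed; the helper name is just for illustration):

```python
# Effective batch size = per-device batch * gradient accumulation steps (* number of GPUs).
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, n_gpus: int = 1) -> int:
    return per_device_batch * grad_accum_steps * n_gpus

# Settings mentioned above, single GPU:
with_accum = effective_batch_size(6, 16)     # batch 6, accumulation 16 -> 96 samples per optimizer step
without_accum = effective_batch_size(6, 1)   # accumulation steps 1     -> 6 samples per optimizer step
print(with_accum, without_accum)
```

So the two runs differ not only in whether accumulation code paths are exercised, but in a 16x change in samples per optimizer step, which normally calls for a learning-rate adjustment as well.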
__5 epochs seemed to achieve the best results, but YMMV__
Full example of tuning (used for airoboros-mpt-30b-gpt4-1.4):