jondurbin committed · Commit 5e4f4bc · 1 Parent(s): 6d5b10d

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -12,6 +12,8 @@ Differences in the qlora scripts:
 
 __I think there's a bug in gradient accumulation, so if you try this, maybe set gradient accumulation steps to 1__
 
+*my first attempts used batch size 6, with gradient accumulation steps 16, but results of three epochs with gradient accumulation vs without were quite a bit worse*
+
 __5 epochs seemed to achieve the best results, but YMMV__
 
 Full example of tuning (used for airoboros-mpt-30b-gpt4-1.4):