Update README.md
README.md
CHANGED
@@ -90,6 +90,8 @@ The `raw` column represents a weighted average of scores of augmented sentences
 
 ## Training procedure
 
+The model is trained on 1x H200 SXM (143 GB VRAM) for approx. 26 hours.
+
 - tokenizer.model_max_length: 8192 (full context length)
 - attn_implementation: flash_attention_2
 