Add new SentenceTransformer model.

Files changed: README.md (+134 -129), model.safetensors (+1 -1)

README.md (CHANGED)
@@ -45,34 +45,34 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
+- dataset_size:267
 - loss:MultipleNegativesRankingLoss
 widget:
-- source_sentence:
+- source_sentence: 昨日夕飯にチキンヌードル食べた?
   sentences:
-  -
-  -
-  -
-- source_sentence:
+  - ナイトスタンドにスカーフはある?
+  - 夕飯はチキンヌードルだった?
+  - スカーフがキャンプファイヤーで燃えてる
+- source_sentence: テーブル
   sentences:
-  -
-  - 村人について教えて
-  - 昨日なに作ったの?
-- source_sentence: じぶん
-  sentences:
-  - 窓が開いていたから
+  - はじめにどこをさがせばいい?
   - 自分がやった
-  -
-- source_sentence:
+  - テーブルを調べよう
+- source_sentence: 欲しくない
+  sentences:
+  - 物の姿を変える魔法が使える村人を知っている?
+  - 誰かが魔法を使った
+  - 家の中を探してみよう
+- source_sentence: 家の外
   sentences:
-  -
-  -
-  -
-- source_sentence:
+  - キャンドル要らない
+  - どこでもいいよ
+  - 魔法使い
+- source_sentence: キャンドル頂戴
   sentences:
-  -
-  -
-  -
+  - 物の姿を変える魔法が使える村人を知っている?
+  - 魔女
+  - やっぱり、キャンドルがいい
 model-index:
 - name: SentenceTransformer based on colorfulscoop/sbert-base-ja
   results:
@@ -84,109 +84,109 @@ model-index:
       type: custom-arc-semantics-data
     metrics:
     - type: cosine_accuracy
-      value: 0.
+      value: 0.8258426966292135
       name: Cosine Accuracy
     - type: cosine_accuracy_threshold
-      value: 0.
+      value: 0.530483067035675
       name: Cosine Accuracy Threshold
     - type: cosine_f1
-      value: 0.
+      value: 0.8571428571428571
       name: Cosine F1
     - type: cosine_f1_threshold
-      value: 0.
+      value: 0.530483067035675
       name: Cosine F1 Threshold
     - type: cosine_precision
-      value:
+      value: 0.8532110091743119
       name: Cosine Precision
     - type: cosine_recall
-      value: 0.
+      value: 0.8611111111111112
       name: Cosine Recall
     - type: cosine_ap
-      value:
+      value: 0.9302395955607082
       name: Cosine Ap
     - type: dot_accuracy
-      value: 0.
+      value: 0.8202247191011236
       name: Dot Accuracy
     - type: dot_accuracy_threshold
-      value:
+      value: 286.6033630371094
       name: Dot Accuracy Threshold
     - type: dot_f1
-      value: 0.
+      value: 0.8518518518518519
       name: Dot F1
     - type: dot_f1_threshold
-      value:
+      value: 286.6033630371094
       name: Dot F1 Threshold
     - type: dot_precision
-      value:
+      value: 0.8518518518518519
       name: Dot Precision
     - type: dot_recall
-      value: 0.
+      value: 0.8518518518518519
       name: Dot Recall
     - type: dot_ap
-      value:
+      value: 0.9269146593596983
       name: Dot Ap
     - type: manhattan_accuracy
-      value: 0.
+      value: 0.8258426966292135
       name: Manhattan Accuracy
     - type: manhattan_accuracy_threshold
-      value:
+      value: 500.2329406738281
       name: Manhattan Accuracy Threshold
     - type: manhattan_f1
-      value: 0.
+      value: 0.8597285067873304
       name: Manhattan F1
     - type: manhattan_f1_threshold
-      value:
+      value: 500.2329406738281
       name: Manhattan F1 Threshold
     - type: manhattan_precision
-      value:
+      value: 0.8407079646017699
       name: Manhattan Precision
     - type: manhattan_recall
-      value: 0.
+      value: 0.8796296296296297
       name: Manhattan Recall
     - type: manhattan_ap
-      value:
+      value: 0.9284651287730749
       name: Manhattan Ap
     - type: euclidean_accuracy
-      value: 0.
+      value: 0.8202247191011236
       name: Euclidean Accuracy
     - type: euclidean_accuracy_threshold
-      value:
+      value: 21.535140991210938
       name: Euclidean Accuracy Threshold
     - type: euclidean_f1
-      value: 0.
+      value: 0.8571428571428572
       name: Euclidean F1
     - type: euclidean_f1_threshold
-      value:
+      value: 23.045635223388672
       name: Euclidean F1 Threshold
     - type: euclidean_precision
-      value:
+      value: 0.8275862068965517
       name: Euclidean Precision
     - type: euclidean_recall
-      value: 0.
+      value: 0.8888888888888888
       name: Euclidean Recall
     - type: euclidean_ap
-      value:
+      value: 0.9285413234296498
       name: Euclidean Ap
     - type: max_accuracy
-      value: 0.
+      value: 0.8258426966292135
       name: Max Accuracy
     - type: max_accuracy_threshold
-      value:
+      value: 500.2329406738281
       name: Max Accuracy Threshold
     - type: max_f1
-      value: 0.
+      value: 0.8597285067873304
       name: Max F1
     - type: max_f1_threshold
-      value:
+      value: 500.2329406738281
       name: Max F1 Threshold
     - type: max_precision
-      value:
+      value: 0.8532110091743119
       name: Max Precision
     - type: max_recall
-      value: 0.
+      value: 0.8888888888888888
       name: Max Recall
     - type: max_ap
-      value:
+      value: 0.9302395955607082
       name: Max Ap
 ---
 
@@ -239,9 +239,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("LeoChiuu/sbert-base-ja-arc-temp")
 # Run inference
 sentences = [
-    '
-    '
-    '
+    'キャンドル頂戴',
+    'やっぱり、キャンドルがいい',
+    '物の姿を変える魔法が使える村人を知っている?',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
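For reviewers, a quick way to sanity-check the new example sentences above is to score them against each other. This is not part of the commit; it is a minimal sketch assuming `sentence_transformers.util.cos_sim`:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("LeoChiuu/sbert-base-ja-arc-temp")
sentences = [
    'キャンドル頂戴',
    'やっぱり、キャンドルがいい',
    '物の姿を変える魔法が使える村人を知っている?',
]

# Encode the sentences, then compute pairwise cosine similarities.
embeddings = model.encode(sentences)
scores = util.cos_sim(embeddings, embeddings)  # 3x3 similarity matrix
print(scores)
```

The first two sentences (both candle requests) should score higher against each other than against the third.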
@@ -285,43 +285,43 @@ You can finetune this model on your own dataset.
 * Dataset: `custom-arc-semantics-data`
 * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
 
-| Metric | Value
-
-| cosine_accuracy | 0.
-| cosine_accuracy_threshold | 0.
-| cosine_f1 | 0.
-| cosine_f1_threshold | 0.
-| cosine_precision |
-| cosine_recall | 0.
-| cosine_ap |
-| dot_accuracy | 0.
-| dot_accuracy_threshold |
-| dot_f1 | 0.
-| dot_f1_threshold |
-| dot_precision |
-| dot_recall | 0.
-| dot_ap |
-| manhattan_accuracy | 0.
-| manhattan_accuracy_threshold |
-| manhattan_f1 | 0.
-| manhattan_f1_threshold |
-| manhattan_precision |
-| manhattan_recall | 0.
-| manhattan_ap |
-| euclidean_accuracy | 0.
-| euclidean_accuracy_threshold |
-| euclidean_f1 | 0.
-| euclidean_f1_threshold |
-| euclidean_precision |
-| euclidean_recall | 0.
-| euclidean_ap |
-| max_accuracy | 0.
-| max_accuracy_threshold |
-| max_f1 | 0.
-| max_f1_threshold |
-| max_precision |
-| max_recall | 0.
-| **max_ap** | **
+| Metric                       | Value      |
+|:-----------------------------|:-----------|
+| cosine_accuracy              | 0.8258     |
+| cosine_accuracy_threshold    | 0.5305     |
+| cosine_f1                    | 0.8571     |
+| cosine_f1_threshold          | 0.5305     |
+| cosine_precision             | 0.8532     |
+| cosine_recall                | 0.8611     |
+| cosine_ap                    | 0.9302     |
+| dot_accuracy                 | 0.8202     |
+| dot_accuracy_threshold       | 286.6034   |
+| dot_f1                       | 0.8519     |
+| dot_f1_threshold             | 286.6034   |
+| dot_precision                | 0.8519     |
+| dot_recall                   | 0.8519     |
+| dot_ap                       | 0.9269     |
+| manhattan_accuracy           | 0.8258     |
+| manhattan_accuracy_threshold | 500.2329   |
+| manhattan_f1                 | 0.8597     |
+| manhattan_f1_threshold       | 500.2329   |
+| manhattan_precision          | 0.8407     |
+| manhattan_recall             | 0.8796     |
+| manhattan_ap                 | 0.9285     |
+| euclidean_accuracy           | 0.8202     |
+| euclidean_accuracy_threshold | 21.5351    |
+| euclidean_f1                 | 0.8571     |
+| euclidean_f1_threshold       | 23.0456    |
+| euclidean_precision          | 0.8276     |
+| euclidean_recall             | 0.8889     |
+| euclidean_ap                 | 0.9285     |
+| max_accuracy                 | 0.8258     |
+| max_accuracy_threshold       | 500.2329   |
+| max_f1                       | 0.8597     |
+| max_f1_threshold             | 500.2329   |
+| max_precision                | 0.8532     |
+| max_recall                   | 0.8889     |
+| **max_ap**                   | **0.9302** |
 
 <!--
 ## Bias, Risks and Limitations
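The table above is produced by the `BinaryClassificationEvaluator` linked in the hunk. A hedged sketch of re-running it (the actual evaluation pairs are not part of this diff; the two pairs below are copied from the evaluation-set samples shown further down, and the full set of 178 pairs is only described, not included):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("LeoChiuu/sbert-base-ja-arc-temp")

# Placeholder pairs; the real evaluation set is not in the diff.
sentences1 = ["花がぬいぐるみに変えられている", "カミーユ"]
sentences2 = ["だれかが魔法で花をぬいぐるみに変えた", "試すため"]
labels = [1, 0]  # 1 = same meaning, 0 = different

evaluator = BinaryClassificationEvaluator(
    sentences1=sentences1,
    sentences2=sentences2,
    labels=labels,
    name="custom-arc-semantics-data",
)
# Reports accuracy/F1/precision/recall/AP for cosine, dot, Manhattan and
# Euclidean similarity; the max_* rows are the best score across them.
results = evaluator(model)
print(results)
```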
@@ -342,19 +342,19 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
 
 
-* Size:
+* Size: 267 training samples
 * Columns: <code>text1</code>, <code>text2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-  | | text1 | text2 | label
-
-  | type | string | string | int
-  | details | <ul><li>min: 4 tokens</li><li>mean: 8.
+  |         | text1  | text2  | label |
+  |:--------|:-------|:-------|:------|
+  | type    | string | string | int   |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 8.36 tokens</li><li>max: 15 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.35 tokens</li><li>max: 14 tokens</li></ul> | <ul><li>0: ~33.33%</li><li>1: ~66.67%</li></ul> |
 * Samples:
-  | text1
-
-  | <code
-  | <code
-  | <code
+  | text1                          | text2                            | label          |
+  |:-------------------------------|:---------------------------------|:---------------|
+  | <code>ジャックはどんな魔法を使うの?</code>   | <code>見た目を変える魔法</code>          | <code>0</code> |
+  | <code>魔法使い</code>              | <code>魔法をかけられる人</code>          | <code>1</code> |
+  | <code>ぬいぐるみが花</code>           | <code>花がぬいぐるみに変えられている</code>   | <code>1</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
@@ -368,19 +368,19 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
 
 
-* Size:
+* Size: 178 evaluation samples
 * Columns: <code>text1</code>, <code>text2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-  | | text1
-
-  | type | string
-  | details | <ul><li>min: 4 tokens</li><li>mean: 8.
+  |         | text1  | text2  | label |
+  |:--------|:-------|:-------|:------|
+  | type    | string | string | int   |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 8.2 tokens</li><li>max: 15 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.17 tokens</li><li>max: 14 tokens</li></ul> | <ul><li>0: ~39.33%</li><li>1: ~60.67%</li></ul> |
 * Samples:
-  | text1
-
-  | <code
-  | <code
-  | <code
+  | text1                          | text2                              | label          |
+  |:-------------------------------|:-----------------------------------|:---------------|
+  | <code>巻き割をした?</code>          | <code>家の中を調べよう</code>            | <code>0</code> |
+  | <code>花がぬいぐるみに変えられている</code>  | <code>だれかが魔法で花をぬいぐるみに変えた</code>  | <code>1</code> |
+  | <code>カミーユ</code>              | <code>試すため</code>                 | <code>0</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
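Both dataset hunks end at the opening brace of the `MultipleNegativesRankingLoss` parameter block, so the exact parameters are cut off in this view. As a rough sketch of how this loss is typically constructed in sentence-transformers (library defaults assumed, not read from the diff):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("colorfulscoop/sbert-base-ja")

# MultipleNegativesRankingLoss treats each (text1, text2) pair as
# (anchor, positive) and uses the other in-batch texts as negatives.
# scale is shown at its library default; the value actually used by
# this commit is truncated in the JSON block above.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)
```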
@@ -393,8 +393,8 @@ You can finetune this model on your own dataset.
 #### Non-Default Hyperparameters
 
 - `eval_strategy`: epoch
-- `learning_rate`:
-- `num_train_epochs`:
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 13
 - `warmup_ratio`: 0.1
 - `fp16`: True
 - `batch_sampler`: no_duplicates
@@ -413,13 +413,13 @@ You can finetune this model on your own dataset.
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`:
+- `learning_rate`: 2e-05
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1.0
-- `num_train_epochs`:
+- `num_train_epochs`: 13
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
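The only hyperparameters changed by this commit are `learning_rate` (2e-05) and `num_train_epochs` (13). A hedged sketch of the corresponding `SentenceTransformerTrainingArguments`, using only values listed in the two hunks above (`output_dir` is a placeholder, everything else stays at its default):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="sbert-base-ja-arc-temp",  # placeholder, not taken from the diff
    eval_strategy="epoch",
    learning_rate=2e-05,
    num_train_epochs=13,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```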
@@ -519,15 +519,20 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | Training Loss | loss | custom-arc-semantics-data_max_ap |
 |:-----:|:----:|:-------------:|:------:|:--------------------------------:|
-| None | 0 | - | - |
-| 1.0 |
-| 2.0 |
-| 3.0 |
-| 4.0 |
-| 5.0 |
-| 6.0 |
-| 7.0 |
-| 8.0 |
+| None | 0 | - | - | 0.9463 |
+| 1.0 | 34 | 1.4241 | 1.3327 | 0.9563 |
+| 2.0 | 68 | 0.8143 | 1.1203 | 0.9564 |
+| 3.0 | 102 | 0.4052 | 1.0773 | 0.9507 |
+| 4.0 | 136 | 0.2227 | 1.0795 | 0.9459 |
+| 5.0 | 170 | 0.1109 | 1.1310 | 0.9377 |
+| 6.0 | 204 | 0.079 | 1.1382 | 0.9410 |
+| 7.0 | 238 | 0.0513 | 1.1439 | 0.9369 |
+| 8.0 | 272 | 0.0369 | 1.1683 | 0.9369 |
+| 9.0 | 306 | 0.0277 | 1.1558 | 0.9339 |
+| 10.0 | 340 | 0.0215 | 1.1511 | 0.9338 |
+| 11.0 | 374 | 0.0156 | 1.1560 | 0.9310 |
+| 12.0 | 408 | 0.0191 | 1.1661 | 0.9307 |
+| 13.0 | 442 | 0.0113 | 1.1681 | 0.9302 |
 
 
 ### Framework Versions
model.safetensors (CHANGED)

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:af049657c02958ab57b8c8cd2b82d3b0165733d92e6db76037000fa3437cfa7d
 size 442491744
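The new `model.safetensors` pointer records the weights blob by SHA-256 and size rather than storing the file in git. A minimal sketch of checking a downloaded copy against the pointer (the local path is a placeholder):

```python
import hashlib

# Placeholder path to the downloaded weights file.
path = "model.safetensors"

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

# Should print af049657c02958ab57b8c8cd2b82d3b0165733d92e6db76037000fa3437cfa7d
print(h.hexdigest())
```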