Update README.md
README.md
CHANGED
@@ -11,20 +11,6 @@ This is a text classifier for assigning a [JLPT level](https://www.jlpt.jp/e/abo
|
|
11 |
A pre-trained [cl-tohoku-bert-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) model is fine-tuned on ~5000k labeled sentences obtained from language-learning websites.
|
12 |
Performance on data from the same distribution as the training set is good.
|
13 |
|
14 |
-
```
|
15 |
-
precision recall f1-score support
|
16 |
-
N5 0.62 0.66 0.64 145
|
17 |
-
N4 0.34 0.36 0.35 143
|
18 |
-
N3 0.33 0.67 0.45 197
|
19 |
-
N2 0.26 0.20 0.23 192
|
20 |
-
N1 0.59 0.08 0.15 202
|
21 |
-
accuracy 0.38 879
|
22 |
-
macro avg 0.43 0.39 0.36 879
|
23 |
-
weighted avg 0.42 0.38 0.34 879
|
24 |
-
```
|
25 |
-
|
26 |
-
But on test data consisting of official JLPT material it is not so good.
|
27 |
-
|
28 |
```
|
29 |
precision recall f1-score support
|
30 |
N5 0.88 0.88 0.88 25
|
@@ -39,4 +25,19 @@ weighted avg 0.85 0.84 0.84 260
|
|
39 |
|
40 |
|
41 |
|
42 |
Still, it can give a ballpark estimate of sentence difficulty, although not a very precise one.
|
|
|
11 |
A pre-trained [cl-tohoku-bert-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) model is fine-tuned on ~5000k labeled sentences obtained from language-learning websites.
|
12 |
Performance on data from the same distribution as the training set is good.
|
13 |
|
14 |
```
|
15 |
precision recall f1-score support
|
16 |
N5 0.88 0.88 0.88 25
|
|
|
25 |
|
26 |
|
27 |
|
28 |
+
On test data consisting of official JLPT material, however, performance is considerably worse.
|
29 |
+
```
|
30 |
+
precision recall f1-score support
|
31 |
+
N5 0.62 0.66 0.64 145
|
32 |
+
N4 0.34 0.36 0.35 143
|
33 |
+
N3 0.33 0.67 0.45 197
|
34 |
+
N2 0.26 0.20 0.23 192
|
35 |
+
N1 0.59 0.08 0.15 202
|
36 |
+
accuracy 0.38 879
|
37 |
+
macro avg 0.43 0.39 0.36 879
|
38 |
+
weighted avg 0.42 0.38 0.34 879
|
39 |
+
```
|
40 |
+
|
41 |
+
|
42 |
+
|
43 |
Still, it can give a ballpark estimate of sentence difficulty, although not a very precise one.
|
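For readers unfamiliar with the `macro avg` and `weighted avg` rows in the reports above, the following is a minimal sketch of how they relate to the per-class rows. The per-class numbers are copied from the official-JLPT-material report; the names `ROWS`, `macro_avg`, and `weighted_avg` are illustrative and are not part of this repository.

```python
# Per-class rows copied from the official-JLPT-material report:
# (label, precision, recall, f1-score, support)
ROWS = [
    ("N5", 0.62, 0.66, 0.64, 145),
    ("N4", 0.34, 0.36, 0.35, 143),
    ("N3", 0.33, 0.67, 0.45, 197),
    ("N2", 0.26, 0.20, 0.23, 192),
    ("N1", 0.59, 0.08, 0.15, 202),
]


def macro_avg(rows, col):
    """Unweighted mean of one metric column across all classes."""
    return sum(r[col] for r in rows) / len(rows)


def weighted_avg(rows, col):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(r[4] for r in rows)
    return sum(r[col] * r[4] for r in rows) / total


if __name__ == "__main__":
    for name, col in (("precision", 1), ("recall", 2), ("f1-score", 3)):
        print(f"{name:>9}  macro={macro_avg(ROWS, col):.2f}  "
              f"weighted={weighted_avg(ROWS, col):.2f}")
```

Because the per-class inputs are already rounded to two decimals, a recomputed average can differ from the reported one in the last digit. The support-weighted recall is mathematically the same quantity as overall accuracy, which is why those two values agree in the report.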