Update README.md
README.md
---
license: cc
language:
- ja
library_name: transformers
---

### Summary

This is a text classifier for assigning a [JLPT level](https://www.jlpt.jp/e/about/levelsummary.html). It was trained at the sentence level.
A pre-trained [cl-tohoku-bert-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) was fine-tuned on ~5000k labeled sentences collected from language-learning websites.

Performance on data from the same distribution is good:

```
              precision    recall  f1-score   support

          N5       0.88      0.88      0.88        25
          N4       0.90      0.89      0.90        53
          N3       0.78      0.90      0.84        62
          N2       0.71      0.79      0.75        47
          N1       0.95      0.77      0.85        73

    accuracy                           0.84       260
   macro avg       0.84      0.84      0.84       260
weighted avg       0.85      0.84      0.84       260
```

But on test data consisting of official JLPT material it is not as good:

```
              precision    recall  f1-score   support

          N5       0.62      0.66      0.64       145
          N4       0.34      0.36      0.35       143
          N3       0.33      0.67      0.45       197
          N2       0.26      0.20      0.23       192
          N1       0.59      0.08      0.15       202

    accuracy                           0.38       879
   macro avg       0.43      0.39      0.36       879
weighted avg       0.42      0.38      0.34       879
```
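The tables above are standard per-class classification reports. As an illustration of how each row is derived (a minimal stdlib-only sketch for clarity, not the author's evaluation code, which presumably used a standard tool such as scikit-learn's `classification_report`):

```python
def per_class_report(y_true, y_pred, labels):
    """Compute (precision, recall, f1, support) per label, as in the
    reports above. Stdlib-only sketch for illustration."""
    rows = {}
    for lab in labels:
        tp = sum(t == p == lab for t, p in zip(y_true, y_pred))
        n_pred = sum(p == lab for p in y_pred)  # sentences predicted as `lab`
        n_true = sum(t == lab for t in y_true)  # support: true members of `lab`
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[lab] = (prec, rec, f1, n_true)
    return rows

# Toy labels (not the card's data):
truth = ["N5", "N5", "N4", "N3", "N4"]
preds = ["N5", "N4", "N4", "N3", "N4"]
print(per_class_report(truth, preds, ["N5", "N4", "N3"]))
```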

Still, it can give a ballpark estimate of sentence difficulty, although not a very precise one.
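The card does not show an inference snippet; since the metadata declares `library_name: transformers`, something like the following should work (a hedged sketch: the repository id below is a placeholder for this model's actual Hugging Face id, and the label names N1–N5 are assumed from the tables above):

```python
from transformers import pipeline

# Placeholder repository id: substitute this model's actual Hugging Face id.
classifier = pipeline("text-classification", model="<this-repo-id>")

# Classify a single Japanese sentence; the pipeline returns the top label
# (assumed here to be one of N1..N5) with its softmax score.
print(classifier("日本語を勉強しています。"))
```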