Commit 5efa696 by bennexx (verified) · 1 parent: c4992d3

Update README.md

Files changed (1): README.md (+38 -1)

---
license: cc
language:
- ja
library_name: transformers
---

### Summary

This is a text classifier that assigns a [JLPT level](https://www.jlpt.jp/e/about/levelsummary.html) (N5 to N1) to Japanese text. It was trained at the sentence level.
A pre-trained [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) model is fine-tuned on ~5000 labeled sentences obtained from language learning websites.
12
+ Performance on same distribution data is good.
13
+
14
+ ```
15
+ precision recall f1-score support
16
+ N5 0.62 0.66 0.64 145
17
+ N4 0.34 0.36 0.35 143
18
+ N3 0.33 0.67 0.45 197
19
+ N2 0.26 0.20 0.23 192
20
+ N1 0.59 0.08 0.15 202
21
+ accuracy 0.38 879
22
+ macro avg 0.43 0.39 0.36 879
23
+ weighted avg 0.42 0.38 0.34 879
24
+ ```
25
+
26
+ But on test data consisting of official JLPT material it is not so good.
27
+
28
+ ```
29
+ precision recall f1-score support
30
+ N5 0.88 0.88 0.88 25
31
+ N4 0.90 0.89 0.90 53
32
+ N3 0.78 0.90 0.84 62
33
+ N2 0.71 0.79 0.75 47
34
+ N1 0.95 0.77 0.85 73
35
+ accuracy 0.84 260
36
+ macro avg 0.84 0.84 0.84 260
37
+ weighted avg 0.85 0.84 0.84 260
38
+ ```

Still, it can give a rough estimate of sentence difficulty, although not a very precise one.
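
The two reports above appear to follow the layout of scikit-learn's `classification_report` (an assumption based on their format only). Under that assumption, the `macro avg` row is the unweighted mean of the per-class rows, and the `weighted avg` row weights each class by its `support`. The sketch below re-derives those rows for the report with 879 sentences of support; `REPORT`, `macro_avg` and `weighted_avg` are illustrative names, not part of the model. Because the per-class inputs are already rounded to two decimals, recomputed values can differ from the printed ones in the last digit.

```python
from typing import Dict, Tuple

# Illustrative names. Per-class rows copied from the 879-sentence report above.
# Each tuple is (precision, recall, f1-score, support).
Row = Tuple[float, float, float, int]
REPORT: Dict[str, Row] = {
    "N5": (0.62, 0.66, 0.64, 145),
    "N4": (0.34, 0.36, 0.35, 143),
    "N3": (0.33, 0.67, 0.45, 197),
    "N2": (0.26, 0.20, 0.23, 192),
    "N1": (0.59, 0.08, 0.15, 202),
}
COLUMNS = {"precision": 0, "recall": 1, "f1-score": 2}


def macro_avg(report: Dict[str, Row], col: int) -> float:
    """Unweighted mean of one metric column over all classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(report: Dict[str, Row], col: int) -> float:
    """Mean of one metric column, weighting each class by its support."""
    total = sum(row[3] for row in report.values())
    return sum(row[col] * row[3] for row in report.values()) / total


if __name__ == "__main__":
    for name, col in COLUMNS.items():
        print(f"{name:>9}  macro={macro_avg(REPORT, col):.3f}  "
              f"weighted={weighted_avg(REPORT, col):.3f}")
```

For single-label classification, support-weighted recall always equals overall accuracy, since each class contributes recall × support, which is its number of correct predictions. That is why the `weighted avg` recall matches the `accuracy` value in both reports.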