Update README.md
README.md
---
license: cc
language:
- ja
library_name: transformers
---

### Summary

This is a text classifier for assigning a [JLPT level](https://www.jlpt.jp/e/about/levelsummary.html). It was trained at the sentence level.
A pre-trained [cl-tohoku-bert-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) was fine-tuned on ~5000k labeled sentences collected from language-learning websites.

Performance on data from the same distribution is good:

```
              precision    recall  f1-score   support

          N5       0.88      0.88      0.88        25
          N4       0.90      0.89      0.90        53
          N3       0.78      0.90      0.84        62
          N2       0.71      0.79      0.75        47
          N1       0.95      0.77      0.85        73

    accuracy                           0.84       260
   macro avg       0.84      0.84      0.84       260
weighted avg       0.85      0.84      0.84       260
```

But on test data consisting of official JLPT material it is not as good:

```
              precision    recall  f1-score   support

          N5       0.62      0.66      0.64       145
          N4       0.34      0.36      0.35       143
          N3       0.33      0.67      0.45       197
          N2       0.26      0.20      0.23       192
          N1       0.59      0.08      0.15       202

    accuracy                           0.38       879
   macro avg       0.43      0.39      0.36       879
weighted avg       0.42      0.38      0.34       879
```
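The tables above are standard per-class classification reports. As an illustration of how each row is derived (a minimal stdlib-only sketch for clarity, not the author's evaluation code, which presumably used a standard tool such as scikit-learn's `classification_report`):

```python
def per_class_report(y_true, y_pred, labels):
    """Compute (precision, recall, f1, support) per label, as in the
    reports above. Stdlib-only sketch for illustration."""
    rows = {}
    for lab in labels:
        tp = sum(t == p == lab for t, p in zip(y_true, y_pred))
        n_pred = sum(p == lab for p in y_pred)  # sentences predicted as `lab`
        n_true = sum(t == lab for t in y_true)  # support: true members of `lab`
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[lab] = (prec, rec, f1, n_true)
    return rows

# Toy labels (not the card's data):
truth = ["N5", "N5", "N4", "N3", "N4"]
preds = ["N5", "N4", "N4", "N3", "N4"]
print(per_class_report(truth, preds, ["N5", "N4", "N3"]))
```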

Still, it can give a ballpark estimate of sentence difficulty, although not a very precise one.
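The card does not show an inference snippet; since the metadata declares `library_name: transformers`, something like the following should work (a hedged sketch: the repository id below is a placeholder for this model's actual Hugging Face id, and the label names N1–N5 are assumed from the tables above):

```python
from transformers import pipeline

# Placeholder repository id: substitute this model's actual Hugging Face id.
classifier = pipeline("text-classification", model="<this-repo-id>")

# Classify a single Japanese sentence; the pipeline returns the top label
# (assumed here to be one of N1..N5) with its softmax score.
print(classifier("日本語を勉強しています。"))
```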