---
license: cc
language:
- ja
library_name: transformers
---

### Summary

This is a text classifier for assigning a [JLPT level](https://www.jlpt.jp/e/about/levelsummary.html) to Japanese text. It was trained at the sentence level: a pre-trained [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) was fine-tuned on ~5000k labeled sentences obtained from language-learning websites.

Performance on held-out data from the same distribution is modest:

```
              precision    recall  f1-score   support

          N5       0.62      0.66      0.64       145
          N4       0.34      0.36      0.35       143
          N3       0.33      0.67      0.45       197
          N2       0.26      0.20      0.23       192
          N1       0.59      0.08      0.15       202

    accuracy                           0.38       879
   macro avg       0.43      0.39      0.36       879
weighted avg       0.42      0.38      0.34       879
```

On test data consisting of official JLPT material, however, it performs considerably better:

```
              precision    recall  f1-score   support

          N5       0.88      0.88      0.88        25
          N4       0.90      0.89      0.90        53
          N3       0.78      0.90      0.84        62
          N2       0.71      0.79      0.75        47
          N1       0.95      0.77      0.85        73

    accuracy                           0.84       260
   macro avg       0.84      0.84      0.84       260
weighted avg       0.85      0.84      0.84       260
```

Still, it can give a ballpark estimate of sentence difficulty, although not a very precise one.
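As a minimal sketch of how the classifier's output could be turned into a JLPT level: the model produces one logit per class, which can be converted to probabilities with a softmax and mapped through the index-to-label table. The mapping below is an assumption for illustration; the actual one should be read from `model.config.id2label`.

```python
import math

# Hypothetical index-to-level mapping; check model.config.id2label for the real one.
ID2LABEL = {0: "N5", 1: "N4", 2: "N3", 3: "N2", 4: "N1"}

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_level(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return id2label[idx], probs[idx]

# Example with made-up logits; index 2 has the largest value.
label, prob = predict_level([0.1, 0.3, 2.4, 0.7, -1.2])
print(label)  # -> N3
```

In practice the logits would come from running the tokenizer and model on a sentence (e.g. via `AutoModelForSequenceClassification`); the snippet above only illustrates the post-processing step.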