---
license: cc
language:
- ja
library_name: transformers
---
### Summary
This is a sentence-level text classifier that assigns a [JLPT level](https://www.jlpt.jp/e/about/levelsummary.html) (N5, easiest, through N1, hardest) to Japanese sentences.
A pre-trained [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) model is fine-tuned on ~5,000 labeled sentences collected from language-learning websites.
On held-out data from the same distribution, performance is good:
```
              precision    recall  f1-score   support

          N5       0.88      0.88      0.88        25
          N4       0.90      0.89      0.90        53
          N3       0.78      0.90      0.84        62
          N2       0.71      0.79      0.75        47
          N1       0.95      0.77      0.85        73

    accuracy                           0.84       260
   macro avg       0.84      0.84      0.84       260
weighted avg       0.85      0.84      0.84       260
```
On test data consisting of official JLPT material, however, performance drops considerably:
```
              precision    recall  f1-score   support

          N5       0.62      0.66      0.64       145
          N4       0.34      0.36      0.35       143
          N3       0.33      0.67      0.45       197
          N2       0.26      0.20      0.23       192
          N1       0.59      0.08      0.15       202

    accuracy                           0.38       879
   macro avg       0.43      0.39      0.36       879
weighted avg       0.42      0.38      0.34       879
```
Still, the model can give a rough, ballpark estimate of sentence difficulty.
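For inference, a minimal sketch using the standard transformers text-classification pipeline may look like the following. The `"<repo-id>"` placeholder and the assumption that the labels are the strings `N1`–`N5` should be verified against this repository's `config.json`; the helper `level_from_scores` is illustrative, not part of the model.

```python
def level_from_scores(scores):
    """Return the label of the highest-scoring JLPT level.

    `scores` is a list of {"label": ..., "score": ...} dicts, as produced
    by a transformers text-classification pipeline with top_k=None.
    """
    return max(scores, key=lambda s: s["score"])["label"]


if __name__ == "__main__":
    from transformers import pipeline

    # Replace "<repo-id>" with this model's Hub identifier.
    clf = pipeline("text-classification", model="<repo-id>", top_k=None)
    sentence = "私は毎日日本語を勉強しています。"  # "I study Japanese every day."
    scores = clf(sentence)  # one score per JLPT label
    print(level_from_scores(scores))
```

Since scores for all five levels are returned, the full distribution can also be inspected when a single hard label is too coarse for the intended use.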
### Cite
```
@inproceedings{benedetti-etal-2024-automatically,
title = "Automatically Suggesting Diverse Example Sentences for {L}2 {J}apanese Learners Using Pre-Trained Language Models",
author = "Benedetti, Enrico and
Aizawa, Akiko and
Boudin, Florian",
editor = "Fu, Xiyan and
Fleisig, Eve",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-srw.11",
pages = "114--131"
}
```