# EGD DistilBERT (Multilingual Cased)
## Model Overview
This model is based on DistilBERT-base-multilingual-cased and has been fine-tuned on English, Hungarian, and German data to classify European Parliamentary speeches into rhetorical categories.
The model classifies text into three categories:
- 0 - Other (text that does not fit into moralist or realist categories)
- 1 - Moralist (arguments emphasizing moral reasoning)
- 2 - Realist (arguments applying pragmatic or realist reasoning)
This model is useful for analyzing political discourse and rhetorical styles in multiple languages.
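When working with the raw class ids, it can help to keep an explicit id-to-label mapping. The sketch below simply encodes the label scheme listed above as a Python dictionary (the dictionary name is illustrative, not part of the model's API):

```python
# Mapping from the model's output ids to the rhetorical categories
# described above (0 = Other, 1 = Moralist, 2 = Realist).
ID2LABEL = {0: "Other", 1: "Moralist", 2: "Realist"}

print(ID2LABEL[1])  # Moralist
```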
## Evaluation Results
The model was evaluated on a test set of 938 sentences, with the following results:
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 - Other | 0.91 | 0.92 | 0.92 | 783 |
| 1 - Moralist | 0.49 | 0.40 | 0.44 | 65 |
| 2 - Realist | 0.43 | 0.44 | 0.44 | 90 |
- Overall accuracy: 0.84
- Macro average F1-score: 0.60
- Weighted average F1-score: 0.84
The model reliably separates the majority class (0 - Other) from moralist and realist arguments, though performance on the minority classes (1 and 2) is markedly lower, reflecting their small support in the test set.
## Usage
This model can be used with the Hugging Face Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify an example text
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits

# Get predicted class
predicted_class = logits.argmax().item()
print(f"Predicted class: {predicted_class}")
```
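If you want class probabilities rather than just the argmax id, you can apply a softmax to the logits. The sketch below uses only the standard library and illustrative logit values (not real model output) to show the conversion:

```python
import math

# Example logits for the three classes; in practice these come from
# outputs.logits in the snippet above (converted with .tolist()).
logits = [2.1, -0.3, 0.4]

# Softmax: exponentiate each logit and normalize by the sum.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

labels = ["Other", "Moralist", "Realist"]
predicted = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted: {labels[predicted]} (p = {probs[predicted]:.2f})")
```

This makes it easy to inspect how confident the model is, which matters here given the weaker performance on the minority classes.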