EGD DistilBERT (Multilingual Cased)

Model Overview

This model is based on distilbert-base-multilingual-cased and has been fine-tuned on English, Hungarian, and German data to classify European Parliament speeches into rhetorical categories.

The model classifies text into three categories:

  • 0 - Other (text that does not fit into moralist or realist categories)
  • 1 - Moralist (arguments emphasizing moral reasoning)
  • 2 - Realist (arguments applying pragmatic or realist reasoning)

This model is useful for analyzing political discourse and rhetorical styles in multiple languages.
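The three class IDs above can be mapped to human-readable names with a small lookup. A minimal sketch (the `ID2LABEL` dict and `label_name` helper are illustrative, not part of the model's published config):

```python
# Hypothetical mapping mirroring the three categories listed above
ID2LABEL = {0: "Other", 1: "Moralist", 2: "Realist"}

def label_name(class_id: int) -> str:
    """Return the category name for a predicted class ID."""
    return ID2LABEL.get(class_id, "Unknown")
```

This keeps downstream analysis code readable when working with the raw integer predictions.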


Evaluation Results

The model was evaluated on a test set of 938 sentences, with the following results:

Label         Precision  Recall  F1-score  Support
0 - Other          0.91    0.92      0.92      783
1 - Moralist       0.49    0.40      0.44       65
2 - Realist        0.43    0.44      0.44       90
  • Overall accuracy: 0.84
  • Macro average F1-score: 0.60
  • Weighted average F1-score: 0.84

The model reliably distinguishes the general (other) class from moralist and realist arguments, though performance on the minority classes (1 and 2) is lower.
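The macro and weighted averages follow directly from the per-class F1 scores and supports in the table. A quick sketch verifying the arithmetic (values copied from the table above):

```python
# Per-class F1 scores and test-set supports from the evaluation table
f1 = {"Other": 0.92, "Moralist": 0.44, "Realist": 0.44}
support = {"Other": 783, "Moralist": 65, "Realist": 90}

# Macro average: unweighted mean over classes
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: mean weighted by class support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(f"Macro F1: {macro_f1:.2f}")       # 0.60
print(f"Weighted F1: {weighted_f1:.2f}")  # 0.84
```

The gap between the two averages reflects the class imbalance: the dominant "Other" class (783 of 938 sentences) pulls the weighted score up.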


Usage

This model can be used with the Hugging Face Transformers library:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify an example text
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Get predicted class (0 = Other, 1 = Moralist, 2 = Realist)
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
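If a confidence score is needed rather than just the argmax, the logits can be converted to probabilities with a softmax. A self-contained sketch using pure Python (the example logit values are made up for illustration; in practice they come from `outputs.logits`):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the three classes: Other, Moralist, Realist
probs = softmax([2.1, -0.3, 0.4])
print([round(p, 3) for p in probs])
```

In a torch pipeline the equivalent one-liner is `torch.softmax(logits, dim=-1)`.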