# Create a Markdown file with the enhanced model card content
model_card_content = """\
---
license: apache-2.0
datasets:
  - hadyelsahar/ar_res_reviews
language:
  - ar
metrics:
  - accuracy
  - precision
  - recall
  - f1
base_model:
  - aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
  - arabic
  - sentiment-analysis
  - transformers
  - huggingface
  - bert
  - restaurants
  - fine-tuning
  - nlp
---

# Arabic Restaurant Review Sentiment Analysis

## Overview
This fine-tuned AraBERT model classifies Arabic restaurant reviews as Positive or Negative.
It is based on `aubmindlab/bert-base-arabertv02` and was fine-tuned with Hugging Face Transformers.
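
A minimal inference sketch with the Transformers `pipeline` API is shown below. The repository id is a placeholder (this card does not state where the fine-tuned weights are hosted), so replace it with the actual Hub id; the sample review is only illustrative.

```python
from transformers import pipeline

# Placeholder repo id: substitute the actual Hub id of this fine-tuned model.
classifier = pipeline(
    "text-classification",
    model="<your-username>/arabert-restaurant-sentiment",
)

# "The food was very tasty and the service was excellent."
print(classifier("الأكل كان لذيذ جدا والخدمة ممتازة"))
# -> [{'label': ..., 'score': ...}]
```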

## Why This Model?

- Trained on real restaurant reviews from the Hugging Face dataset `hadyelsahar/ar_res_reviews`.
- Fine-tuned with full training (not LoRA or adapters).
- Balanced dataset (2,418 positive vs. 2,418 negative reviews).
- High accuracy and strong performance for sentiment analysis in Arabic.

## Dataset & Preprocessing

- **Dataset source:** `hadyelsahar/ar_res_reviews`
- **Text cleaning** (sketched in the example after this list):
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (إ, أ, آ → ا and ة → ه).
  - Balanced the positive and negative sentiment distribution.
- **Tokenization:** used the AraBERT tokenizer (`aubmindlab/bert-base-arabertv02`).
- **Train-test split:** 80% training | 20% testing.
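
The preprocessing above can be sketched roughly as follows. This is not the exact script used for this model: the helper names, the regular expression, and the dataset column names ("text", "polarity") are assumptions, and the class-balancing step is omitted for brevity.

```python
import re

from datasets import load_dataset
from transformers import AutoTokenizer


def normalize_arabic(text: str) -> str:
    # Unify alef variants and map teh marbuta to heh, as described above.
    for alef in ("إ", "أ", "آ"):
        text = text.replace(alef, "ا")
    return text.replace("ة", "ه")


def clean_text(text: str) -> str:
    # Keep Arabic letters and spaces, then collapse repeated whitespace.
    text = re.sub("[^ء-ي ]", " ", text)
    text = re.sub(" +", " ", text).strip()
    return normalize_arabic(text)


dataset = load_dataset("hadyelsahar/ar_res_reviews", split="train")
dataset = dataset.map(lambda ex: {"text": clean_text(ex["text"])})

# Column name "polarity" is an assumption about this dataset; the Trainer expects "label".
dataset = dataset.rename_column("polarity", "label")

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

# 80% / 20% train-test split, matching the split described above.
splits = dataset.train_test_split(test_size=0.2, seed=42)
```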

## Training & Performance
The model was fine-tuned with Hugging Face Transformers; the final evaluation results and the training configuration are shown below.

### Final Model Results

| Metric     | Score  |
|------------|--------|
| Train Loss | 0.470  |
| Eval Loss  | 0.373  |
| Accuracy   | 86.41% |
| Precision  | 87.01% |
| Recall     | 86.49% |
| F1-score   | 86.75% |
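
The accuracy, precision, recall, and F1 figures above were presumably produced by a `compute_metrics` callback passed to the Trainer. A scikit-learn-based sketch is shown below; the binary averaging setting is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels) for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```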

### Training Configuration

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
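
For completeness, here is a hedged sketch of how these arguments would be wired into a `Trainer`, reusing the placeholder objects (`splits`, `tokenizer`, `compute_metrics`) from the sketches above; it is illustrative rather than the exact training script.

```python
from transformers import AutoModelForSequenceClassification, Trainer

model = AutoModelForSequenceClassification.from_pretrained(
    "aubmindlab/bert-base-arabertv02", num_labels=2
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],    # 80% split from the preprocessing sketch
    eval_dataset=splits["test"],      # 20% held-out split
    tokenizer=tokenizer,              # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,  # metric function sketched earlier
)

trainer.train()
```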