---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
- arabic
- sentiment-analysis
- transformers
- huggingface
- bert
- restaurants
- fine-tuning
- nlp
---
# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

## 📌 Overview
This fine-tuned AraBERT model classifies Arabic restaurant reviews as Positive or Negative.
It is based on `aubmindlab/bert-base-arabertv02` and fine-tuned using Hugging Face Transformers.
## 🔥 Why This Model?

- ✅ Trained on real restaurant reviews from the Hugging Face dataset.
- ✅ Fine-tuned with full training (not LoRA or adapters).
- ✅ Balanced dataset (2,418 positive vs. 2,418 negative reviews).
- ✅ High accuracy and performance for sentiment analysis in Arabic.
## 📥 Dataset & Preprocessing

- **Dataset Source:** [hadyelsahar/ar_res_reviews](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
- **Text Cleaning** (a rough preprocessing sketch follows this list):
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (إ, أ, آ → ا and ة → ه).
  - Balanced the positive and negative sentiment distribution.
- **Tokenization:** used the AraBERT tokenizer (`aubmindlab/bert-base-arabertv02`).
- **Train-Test Split:** 80% training | 20% testing.
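The card does not include the preprocessing script itself, so the snippet below is only a minimal sketch of the cleaning and normalization steps listed above; the regular expressions and the `clean_arabic_review` helper name are assumptions, not the author's original code.

```python
import re

def clean_arabic_review(text: str) -> str:
    """Rough sketch of the assumed cleaning/normalization for Arabic review text."""
    # Keep Arabic letters and whitespace; drop Latin text, digits, and special characters.
    text = re.sub(r"[^\u0621-\u064A\s]", " ", text)
    # Normalize alef variants (إ, أ, آ) to bare alef (ا).
    text = re.sub(r"[إأآ]", "ا", text)
    # Normalize ta marbuta (ة) to ha (ه).
    text = text.replace("ة", "ه")
    # Collapse extra whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(clean_arabic_review("الطعام كان رائعًا!!! Amazing 👍"))
```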
## 🏋️ Training & Performance

The model was fine-tuned with Hugging Face Transformers; the final evaluation results and the training configuration are shown below.
### 📊 Final Model Results

| Metric     | Score  |
|------------|--------|
| Train Loss | 0.470  |
| Eval Loss  | 0.373  |
| Accuracy   | 86.41% |
| Precision  | 87.01% |
| Recall     | 86.49% |
| F1-score   | 86.75% |
### ⚙️ Training Configuration

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
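For context, here is a hedged sketch of how the `TrainingArguments` above could be wired into a `Trainer` that reports the metrics in the table. The dataset column names (`text`, `polarity`), the 80/20 split call, and the `compute_metrics` helper are assumptions, since the original training script is not published with this card, and the cleaning/balancing step from the preprocessing section is omitted for brevity.

```python
# Hedged sketch only: column names ("text", "polarity") and the metric helper are
# assumptions; the cleaning/balancing described above is omitted here for brevity.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

base_model = "aubmindlab/bert-base-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Load the reviews and rename the sentiment column to the name Trainer expects.
dataset = load_dataset("hadyelsahar/ar_res_reviews")["train"]
dataset = dataset.rename_column("polarity", "label")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
)
splits = dataset.train_test_split(test_size=0.2, seed=42)  # 80% train / 20% test

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

trainer = Trainer(
    model=model,
    args=training_args,           # the TrainingArguments defined above
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,          # enables dynamic padding with the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
```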
## 💡 Usage

### 1️⃣ Quick Inference Using `pipeline()`

```python
from transformers import pipeline

model_name = "Abduuu/ArabReview-Sentiment"
sentiment_pipeline = pipeline("text-classification", model=model_name)

# "The food was amazing and the service was excellent!"
review = "الطعام كان رائعًا والخدمة ممتازة!"
result = sentiment_pipeline(review)
print(result)
```

✅ Example output:

```python
[{"label": "Positive", "score": 0.96}]
```
### 2️⃣ Use the Model with `AutoModelForSequenceClassification`

For batch processing and lower latency, use the `AutoModel` API directly:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Abduuu/ArabReview-Sentiment"

# Load model & tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example review: "The service was slow and the food was not good."
review = "الخدمة كانت بطيئة والطعام غير جيد."
inputs = tokenizer(review, return_tensors="pt")

# Perform inference
with torch.no_grad():
    logits = model(**inputs).logits

prediction = torch.argmax(logits).item()
label_map = {0: "Negative", 1: "Positive"}
print(f"Predicted Sentiment: {label_map[prediction]}")
```
## 🔬 Model Performance (Real Examples)

| Review | Prediction |
|--------|------------|
| "الطعام كان رائعًا والخدمة ممتازة!" ("The food was amazing and the service was excellent!") | ✅ Positive |
| "التجربة كانت سيئة والطعام كان باردًا" ("The experience was bad and the food was cold") | ❌ Negative |