---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model: aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- arabic
- restaurant-reviews
library_name: transformers
model-index:
- name: ArabReview-Sentiment
  results:
  - task:
      type: text-classification
    dataset:
      name: ar_res_reviews
      type: hadyelsahar/ar_res_reviews
    metrics:
    - name: Accuracy
      type: accuracy
      value: 86.41
    - name: Precision
      type: precision
      value: 87.01
    - name: Recall
      type: recall
      value: 86.49
    - name: F1 Score
      type: f1
      value: 86.75
---
# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

## 📌 Overview
This project fine-tunes AraBERT to analyze sentiment in Arabic restaurant reviews. We leveraged Hugging Face's `transformers` library for training and deployed the model as an interactive pipeline.
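For quick inference, the fine-tuned checkpoint can be loaded through the `pipeline` API. A minimal sketch, assuming the model is published under a repo id like `your-username/ArabReview-Sentiment` (placeholder) with Positive/Negative labels:

```python
from transformers import pipeline

# Hypothetical repo id; replace with the actual fine-tuned checkpoint.
classifier = pipeline("text-classification", model="your-username/ArabReview-Sentiment")

result = classifier("الأكل لذيذ والخدمة ممتازة")  # "The food is delicious and the service is excellent"
print(result)  # e.g. [{'label': 'Positive', 'score': 0.97}]
```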
## 📥 Dataset

The dataset used for fine-tuning is the [Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews) (`hadyelsahar/ar_res_reviews`). It contains restaurant reviews labeled as Positive or Negative.
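To reproduce the setup, the dataset can be pulled with the `datasets` library. A minimal sketch; the exact column names in the printed schema are not asserted here:

```python
from datasets import load_dataset

ds = load_dataset("hadyelsahar/ar_res_reviews")
print(ds)             # inspect available splits and columns
print(ds["train"][0]) # one labeled review example
```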
## 🔄 Preprocessing

- **Cleaning & Normalization:**
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (e.g., إ, أ, آ → ا and ة → ه); see the sketch after this list.
- **Tokenization:**
  - Used the AraBERT tokenizer for efficient processing.
- **Data Balancing:**
  - 2,418 Positive | 2,418 Negative (balanced dataset).
- **Train-Test Split:**
  - 80% Training | 20% Testing.
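A minimal sketch of the cleaning and split steps described above. The regex rules, column name, and seed are illustrative assumptions, not the project's verbatim code:

```python
import re
from datasets import load_dataset

def normalize_arabic(text: str) -> str:
    # Illustrative normalization rules matching the description above.
    text = re.sub(r"[إأآ]", "ا", text)              # unify alef variants
    text = text.replace("ة", "ه")                    # taa marbuta -> haa
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # keep Arabic letters and spaces
    return re.sub(r"\s+", " ", text).strip()         # collapse extra whitespace

ds = load_dataset("hadyelsahar/ar_res_reviews")
ds = ds.map(lambda ex: {"text": normalize_arabic(ex["text"])})  # "text" column is an assumption
split = ds["train"].train_test_split(test_size=0.2, seed=42)    # 80/20 split; seed illustrative
```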
## 🏋️ Fine-Tuning Details

We fine-tuned `aubmindlab/bert-base-arabertv2` using full fine-tuning (not LoRA).
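Full fine-tuning means every encoder weight stays trainable, with no frozen layers or adapter modules. A quick sanity-check sketch, assuming `model` is the loaded classifier from the section below:

```python
# Under full fine-tuning, every parameter reports requires_grad=True.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,}")  # expect the two counts to match
```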
## 📊 Model Performance

| Metric     | Score  |
|------------|--------|
| Train Loss | 0.470  |
| Eval Loss  | 0.373  |
| Accuracy   | 86.41% |
| Precision  | 87.01% |
| Recall     | 86.49% |
| F1-score   | 86.75% |
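Metrics like these can be computed during evaluation with a `compute_metrics` callback. A minimal sketch using scikit-learn; the averaging mode (`binary`) is an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"  # assumption: positive class as target
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```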
## ⚙️ Training Parameters

```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "aubmindlab/bert-base-arabertv2"
# Binary sentiment head with heavier dropout on the classifier layer.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, classifier_dropout=0.5
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",    # evaluate at the end of every epoch
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,                 # strong L2 regularization
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,                      # mixed-precision training
    report_to="none",
    save_total_limit=2,             # keep only the two most recent checkpoints
    gradient_accumulation_steps=2,  # effective batch size of 16
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
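These arguments feed a standard `Trainer` loop. A minimal sketch, assuming tokenized `train_dataset`/`eval_dataset` splits (illustrative names) and the `compute_metrics` function shown earlier:

```python
from transformers import AutoTokenizer, Trainer

tokenizer = AutoTokenizer.from_pretrained(model_name)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # tokenized 80% split (assumed variable)
    eval_dataset=eval_dataset,    # tokenized 20% split (assumed variable)
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
```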