---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model: aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- arabic
- restaurant-reviews
model-index:
- name: ArabReview-Sentiment
  results:
  - task:
      type: text-classification
    dataset:
      name: hadyelsahar/ar_res_reviews
      type: hadyelsahar/ar_res_reviews
    metrics:
    - name: Accuracy
      type: accuracy
      value: 86.41
    - name: Precision
      type: precision
      value: 87.01
    - name: Recall
      type: recall
      value: 86.49
    - name: F1 Score
      type: f1
      value: 86.75
library_name: transformers
---

# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

## 📌 Overview
This project fine-tunes **AraBERT** to classify the sentiment of **Arabic restaurant reviews**. Training uses **Hugging Face's `transformers` library**, and the model is deployed as an **interactive pipeline**.

## 📥 Dataset
The dataset used for fine-tuning is from:
[📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)

It contains restaurant reviews labeled as **Positive** or **Negative**.

## 🔄 Preprocessing
- **Cleaning & Normalization**:
  - Removed **non-Arabic** text, special characters, and extra spaces.
  - **Normalized Arabic characters** (e.g., `إ, أ, آ → ا`, `ة → ه`); a sketch of this step appears after the training parameters below.
- **Tokenization**:
  - Used the **AraBERT tokenizer** matching the base model.
- **Data Balancing**:
  - 2,418 **Positive** | 2,418 **Negative** (balanced dataset); see the loading sketch below.
- **Train-Test Split**:
  - **80% training** | **20% testing**.

## 🏋️ Fine-Tuning Details
We performed full fine-tuning of **`aubmindlab/bert-base-arabertv02`** (all weights updated; no parameter-efficient methods such as LoRA).

### 📊 Model Performance

| Metric        | Score    |
|---------------|----------|
| **Train Loss**| `0.470`  |
| **Eval Loss** | `0.373`  |
| **Accuracy**  | `86.41%` |
| **Precision** | `87.01%` |
| **Recall**    | `86.49%` |
| **F1-score**  | `86.75%` |

---

## ⚙️ Training Parameters

```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "aubmindlab/bert-base-arabertv02"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,            # binary sentiment: Positive / Negative
    classifier_dropout=0.5,  # extra dropout on the classification head
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    report_to="none",
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
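
## 🧹 Preprocessing Sketch
For reference, here is a minimal sketch of the cleaning and normalization step described under **🔄 Preprocessing**. The exact regular expressions used during training are not part of this card, so treat these as illustrative assumptions.

```python
import re

def normalize_arabic(text: str) -> str:
    """Illustrative normalization matching the rules listed above."""
    text = re.sub(r"[إأآ]", "ا", text)               # unify alef variants -> ا
    text = text.replace("ة", "ه")                    # ة -> ه
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # drop non-Arabic text / special chars
    return re.sub(r"\s+", " ", text).strip()         # collapse extra spaces

print(normalize_arabic("الأكل  رائع!!! 10/10"))  # -> "الاكل رائع"
```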
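
## ⚖️ Data Loading & Balancing Sketch
The balancing and 80/20 split described above could be reproduced roughly as follows. The column names (`text`, `polarity`) and label encoding are assumptions about the dataset schema; check the dataset card before use.

```python
from datasets import load_dataset, concatenate_datasets

ds = load_dataset("hadyelsahar/ar_res_reviews", split="train")

# Assumed schema: "polarity" holds a binary sentiment label (1 = positive, 0 = negative).
pos = ds.filter(lambda r: r["polarity"] == 1).shuffle(seed=42).select(range(2418))
neg = ds.filter(lambda r: r["polarity"] == 0).shuffle(seed=42).select(range(2418))

# 2,418 positive + 2,418 negative, then an 80/20 train-test split.
balanced = concatenate_datasets([pos, neg]).shuffle(seed=42)
splits = balanced.train_test_split(test_size=0.2, seed=42)
train_raw, test_raw = splits["train"], splits["test"]
```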
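
## 📐 Evaluation & Trainer Sketch
The card reports accuracy, precision, recall, and F1 but does not include the evaluation code. A minimal sketch, assuming `train_dataset` / `eval_dataset` are the tokenized 80/20 splits and that the scores use scikit-learn's binary averaging:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import Trainer

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"  # assumption: positive class = 1
    )
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

trainer = Trainer(
    model=model,                 # model and training_args from the block above
    args=training_args,
    train_dataset=train_dataset, # assumed name for the tokenized 80% split
    eval_dataset=eval_dataset,   # assumed name for the tokenized 20% split
    compute_metrics=compute_metrics,
)
trainer.train()
```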
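
## 🔮 Inference Example
Since the model is deployed as a `transformers` pipeline, inference is a standard `pipeline` call. The repo id below is a placeholder; substitute the path where the fine-tuned weights are actually hosted.

```python
from transformers import pipeline

# Hypothetical repo id; replace with the actual model path.
classifier = pipeline("text-classification", model="<your-username>/ArabReview-Sentiment")

print(classifier("الطعام لذيذ والخدمة ممتازة"))  # "The food is delicious and the service is excellent"
print(classifier("الخدمة سيئة جدا"))             # "The service is very bad"
# Each call returns [{"label": ..., "score": ...}]
```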