---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
---

# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

## 📌 Overview  
This project fine-tunes **AraBERT** (`aubmindlab/bert-base-arabertv02`), a transformer-based model, to classify sentiment in **Arabic restaurant reviews**.  
The model was trained with **Hugging Face’s model training pipeline** and deployed as an **interactive Gradio web app**.

## 📥 Data Collection  
The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:  
[📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)  
It contains **restaurant reviews in Arabic** labeled with sentiment polarity.

## 🔄 Data Preparation  
- **Cleaning & Normalization**:
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (e.g., `إ, أ, آ → ا`, `ة → ه`).
- **Class Balancing**:
  - Downsampled positive reviews to balance the dataset.
- **Tokenization**:
  - Tokenized the reviews with the **AraBERT tokenizer**.
- **Train-Test Split**:
  - **80% Training** | **20% Testing**.
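
The preparation steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual preprocessing code: the exact character ranges, the `normalize` and `balance_and_split` helper names, the `(text, label)` row format, and the seed are all assumptions made for the example. It also downsamples every class to the size of the smallest one, which for this dataset amounts to downsampling the positive reviews.

```python
import random
import re

# Assumption: "non-Arabic text" is anything outside Arabic letters and diacritics.
NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]")
# Alef variants: U+0622 (madda), U+0623 (hamza above), U+0625 (hamza below).
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize(text: str) -> str:
    """Drop non-Arabic characters, unify alef and teh marbuta, collapse spaces."""
    text = NON_ARABIC.sub(" ", text)
    text = ALEF_VARIANTS.sub("\u0627", text)   # any alef variant -> bare alef
    text = text.replace("\u0629", "\u0647")    # teh marbuta -> heh
    return re.sub(r"\s+", " ", text).strip()


def balance_and_split(rows, test_fraction=0.2, seed=42):
    """Normalize (text, label) rows, downsample to the smallest class, then split."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((normalize(text), label))

    smallest = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, smallest))

    rng.shuffle(balanced)
    n_test = round(len(balanced) * test_fraction)
    cut = len(balanced) - n_test
    return balanced[:cut], balanced[cut:]
```

With `test_fraction=0.2` this reproduces the 80/20 split described above. The cleaned text would then be passed to the AraBERT tokenizer before training.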

## 🏋️ Fine-Tuning & Results  
The model was fine-tuned with **Hugging Face Transformers** on the cleaned and balanced restaurant reviews.

### **📊 Evaluation Metrics**
| Metric           | Value      |
|------------------|------------|
| **Eval Loss**    | `0.5665`   |
| **Accuracy**     | `70.37%`   |
| **Precision**    | `70.36%`   |
| **Recall**       | `70.37%`   |
| **F1-score**     | `69.75%`   |
| **Eval Runtime** | `11.5 sec` |
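
The model card does not say how precision, recall, and F1 were averaged across classes. Recall being identical to accuracy is consistent with support-weighted averaging, since weighted recall always equals accuracy, so the sketch below assumes that scheme. The `weighted_metrics` name and the toy labels are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)   # number of true examples per class
    total = len(y_true)

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = support[label]

        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0

        weight = actual / total  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up labels (1 = positive, 0 = negative).
scores = weighted_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

A function like this could be adapted into a `compute_metrics` callback for the trainer, converting logits to predicted labels first.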

## ⚙️ Training Parameters  
```python
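from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # checkpoints are written here
    evaluation_strategy="steps",     # renamed to `eval_strategy` in newer transformers releases
    eval_steps=200,                  # evaluate every 200 update steps
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=5,
    weight_decay=0.01,
    learning_rate=3e-5,
    logging_steps=100,               # log training loss every 100 update steps
    fp16=True,                       # mixed-precision training; needs a supported GPU
    report_to="none",                # disable external experiment trackers
)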