Abdu-GH/AraRest-Arabic-Restaurant-Reviews-Sentiment-Analysis

@@ -12,59 +12,71 @@ metrics:
 base_model:
 - aubmindlab/bert-base-arabertv02
 pipeline_tag: text-classification
 ---
-# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀
-## 📌 Overview
-This project fine-tunes a **transformer-based model** to analyze sentiment in **Arabic restaurant reviews**.
-We utilized **Hugging Face’s model training pipeline** and deployed the final model as an **interactive Gradio web app**.
-## 📥 Data Collection
-The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:
-[📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
-It contains **restaurant reviews in Arabic** labeled with sentiment polarity.
-## 🔄 Data Preparation
-- **Cleaning & Normalization**:
-  - Removed non-Arabic text, special characters, and extra spaces.
-  - Normalized Arabic characters (e.g., إ, أ, آ → ا, ة → ه).
-  - Downsampled positive reviews to balance the dataset.
 - **Tokenization**:
-  - Used **AraBERT tokenizer** for efficient text processing.
 - **Train-Test Split**:
   - **80% Training** | **20% Testing**.
-## 🏋️ Fine-Tuning & Results
-The model was fine-tuned using **Hugging Face Transformers** on a dataset of restaurant reviews.
-### **📊 Evaluation Metrics**
 | Metric       | Score  |
 |-------------|--------|
-| **Train Loss**| 0.470|
-| **Eval Loss** | 0.373 |
-| **Accuracy**  | 86.41% |
-| **Precision** | 87.01% |
-| **Recall**    | 86.49% |
-| **F1-score**  | 86.75% |
-## ⚙️ Training Parameters
-```python
-model_name = "aubmindlab/bert-base-arabertv2"
-model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, classifier_dropout=0.5).to(device)
 training_args = TrainingArguments(
     output_dir="./results",
-    evaluation_strategy="epoch",
-    save_strategy="epoch",
-    per_device_train_batch_size=8,
-    per_device_eval_batch_size=8,
-    num_train_epochs=4,
-    weight_decay=1,
-    learning_rate=1e-5,
-    lr_scheduler_type="cosine",
-    warmup_ratio=0.1,
     fp16=True,
-    report_to="none",
     save_total_limit=2,
     gradient_accumulation_steps=2,
     load_best_model_at_end=True,
@@ -72,4 +84,60 @@ training_args = TrainingArguments(
     metric_for_best_model="eval_loss",
     greater_is_better=False,
 )
-```

 base_model:
 - aubmindlab/bert-base-arabertv02
 pipeline_tag: text-classification
+tags:
+- arabic
+- sentiment-analysis
+- transformers
+- huggingface
+- bert
+- restaurants
+- fine-tuning
+- nlp
 ---
+# **🍽️ Arabic Restaurant Review Sentiment Analysis 🚀**
+## **📌 Overview**
+This **fine-tuned AraBERT model** classifies **Arabic restaurant reviews** as **Positive** or **Negative**.
+It is based on **aubmindlab/bert-base-arabertv2** and fine-tuned using **Hugging Face Transformers**.
+### **🔥 Why This Model?**
+✅ **Trained on real restaurant reviews** from the [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews) dataset on Hugging Face.
+✅ **Full fine-tuning** of the model (no LoRA or adapters).
+✅ **Balanced dataset** (2418 positive vs. 2418 negative reviews).
+✅ **86.41% accuracy and 86.75% F1** on the evaluation set.
+---
+## **📥 Dataset & Preprocessing**
+- **Dataset Source**: [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
+- **Text Cleaning**:
+  - Removed **non-Arabic text**, special characters, and extra spaces.
+  - Normalized Arabic characters (`إ, أ, آ → ا`, `ة → ه`).
+  - Balanced **Positive & Negative** sentiment distribution.
 - **Tokenization**:
+  - Used **AraBERT tokenizer** (`aubmindlab/bert-base-arabertv2`).
 - **Train-Test Split**:
   - **80% Training** | **20% Testing**.
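
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the kept character range, the `(text, label)` row layout with `1` = positive and `0` = negative, the random seed, and the helper names `clean_text` and `balance_and_split` are all assumptions made for the example.

```python
import random
import re

# Normalization from the list above: alef variants -> bare alef, teh marbuta -> heh.
_CHAR_MAP = str.maketrans({"إ": "ا", "أ": "ا", "آ": "ا", "ة": "ه"})

# Assumption: "Arabic text" means letters and diacritics in U+0621..U+0652;
# everything else except whitespace is replaced by a space.
_NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]")


def clean_text(text: str) -> str:
    """Drop non-Arabic characters, normalize letters, collapse whitespace."""
    text = _NON_ARABIC.sub(" ", text)
    text = text.translate(_CHAR_MAP)
    return " ".join(text.split())


def balance_and_split(rows, test_fraction=0.2, seed=42):
    """Downsample the larger class, shuffle, then split into (train, test).

    rows: list of (text, label) tuples, label 1 = positive, 0 = negative
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    n_test = int(len(balanced) * test_fraction)
    return balanced[n_test:], balanced[:n_test]


print(clean_text("الخدمة  ممتازة!! great 123"))
```

Note that the split here is a plain random split; a stratified split would keep the class ratio identical in both parts, and the source does not say which one was used.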
+---
+## **🏋️ Training & Performance**
+The model was fine-tuned using **Hugging Face Transformers**. Final metrics and the training configuration are listed below.
+### **📊 Final Model Results**
 | Metric       | Score  |
 |-------------|--------|
+| **Train Loss** | `0.470` |
+| **Eval Loss**  | `0.373` |
+| **Accuracy**   | `86.41%` |
+| **Precision**  | `87.01%` |
+| **Recall**     | `86.49%` |
+| **F1-score**   | `86.75%` |
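
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall when all three are computed for a single class. The sketch below recomputes it from the reported values; the source does not state which averaging mode was used, so exact agreement is only expected under that single-class assumption.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.8701, 0.8649  # values taken from the table above
f1 = f1_from(precision, recall)
print(f"{f1:.2%}")  # 86.75%
```

The recomputed value matches the reported F1-score of 86.75%.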
+### **⚙️ Training Configuration**
+```python
 training_args = TrainingArguments(
     output_dir="./results",
+    evaluation_strategy="epoch",
+    save_strategy="epoch",
+    per_device_train_batch_size=8,
+    per_device_eval_batch_size=8,
+    num_train_epochs=4,
+    weight_decay=1,
+    learning_rate=1e-5,
+    lr_scheduler_type="cosine",
+    warmup_ratio=0.1,
     fp16=True,
     save_total_limit=2,
     gradient_accumulation_steps=2,
     load_best_model_at_end=True,
     metric_for_best_model="eval_loss",
     greater_is_better=False,
 )
+```
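
To make the configuration above more concrete, the following back-of-the-envelope sketch estimates the effective batch size and the number of optimizer and warmup steps. It assumes a single GPU and a training set of 80% of the 4836 balanced reviews (rounded down); neither is confirmed by the source, and the exact step counting inside `Trainer` can differ slightly between `transformers` versions.

```python
import math

# Values copied from the TrainingArguments above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_train_epochs = 4
warmup_ratio = 0.1

# Assumptions for this estimate (not stated in the source).
n_devices = 1
n_train = (2 * 2418 * 80) // 100  # 80% of the balanced dataset

# Number of examples contributing to each optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * n_devices

batches_per_epoch = math.ceil(n_train / (per_device_train_batch_size * n_devices))
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_updates = updates_per_epoch * num_train_epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(f"effective batch size: {effective_batch}")
print(f"approx. optimizer steps: {total_updates}")
print(f"approx. warmup steps: {warmup_updates}")
```

Under these assumptions, the learning rate ramps up over roughly the first tenth of training and then follows the cosine schedule down for the remaining steps.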
+---
+## **💡 Usage**
+### **1️⃣ Quick Inference using `pipeline()`**
+```python
+from transformers import pipeline
+model_name = "Abduuu/ArabReview-Sentiment"
+sentiment_pipeline = pipeline("text-classification", model=model_name)
+review = "الطعام كان رائعًا والخدمة ممتازة!"
+result = sentiment_pipeline(review)
+print(result)
+```
+✅ **Example Output:**
+```json
+[{"label": "Positive", "score": 0.96}]
+```
+---
+### **2️⃣ Use Model with `AutoModelForSequenceClassification`**
+For finer control over tokenization, batching, and device placement, load the model and tokenizer directly with `AutoModelForSequenceClassification`:
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+model_name = "Abduuu/ArabReview-Sentiment"
+# Load Model & Tokenizer
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Example Review
+review = "الخدمة كانت بطيئة والطعام غير جيد."
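+# Truncate long reviews to the model's 512-token limit
+inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=512)
+# Perform Inference
+with torch.no_grad():
+    logits = model(**inputs).logits
+    # Take the argmax over the class dimension, not over the flattened tensor
+    prediction = torch.argmax(logits, dim=-1).item()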
+label_map = {0: "Negative", 1: "Positive"}
+print(f"Predicted Sentiment: {label_map[prediction]}")
+```
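
When several reviews are tokenized together (for example with `padding=True`), the model returns one row of logits per review, and each row has to be decoded separately. The sketch below shows that post-processing step in plain Python on made-up logits; `decode_batch` is a hypothetical helper, and the label order follows the `label_map` used above. With a real model, the input would be something like `model(**inputs).logits.tolist()`.

```python
import math

LABEL_MAP = {0: "Negative", 1: "Positive"}  # same mapping as the snippet above


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_batch(logits):
    """Turn a list of logit rows into label/score dictionaries."""
    results = []
    for row in logits:
        probs = softmax(row)
        idx = max(range(len(probs)), key=probs.__getitem__)
        results.append({"label": LABEL_MAP[idx], "score": probs[idx]})
    return results


# Made-up logits for two reviews, for illustration only.
fake_logits = [[2.0, -1.0], [-0.5, 1.5]]
for item in decode_batch(fake_logits):
    print(item["label"], round(item["score"], 3))
```

The `score` here is the softmax probability of the winning class, which is, to my understanding, also what the `text-classification` pipeline reports for a two-label model.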
+---
+## **🔬 Model Performance (Real Examples)**
+| Review | Prediction |
+|--------|------------|
+| "الطعام كان رائعًا والخدمة ممتازة!" | ✅ **Positive** |
+| "التجربة كانت سيئة والطعام كان باردًا" | ❌ **Negative** |
+---