Abdu-GH/AraRest-Arabic-Restaurant-Reviews-Sentiment-Analysis

@@ -1,6 +1,3 @@
-# Create a Markdown file with the enhanced model card content
 ---
 license: apache-2.0
 datasets:
@@ -15,75 +12,63 @@ metrics:
 base_model:
 - aubmindlab/bert-base-arabertv02
 pipeline_tag: text-classification
-tags:
-- arabic
-- sentiment-analysis
-- transformers
-- huggingface
-- bert
-- restaurants
-- fine-tuning
-- nlp
 ---
-# **🍽️ Arabic Restaurant Review Sentiment Analysis 🚀**
-## **📌 Overview**
-This **fine-tuned AraBERT model** classifies **Arabic restaurant reviews** as **Positive** or **Negative**.
-It is based on **aubmindlab/bert-base-arabertv2** and fine-tuned using **Hugging Face Transformers**.
-### **🔥 Why This Model?**
-✅ **Trained on Real Restaurant Reviews** from the **Hugging Face Dataset**.
-✅ **Fine-tuned with Full Training** (not LoRA or Adapters).
-✅ **Balanced Dataset** (2418 Positive vs. 2418 Negative Reviews).
-✅ **High Accuracy & Performance** for Sentiment Analysis in Arabic.
----
-## **📥 Dataset & Preprocessing**
-- **Dataset Source**: [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
-- **Text Cleaning**:
-  - Removed **non-Arabic text**, special characters, and extra spaces.
-  - Normalized Arabic characters (`إ, أ, آ → ا`, `ة → ه`).
-  - Balanced **Positive & Negative** sentiment distribution.
 - **Tokenization**:
-  - Used **AraBERT tokenizer** (`aubmindlab/bert-base-arabertv2`).
 - **Train-Test Split**:
   - **80% Training** | **20% Testing**.
----
-## **🏋️ Training & Performance**
-The model was fine-tuned using **Hugging Face Transformers** with the following hyperparameters:
-### **📊 Final Model Results**
 | Metric       | Score  |
 |-------------|--------|
-| **Train Loss** | `0.470` |
-| **Eval Loss**  | `0.373` |
-| **Accuracy**   | `86.41%` |
-| **Precision**  | `87.01%` |
-| **Recall**     | `86.49%` |
-| **F1-score**   | `86.75%` |
-### **⚙️ Training Configuration**
-```python
 training_args = TrainingArguments(
     output_dir="./results",
-    evaluation_strategy="epoch",
-    save_strategy="epoch",
-    per_device_train_batch_size=8,
-    per_device_eval_batch_size=8,
-    num_train_epochs=4,
-    weight_decay=1,
-    learning_rate=1e-5,
-    lr_scheduler_type="cosine",
-    warmup_ratio=0.1,
     fp16=True,
     save_total_limit=2,
     gradient_accumulation_steps=2,
     load_best_model_at_end=True,
     max_grad_norm=1.0,
     metric_for_best_model="eval_loss",
     greater_is_better=False,
-)

 ---
 license: apache-2.0
 datasets:
 base_model:
 - aubmindlab/bert-base-arabertv02
 pipeline_tag: text-classification
 ---
+# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀
+## 📌 Overview
+This project fine-tunes **AraBERT**, a transformer-based model, to classify the sentiment of **Arabic restaurant reviews**.
+The model was trained with the **Hugging Face Transformers** training pipeline and deployed as an **interactive Gradio web app**.
+## 📥 Data Collection
+The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:
+[📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
+It contains **restaurant reviews in Arabic** labeled with sentiment polarity.
+## 🔄 Data Preparation
+- **Cleaning & Normalization**:
+  - Removed non-Arabic text, special characters, and extra spaces.
+  - Normalized Arabic characters (e.g., إ, أ, آ → ا, ة → ه).
+  - Downsampled positive reviews to balance the dataset.
 - **Tokenization**:
+  - Used the **AraBERT tokenizer**.
 - **Train-Test Split**:
   - **80% Training** | **20% Testing**.
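
The cleaning and balancing steps listed above could look roughly like the following sketch. The function names (`normalize_arabic`, `downsample`), the exact regular expressions, and the fixed random seed are assumptions for illustration; the actual preprocessing code is not part of this card.

```python
import random
import re

# Assumption: "non-Arabic text" means anything outside the basic Arabic
# letter range U+0621..U+064A; the real pipeline may keep more characters.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
ALEF_VARIANTS = re.compile(r"[إأآ]")


def normalize_arabic(text: str) -> str:
    """Drop non-Arabic characters, unify alef forms, map ta marbuta to ha."""
    text = NON_ARABIC.sub(" ", text)
    text = ALEF_VARIANTS.sub("ا", text)
    text = text.replace("ة", "ه")
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def downsample(rows, label_key="label", seed=42):
    """Randomly trim every class to the size of the smallest class."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced
```

A review would be passed through `normalize_arabic` before tokenization, and `downsample` would be applied once to the full labeled set before the 80/20 split.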
+## 🏋️ Fine-Tuning & Results
+The model was fine-tuned using **Hugging Face Transformers**.
+### 📊 Evaluation Metrics
 | Metric       | Score  |
 |-------------|--------|
+| **Train Loss** | 0.470 |
+| **Eval Loss** | 0.373 |
+| **Accuracy**  | 86.41% |
+| **Precision** | 87.01% |
+| **Recall**    | 86.49% |
+| **F1-score**  | 86.75% |
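
As a quick sanity check, the reported F1-score is consistent with the harmonic mean of the reported precision and recall. The snippet below only redoes that arithmetic from the table values; it assumes the F1 was derived from these same (rounded) precision and recall figures, which the card does not state.

```python
precision = 87.01  # percent, from the metrics table
recall = 86.49     # percent, from the metrics table

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2f}")  # prints 86.75
```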
+## ⚙️ Training Parameters
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, TrainingArguments
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# Note: the card metadata lists aubmindlab/bert-base-arabertv02 as base_model.
+model_name = "aubmindlab/bert-base-arabertv2"
+model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, classifier_dropout=0.5).to(device)
 training_args = TrainingArguments(
     output_dir="./results",
+    evaluation_strategy="epoch",  # newer transformers releases call this eval_strategy
+    save_strategy="epoch",
+    per_device_train_batch_size=8,
+    per_device_eval_batch_size=8,
+    num_train_epochs=4,
+    weight_decay=1,
+    learning_rate=1e-5,
+    lr_scheduler_type="cosine",
+    warmup_ratio=0.1,
     fp16=True,
+    report_to="none",
     save_total_limit=2,
     gradient_accumulation_steps=2,
     load_best_model_at_end=True,
     max_grad_norm=1.0,
     metric_for_best_model="eval_loss",
     greater_is_better=False,
+)
+```
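
To make the interaction of batch size, gradient accumulation, warmup, and the cosine scheduler concrete, here is a small sketch of the resulting learning-rate curve. It assumes single-device training and roughly 3,868 training examples (80% of the 4,836 balanced reviews mentioned in the earlier version of this card). `lr_at` is a hypothetical helper that mirrors the usual linear-warmup-then-half-cosine shape; it is not the Trainer's own code.

```python
import math

TRAIN_EXAMPLES = 3868   # assumption: 80% of 4,836 reviews
PER_DEVICE_BATCH = 8    # per_device_train_batch_size
GRAD_ACCUM_STEPS = 2    # gradient_accumulation_steps
EPOCHS = 4              # num_train_epochs
BASE_LR = 1e-5          # learning_rate
WARMUP_RATIO = 0.1      # warmup_ratio

# One optimizer update consumes this many examples on a single device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
batches_per_epoch = math.ceil(TRAIN_EXAMPLES / PER_DEVICE_BATCH)
updates_per_epoch = batches_per_epoch // GRAD_ACCUM_STEPS
total_updates = updates_per_epoch * EPOCHS
warmup_updates = math.ceil(total_updates * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then half-cosine decay to zero."""
    if step < warmup_updates:
        return BASE_LR * step / max(1, warmup_updates)
    progress = (step - warmup_updates) / max(1, total_updates - warmup_updates)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_updates, warmup_updates)
for step in (0, warmup_updates, total_updates // 2, total_updates):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the effective batch size is 16, and training runs for 968 optimizer updates, the first 97 of which are warmup.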