Abdulrahman Al-Ghamdi committed on
Commit 45ea391 · verified · 1 Parent(s): bac9cde

Update README.md

  1. README.md +109 -41
README.md CHANGED
@@ -12,59 +12,71 @@ metrics:
  base_model:
  - aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
  ---

- # 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀
- ## 📌 Overview
- This project fine-tunes a **transformer-based model** to analyze sentiment in **Arabic restaurant reviews**.
- We utilized **Hugging Face’s model training pipeline** and deployed the final model as an **interactive Gradio web app**.
-
- ## 📥 Data Collection
- The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:
- [📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
- It contains **restaurant reviews in Arabic** labeled with sentiment polarity.
-
- ## 🔄 Data Preparation
- - **Cleaning & Normalization**:
-   - Removed non-Arabic text, special characters, and extra spaces.
-   - Normalized Arabic characters (e.g., إ, أ, آ → ا, ة → ه).
-   - Downsampled positive reviews to balance the dataset.
  - **Tokenization**:
-   - Used **AraBERT tokenizer** for efficient text processing.
  - **Train-Test Split**:
    - **80% Training** | **20% Testing**.

- ## 🏋️ Fine-Tuning & Results
- The model was fine-tuned using **Hugging Face Transformers** on a dataset of restaurant reviews.

- ### **📊 Evaluation Metrics**
  | Metric | Score |
  |-------------|--------|
- | **Train Loss**| 0.470|
- | **Eval Loss** | 0.373 |
- | **Accuracy** | 86.41% |
- | **Precision** | 87.01% |
- | **Recall** | 86.49% |
- | **F1-score** | 86.75% |
-
- ## ⚙️ Training Parameters
- ```python
- model_name = "aubmindlab/bert-base-arabertv2"
- model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, classifier_dropout=0.5).to(device)

  training_args = TrainingArguments(
      output_dir="./results",
-     evaluation_strategy="epoch",
-     save_strategy="epoch",
-     per_device_train_batch_size=8,
-     per_device_eval_batch_size=8,
-     num_train_epochs=4,
-     weight_decay=1,
-     learning_rate=1e-5,
-     lr_scheduler_type="cosine",
-     warmup_ratio=0.1,
      fp16=True,
-     report_to="none",
      save_total_limit=2,
      gradient_accumulation_steps=2,
      load_best_model_at_end=True,
@@ -72,4 +84,60 @@ training_args = TrainingArguments(
      metric_for_best_model="eval_loss",
      greater_is_better=False,
  )
- ```

  base_model:
  - aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
+ tags:
+ - arabic
+ - sentiment-analysis
+ - transformers
+ - huggingface
+ - bert
+ - restaurants
+ - fine-tuning
+ - nlp
  ---

+ # **🍽️ Arabic Restaurant Review Sentiment Analysis 🚀**
+
+ ## **📌 Overview**
+ This **fine-tuned AraBERT model** classifies **Arabic restaurant reviews** as **Positive** or **Negative**.
+ It is based on **aubmindlab/bert-base-arabertv2** and fine-tuned using **Hugging Face Transformers**.
+
+ ### **🔥 Why This Model?**
+ **Trained on Real Restaurant Reviews** from the `hadyelsahar/ar_res_reviews` dataset on Hugging Face.
+ **Fully Fine-Tuned** (all weights updated; no LoRA or adapters).
+ ✅ **Balanced Dataset** (2418 Positive vs. 2418 Negative Reviews).
+ **86.41% Accuracy and 86.75% F1-score** for sentiment analysis in Arabic.
+
+ ---
+
+ ## **📥 Dataset & Preprocessing**
+ - **Dataset Source**: [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
+ - **Text Cleaning**:
+   - Removed **non-Arabic text**, special characters, and extra spaces.
+   - Normalized Arabic characters (`إ, أ, آ → ا`, `ة → ه`).
+   - Balanced the **Positive & Negative** classes (2418 reviews each).
  - **Tokenization**:
+   - Used **AraBERT tokenizer** (`aubmindlab/bert-base-arabertv2`).
  - **Train-Test Split**:
    - **80% Training** | **20% Testing**.
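
The cleaning, normalization, and balancing steps listed above could be sketched as follows. This is a minimal illustration, not the author's code: the helper names `normalize_arabic` and `balance`, the Unicode range treated as "Arabic", and the choice to balance by downsampling the larger class are all assumptions.

```python
import random
import re

# Assumption: "Arabic text" means letters and diacritics in U+0621..U+0652.
NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]")
ALEF_VARIANTS = re.compile("[إأآ]")

def normalize_arabic(text: str) -> str:
    """Unify alef variants, map ta marbuta to ha, drop non-Arabic characters."""
    text = ALEF_VARIANTS.sub("ا", text)
    text = text.replace("ة", "ه")
    text = NON_ARABIC.sub(" ", text)
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

def balance(reviews, seed=42):
    """Downsample the larger class so both labels (0 and 1) have equal counts.

    `reviews` is a list of (text, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for text, label in reviews:
        by_label[label].append((text, label))
    n = min(len(by_label[0]), len(by_label[1]))
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced
```

Fixing the random seed keeps the downsampled subset reproducible between runs.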

+ ---
+
+ ## **🏋️ Training & Performance**
+ The model was fine-tuned using **Hugging Face Transformers**. The final results and the training configuration are listed below.

+ ### **📊 Final Model Results**
  | Metric | Score |
  |-------------|--------|
+ | **Train Loss** | `0.470` |
+ | **Eval Loss** | `0.373` |
+ | **Accuracy** | `86.41%` |
+ | **Precision** | `87.01%` |
+ | **Recall** | `86.49%` |
+ | **F1-score** | `86.75%` |
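
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall when both are computed for the same class. The sketch below assumes the reported precision and recall are plain binary scores rather than macro or weighted averages, which the README does not state.

```python
# Values copied from the results table above.
precision = 0.8701
recall = 0.8649

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")  # F1 = 86.75%
```

Under that assumption, the computed value matches the reported F1-score.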

+ ### **⚙️ Training Configuration**
+ ```python
  training_args = TrainingArguments(
      output_dir="./results",
+     evaluation_strategy="epoch",
+     save_strategy="epoch",
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=8,
+     num_train_epochs=4,
+     weight_decay=1,
+     learning_rate=1e-5,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
      fp16=True,
      save_total_limit=2,
      gradient_accumulation_steps=2,
      load_best_model_at_end=True,
 
      metric_for_best_model="eval_loss",
      greater_is_better=False,
  )
+ ```
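
To make the batch-related settings concrete, the sketch below works out the effective batch size and an approximate number of optimizer updates. Two inputs are assumptions not stated in the README: training on a single GPU, and an 80% training split taken from the 4836 balanced reviews. The exact step count reported by `Trainer` may differ slightly.

```python
import math

# Values from the training configuration above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_train_epochs = 4
warmup_ratio = 0.1

# Assumptions: one device, and 80% of the 2418 + 2418 balanced reviews.
num_devices = 1
train_examples = int(0.8 * (2418 + 2418))

# Examples seen per optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
updates_per_epoch = math.ceil(train_examples / effective_batch)
total_updates = updates_per_epoch * num_train_epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(effective_batch, updates_per_epoch, total_updates, warmup_updates)
```

With these assumptions, each update sees 16 examples, and roughly the first tenth of the updates are spent warming up before the cosine decay takes over.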
+
+ ---
+
+ ## **💡 Usage**
+ ### **1️⃣ Quick Inference using `pipeline()`**
+ ```python
+ from transformers import pipeline
+
+ model_name = "Abduuu/ArabReview-Sentiment"
+ sentiment_pipeline = pipeline("text-classification", model=model_name)
+
+ # Example review: "The food was wonderful and the service was excellent!"
+ review = "الطعام كان رائعًا والخدمة ممتازة!"
+ result = sentiment_pipeline(review)
+ print(result)
+ ```
+ ✅ **Example Output:**
+ ```json
+ [{"label": "Positive", "score": 0.96}]
+ ```
+
+ ---
+
+ ### **2️⃣ Use Model with `AutoModelForSequenceClassification`**
+ For more control over tokenization and batching, load the model directly with `AutoModelForSequenceClassification`:
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ model_name = "Abduuu/ArabReview-Sentiment"
+
+ # Load Model & Tokenizer
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Example Review: "The service was slow and the food was not good."
+ review = "الخدمة كانت بطيئة والطعام غير جيد."
+ inputs = tokenizer(review, return_tensors="pt")
+
+ # Perform Inference
+ with torch.no_grad():
+     logits = model(**inputs).logits
+ prediction = torch.argmax(logits, dim=-1).item()
+
+ label_map = {0: "Negative", 1: "Positive"}
+ print(f"Predicted Sentiment: {label_map[prediction]}")
+ ```
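
The snippet above classifies a single review. A sketch of batched inference follows; `logits_to_labels` and `predict_batch` are hypothetical helper names, and the `0 → Negative`, `1 → Positive` mapping is copied from the snippet above rather than verified against the model's config.

```python
import math

LABEL_MAP = {0: "Negative", 1: "Positive"}  # mapping taken from the README snippet

def logits_to_labels(logit_rows):
    """Turn rows of raw logits into (label, probability) pairs via softmax."""
    results = []
    for row in logit_rows:
        peak = max(row)
        # Subtracting the max keeps exp() numerically stable.
        exps = [math.exp(x - peak) for x in row]
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((LABEL_MAP[best], exps[best] / sum(exps)))
    return results

def predict_batch(reviews, model_name="Abduuu/ArabReview-Sentiment"):
    """Classify a list of reviews in one forward pass."""
    # Imported here so logits_to_labels can be used without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Padding aligns reviews of different lengths; truncation caps long ones.
    inputs = tokenizer(reviews, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits_to_labels(logits.tolist())
```

Calling `predict_batch` with a list of review strings returns one `(label, probability)` pair per review; the model is downloaded on first use.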
+
+ ---
+
+ ## **🔬 Model Performance (Real Examples)**
+ | Review | Prediction |
+ |--------|------------|
+ | "الطعام كان رائعًا والخدمة ممتازة!" ("The food was wonderful and the service was excellent!") | ✅ **Positive** |
+ | "التجربة كانت سيئة والطعام كان باردًا" ("The experience was bad and the food was cold") | ❌ **Negative** |
+
+ ---