---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
- arabic
- sentiment-analysis
- transformers
- huggingface
- bert
- restaurants
- fine-tuning
- nlp
---
# **🍽️ Arabic Restaurant Review Sentiment Analysis 🚀**
## **📌 Overview**
This **fine-tuned AraBERT model** classifies **Arabic restaurant reviews** as **Positive** or **Negative**.
It is based on **aubmindlab/bert-base-arabertv02** and fine-tuned using **Hugging Face Transformers**.
### **🔥 Why This Model?**
- **Trained on Real Restaurant Reviews** from the **Hugging Face Dataset**.
- **Fine-tuned with Full Training** (not LoRA or Adapters).
- **Balanced Dataset** (2,418 Positive vs. 2,418 Negative Reviews).
- **High Accuracy & Performance** for Sentiment Analysis in Arabic.
**🚀 Try the Model Now!**
<p align="center">
<a href="https://huggingface.co/spaces/Abduuu/Arabic-Reviews-Sentiment-Analysis">
<img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-lg-dark.svg" alt="Open in HF Spaces" width="250px">
</a>
</p>
---
## **📥 Dataset & Preprocessing**
- **Dataset Source**: [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
- **Text Cleaning**:
  - Removed **non-Arabic text**, special characters, and extra spaces.
  - Normalized Arabic characters (`إ, أ, آ → ا`, `ة → ه`).
  - Balanced the **Positive & Negative** sentiment distribution.
- **Tokenization**:
  - Used the **AraBERT tokenizer** (`aubmindlab/bert-base-arabertv02`).
- **Train-Test Split**:
  - **80% Training** | **20% Testing**.
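
The exact preprocessing script is not part of this card; the snippet below is a minimal sketch of the cleaning and normalization steps listed above (the `clean_review` helper and its regular expressions are illustrative assumptions).

```python
import re

def clean_review(text: str) -> str:
    # Illustrative sketch only; the original cleaning script may differ.
    # Normalize common Arabic character variants.
    text = re.sub(r"[إأآ]", "ا", text)
    text = text.replace("ة", "ه")
    # Drop non-Arabic characters (keep Arabic letters/diacritics and spaces).
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("الطعام كان رائعًا! 10/10 :)"))
```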
---
## **🏋️ Training & Performance**
The model was fine-tuned with **Hugging Face Transformers**; the final evaluation results and the training configuration are shown below.
### **📊 Final Model Results**
| Metric | Score |
|-------------|--------|
| **Train Loss** | `0.470` |
| **Eval Loss** | `0.373` |
| **Accuracy** | `86.41%` |
| **Precision** | `87.01%` |
| **Recall** | `86.49%` |
| **F1-score** | `86.75%` |
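
The evaluation code is not included in this card; one common way to produce these metrics during training is a `compute_metrics` callback like the sketch below (an assumption, using `scikit-learn`).

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # Turn logits into class predictions, then score them against the gold labels.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```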
### **⚙️ Training Configuration**
```python
training_args = TrainingArguments(
output_dir="./results",
evaluation_strategy="epoch",
save_strategy="epoch",
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
num_train_epochs=4,
weight_decay=1,
learning_rate=1e-5,
lr_scheduler_type="cosine",
warmup_ratio=0.1,
fp16=True,
save_total_limit=2,
gradient_accumulation_steps=2,
load_best_model_at_end=True,
max_grad_norm=1.0,
metric_for_best_model="eval_loss",
greater_is_better=False,
)
```
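
For context, wiring these `TrainingArguments` into a `Trainer` would look roughly like the sketch below; `train_dataset`, `eval_dataset`, and `compute_metrics` are assumed placeholders (tokenized 80/20 splits and the metric callback sketched above), not the original training script.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

base_model = "aubmindlab/bert-base-arabertv02"  # base model listed in the card metadata
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

trainer = Trainer(
    model=model,
    args=training_args,               # the TrainingArguments defined above
    train_dataset=train_dataset,      # assumed: tokenized 80% training split
    eval_dataset=eval_dataset,        # assumed: tokenized 20% test split
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # e.g. the metric callback sketched earlier
)
trainer.train()
```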
---
## **💡 Usage**
### **1️⃣ Quick Inference using `pipeline()`**
```python
from transformers import pipeline
model_name = "Abduuu/ArabReview-Sentiment"
sentiment_pipeline = pipeline("text-classification", model=model_name)
review = "الطعام كان رائعًا والخدمة ممتازة!"
result = sentiment_pipeline(review)
print(result)
```
**Example Output:**
```python
[{'label': 'Positive', 'score': 0.91551274061203}]
```
---
### **2️⃣ Use Model with `AutoModelForSequenceClassification`**
For **batch processing and lower latency**, load the model directly with `AutoModelForSequenceClassification`:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
model_name = "Abduuu/ArabReview-Sentiment"
# Load Model & Tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Example Review
review = "الخدمة كانت بطيئة والطعام غير جيد."
inputs = tokenizer(review, return_tensors="pt")
# Perform Inference
with torch.no_grad():
    logits = model(**inputs).logits

prediction = torch.argmax(logits).item()
label_map = {0: "Negative", 1: "Positive"}
print(f"Predicted Sentiment: {label_map[prediction]}")
```
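
For the batched use case mentioned above, a minimal sketch looks like this (the `reviews` list is just an example; the padding and truncation settings are assumptions):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Abduuu/ArabReview-Sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

reviews = [
    "الطعام كان رائعًا والخدمة ممتازة!",
    "الخدمة كانت بطيئة والطعام غير جيد.",
]

# Tokenize the whole batch at once; padding aligns the sequence lengths.
inputs = tokenizer(reviews, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

label_map = {0: "Negative", 1: "Positive"}
for review, p in zip(reviews, probs):
    label_id = int(p.argmax())
    print(f"{review} → {label_map[label_id]} ({p[label_id]:.3f})")
```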
---
## **🔬 Model Performance (Real Examples)**
| Review | Prediction |
|--------|------------|
| "الطعام كان رائعًا والخدمة ممتازة!" | ✅ **Positive** |
| "التجربة كانت سيئة والطعام كان باردًا" | ❌ **Negative** |
---