---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
- arabic
- sentiment-analysis
- transformers
- huggingface
- bert
- restaurants
- fine-tuning
- nlp
---
# **🍽️ Arabic Restaurant Review Sentiment Analysis 🚀**
## **📌 Overview**
This **fine-tuned AraBERT model** classifies **Arabic restaurant reviews** as **Positive** or **Negative**.
It is based on **aubmindlab/bert-base-arabertv02** and fine-tuned using **Hugging Face Transformers**.
### **🔥 Why This Model?**
- **Trained on Real Restaurant Reviews** from the **Hugging Face Dataset**.
- **Fine-tuned with Full Training** (not LoRA or Adapters).
- **Balanced Dataset** (2,418 Positive vs. 2,418 Negative Reviews).
- **High Accuracy & Performance** for Sentiment Analysis in Arabic.
**🚀 Try the Model Now!**
<p align="center">
<a href="https://huggingface.co/spaces/Abduuu/Arabic-Reviews-Sentiment-Analysis">
<img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-lg-dark.svg" alt="Open in HF Spaces" width="250px">
</a>
</p>
---
## **📥 Dataset & Preprocessing**
- **Dataset Source**: [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
- **Text Cleaning**:
  - Removed **non-Arabic text**, special characters, and extra spaces.
  - Normalized Arabic characters (`إ, أ, آ → ا`, `ة → ه`).
  - Balanced the **Positive & Negative** sentiment distribution.
- **Tokenization**:
  - Used the **AraBERT tokenizer** (`aubmindlab/bert-base-arabertv02`).
- **Train-Test Split**:
  - **80% Training** | **20% Testing**.
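
The exact preprocessing script is not part of this card; the snippet below is a minimal sketch of the cleaning and normalization steps listed above (the `clean_review` helper and its regular expressions are illustrative assumptions).

```python
import re

def clean_review(text: str) -> str:
    # Illustrative sketch only; the original cleaning script may differ.
    # Normalize common Arabic character variants.
    text = re.sub(r"[إأآ]", "ا", text)
    text = text.replace("ة", "ه")
    # Drop non-Arabic characters (keep Arabic letters/diacritics and spaces).
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("الطعام كان رائعًا! 10/10 :)"))
```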
---
## **🏋️ Training & Performance**
The model was fine-tuned with **Hugging Face Transformers**; the final evaluation results and the training configuration are shown below.
### **📊 Final Model Results**
| Metric | Score |
|-------------|--------|
| **Train Loss** | `0.470` |
| **Eval Loss** | `0.373` |
| **Accuracy** | `86.41%` |
| **Precision** | `87.01%` |
| **Recall** | `86.49%` |
| **F1-score** | `86.75%` |
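
The evaluation code is not included in this card; one common way to produce these metrics during training is a `compute_metrics` callback like the sketch below (an assumption, using `scikit-learn`).

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # Turn logits into class predictions, then score them against the gold labels.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```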
### **⚙️ Training Configuration**
```python
training_args = TrainingArguments(
output_dir="./results",
evaluation_strategy="epoch",
save_strategy="epoch",
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
num_train_epochs=4,
weight_decay=1,
learning_rate=1e-5,
lr_scheduler_type="cosine",
warmup_ratio=0.1,
fp16=True,
save_total_limit=2,
gradient_accumulation_steps=2,
load_best_model_at_end=True,
max_grad_norm=1.0,
metric_for_best_model="eval_loss",
greater_is_better=False,
)
```
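
For context, wiring these `TrainingArguments` into a `Trainer` would look roughly like the sketch below; `train_dataset`, `eval_dataset`, and `compute_metrics` are assumed placeholders (tokenized 80/20 splits and the metric callback sketched above), not the original training script.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

base_model = "aubmindlab/bert-base-arabertv02"  # base model listed in the card metadata
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

trainer = Trainer(
    model=model,
    args=training_args,               # the TrainingArguments defined above
    train_dataset=train_dataset,      # assumed: tokenized 80% training split
    eval_dataset=eval_dataset,        # assumed: tokenized 20% test split
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # e.g. the metric callback sketched earlier
)
trainer.train()
```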
---
## **💡 Usage**
### **1️⃣ Quick Inference using `pipeline()`**
```python
from transformers import pipeline
model_name = "Abduuu/ArabReview-Sentiment"
sentiment_pipeline = pipeline("text-classification", model=model_name)
review = "الطعام كان رائعًا والخدمة ممتازة!"
result = sentiment_pipeline(review)
print(result)
```
**Example Output:**
```python
[{'label': 'Positive', 'score': 0.91551274061203}]
```
---
### **2️⃣ Use Model with `AutoModelForSequenceClassification`**
For **batch processing and lower latency**, load the model directly with `AutoModelForSequenceClassification`:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
model_name = "Abduuu/ArabReview-Sentiment"
# Load Model & Tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Example Review
review = "الخدمة كانت بطيئة والطعام غير جيد."
inputs = tokenizer(review, return_tensors="pt")
# Perform Inference
with torch.no_grad():
    logits = model(**inputs).logits

prediction = torch.argmax(logits).item()
label_map = {0: "Negative", 1: "Positive"}
print(f"Predicted Sentiment: {label_map[prediction]}")
```
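
For the batched use case mentioned above, a minimal sketch looks like this (the `reviews` list is just an example; the padding and truncation settings are assumptions):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Abduuu/ArabReview-Sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

reviews = [
    "الطعام كان رائعًا والخدمة ممتازة!",
    "الخدمة كانت بطيئة والطعام غير جيد.",
]

# Tokenize the whole batch at once; padding aligns the sequence lengths.
inputs = tokenizer(reviews, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

label_map = {0: "Negative", 1: "Positive"}
for review, p in zip(reviews, probs):
    label_id = int(p.argmax())
    print(f"{review} → {label_map[label_id]} ({p[label_id]:.3f})")
```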
---
## **🔬 Model Performance (Real Examples)**
| Review | Prediction |
|--------|------------|
| "الطعام كان رائعًا والخدمة ممتازة!" | ✅ **Positive** |
| "التجربة كانت سيئة والطعام كان باردًا" | ❌ **Negative** |
---