---
license: apache-2.0
datasets:
  - hadyelsahar/ar_res_reviews
language:
  - ar
metrics:
  - accuracy
  - precision
  - recall
  - f1
base_model:
  - aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
  - arabic
  - sentiment-analysis
  - transformers
  - huggingface
  - bert
  - restaurants
  - fine-tuning
  - nlp
---

๐Ÿฝ๏ธ Arabic Restaurant Review Sentiment Analysis ๐Ÿš€

## 📌 Overview

This fine-tuned AraBERT model classifies Arabic restaurant reviews as **Positive** or **Negative**.
It is based on `aubmindlab/bert-base-arabertv02` and was fine-tuned using Hugging Face Transformers.

## 🔥 Why This Model?

✅ Trained on real restaurant reviews from the Hugging Face dataset `hadyelsahar/ar_res_reviews`.
✅ Fine-tuned with full training (not LoRA or adapters).
✅ Balanced dataset (2,418 positive vs. 2,418 negative reviews).
✅ Strong performance on Arabic sentiment analysis (86.41% test accuracy; see results below).

## 🚀 Try the Model Now!

Open in HF Spaces

## 📥 Dataset & Preprocessing

- **Dataset source:** `hadyelsahar/ar_res_reviews`
- **Text cleaning:**
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (إ, أ, آ → ا; ة → ه).
  - Balanced the positive and negative sentiment distribution.
- **Tokenization:** AraBERT tokenizer (`aubmindlab/bert-base-arabertv02`).
- **Train-test split:** 80% training / 20% testing (see the preprocessing sketch below).
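
The exact preprocessing script is not published with this model, so the following is only a minimal sketch of the steps above; the `text`/`polarity` column names and the `clean_text` helper are assumptions for illustration.

```python
import re
from datasets import load_dataset
from sklearn.model_selection import train_test_split

# Map common Arabic character variants to a normalized form (إ, أ, آ → ا; ة → ه).
NORMALIZE = str.maketrans({"إ": "ا", "أ": "ا", "آ": "ا", "ة": "ه"})

def clean_text(text: str) -> str:
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # drop non-Arabic characters
    text = text.translate(NORMALIZE)                  # normalize character variants
    return re.sub(r"\s+", " ", text).strip()          # collapse extra whitespace

dataset = load_dataset("hadyelsahar/ar_res_reviews", split="train")
texts = [clean_text(t) for t in dataset["text"]]      # column name assumed
labels = list(dataset["polarity"])                    # column name assumed

# 80/20 split, stratified to keep the balanced class distribution.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)
```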

๐Ÿ‹๏ธ Training & Performance

The model was fine-tuned using Hugging Face Transformers with the hyperparameters listed in the Training Configuration section below.

## 📊 Final Model Results

| Metric     | Score  |
|------------|--------|
| Train Loss | 0.470  |
| Eval Loss  | 0.373  |
| Accuracy   | 86.41% |
| Precision  | 87.01% |
| Recall     | 86.49% |
| F1-score   | 86.75% |

โš™๏ธ Training Configuration

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
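
For context, here is a hedged sketch of how these arguments could be passed to `Trainer`, with a `compute_metrics` function matching the metrics reported above; the `model`, `train_dataset`, and `eval_dataset` variables are assumed to come from a fine-tuning setup that is not included in this card.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import Trainer

def compute_metrics(eval_pred):
    # Convert logits to class predictions and score the binary task.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

trainer = Trainer(
    model=model,                  # assumed: AraBERT with a classification head
    args=training_args,
    train_dataset=train_dataset,  # assumed: tokenized 80% split
    eval_dataset=eval_dataset,    # assumed: tokenized 20% split
    compute_metrics=compute_metrics,
)
trainer.train()
```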

## 💡 Usage

1๏ธโƒฃ Quick Inference using pipeline()

```python
from transformers import pipeline

model_name = "Abduuu/ArabReview-Sentiment"
sentiment_pipeline = pipeline("text-classification", model=model_name)

review = "الطعام كان رائعًا والخدمة ممتازة!"  # "The food was amazing and the service was excellent!"
result = sentiment_pipeline(review)
print(result)
```

✅ Example Output:

```
[{'label': 'Positive', 'score': 0.91551274061203}]
```

2๏ธโƒฃ Use Model with AutoModelForSequenceClassification

For batch processing and lower latency, use the `AutoModel` API (a batched example follows the single-review snippet below):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Abduuu/ArabReview-Sentiment"

# Load model & tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example review
review = "الخدمة كانت بطيئة والطعام غير جيد."  # "The service was slow and the food was not good."
inputs = tokenizer(review, return_tensors="pt")

# Perform inference
with torch.no_grad():
    logits = model(**inputs).logits
    prediction = torch.argmax(logits, dim=-1).item()

label_map = {0: "Negative", 1: "Positive"}
print(f"Predicted Sentiment: {label_map[prediction]}")
```

## 🔬 Model Performance (Real Examples)

| Review | Prediction |
|--------|------------|
| "الطعام كان رائعًا والخدمة ممتازة!" ("The food was amazing and the service was excellent!") | ✅ Positive |
| "التجربة كانت سيئة والطعام كان باردًا" ("The experience was bad and the food was cold") | ❌ Negative |