---
license: apache-2.0
datasets:
  - hadyelsahar/ar_res_reviews
language:
  - ar
metrics:
  - accuracy
  - precision
  - recall
  - f1
base_model: aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
  - text-classification
  - sentiment-analysis
  - arabic
  - restaurant-reviews
model-index:
  - name: ArabReview-Sentiment
    results:
      - task:
          type: text-classification
        dataset:
          name: Arabic Restaurant Reviews
          type: hadyelsahar/ar_res_reviews
        metrics:
          - name: Accuracy
            type: accuracy
            value: 86.41
          - name: Precision
            type: precision
            value: 87.01
          - name: Recall
            type: recall
            value: 86.49
          - name: F1 Score
            type: f1
            value: 86.75
library_name: transformers
---

# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

## 📌 Overview

This project fine-tunes AraBERT to classify the sentiment of Arabic restaurant reviews.
Training uses Hugging Face's `transformers` library, and the fine-tuned model is exposed as an interactive `text-classification` pipeline.
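
A minimal inference sketch (the repository id below is a placeholder; replace it with this model's actual id on the Hub, and note that label names depend on the model's `id2label` mapping):

```python
from transformers import pipeline

# Placeholder repo id -- substitute the actual model id on the Hugging Face Hub.
classifier = pipeline("text-classification", model="<username>/ArabReview-Sentiment")

# "The food is delicious and the service is excellent."
print(classifier("الأكل لذيذ والخدمة ممتازة"))
```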

## 📥 Dataset

The dataset used for fine-tuning is the 📂 [Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews) (`hadyelsahar/ar_res_reviews`).
It contains restaurant reviews labeled as Positive or Negative.
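
The data can be loaded directly with the `datasets` library; a minimal sketch:

```python
from datasets import load_dataset

# Load the Arabic restaurant reviews dataset from the Hugging Face Hub.
ds = load_dataset("hadyelsahar/ar_res_reviews")

print(ds)               # available splits and sizes
print(ds["train"][0])   # one raw example (review text plus its polarity label)
```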

## 🔄 Preprocessing

- **Cleaning & Normalization** (see the sketch below):
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (e.g., إ, أ, آ → ا and ة → ه).
- **Tokenization:**
  - Used the AraBERT tokenizer for efficient processing.
- **Data Balancing:**
  - 2,418 Positive | 2,418 Negative (balanced dataset).
- **Train-Test Split:**
  - 80% Training | 20% Testing.
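
A minimal sketch of the cleaning and normalization step, assuming regex-based rules (the exact preprocessing code is not published in this card):

```python
import re

def normalize_arabic(text: str) -> str:
    """Illustrative cleaning/normalization for Arabic reviews."""
    text = re.sub(r"[إأآ]", "ا", text)                # unify alef variants
    text = text.replace("ة", "ه")                     # teh marbuta -> heh
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)   # drop non-Arabic characters
    text = re.sub(r"\s+", " ", text).strip()          # collapse extra whitespace
    return text

print(normalize_arabic("الأكل رائع!!! 10/10"))  # -> "الاكل رائع"
```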

πŸ‹οΈ Fine-Tuning Details

We performed full fine-tuning of `aubmindlab/bert-base-arabertv2` (all weights updated; no LoRA or other parameter-efficient method).

## 📊 Model Performance

| Metric     | Score  |
|------------|--------|
| Train Loss | 0.470  |
| Eval Loss  | 0.373  |
| Accuracy   | 86.41% |
| Precision  | 87.01% |
| Recall     | 86.49% |
| F1-score   | 86.75% |
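
These metrics correspond to a standard `compute_metrics` callback for the `Trainer`; a sketch assuming scikit-learn (not necessarily the exact code used for this card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Compute accuracy, precision, recall, and F1 for binary sentiment labels."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```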

## ⚙️ Training Parameters

```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "aubmindlab/bert-base-arabertv2"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,              # binary sentiment: Positive / Negative
    classifier_dropout=0.5,    # heavier dropout on the classification head
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",        # evaluate at the end of every epoch
    save_strategy="epoch",              # checkpoint at the end of every epoch
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,                     # strong weight decay
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,                          # mixed-precision training
    report_to="none",
    save_total_limit=2,                 # keep only the two most recent checkpoints
    gradient_accumulation_steps=2,      # effective per-device batch size of 16
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
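
These arguments plug into a standard `Trainer` run; a sketch assuming tokenized `train_dataset` / `eval_dataset` splits (hypothetical names) and the `compute_metrics` function shown above:

```python
from transformers import AutoTokenizer, Trainer

tokenizer = AutoTokenizer.from_pretrained(model_name)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # tokenized 80% training split (assumed name)
    eval_dataset=eval_dataset,        # tokenized 20% test split (assumed name)
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # see the Model Performance section
)

trainer.train()
```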