---
license: apache-2.0
datasets:
  - hadyelsahar/ar_res_reviews
language:
  - ar
metrics:
  - accuracy
  - precision
  - recall
  - f1
base_model:
  - aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
  - arabic
  - sentiment-analysis
  - transformers
  - huggingface
  - bert
  - restaurants
  - fine-tuning
  - nlp
---

# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

## 📌 Overview

This fine-tuned AraBERT model classifies Arabic restaurant reviews as **Positive** or **Negative**.
It is based on aubmindlab/bert-base-arabertv02 and was fine-tuned with Hugging Face Transformers.

## 🔥 Why This Model?

- ✅ Trained on real restaurant reviews from the Hugging Face dataset hadyelsahar/ar_res_reviews.
- ✅ Fully fine-tuned (not LoRA or adapters).
- ✅ Balanced dataset (2,418 positive vs. 2,418 negative reviews).
- ✅ Strong accuracy and performance for sentiment analysis in Arabic.


## 📥 Dataset & Preprocessing

- **Dataset source:** hadyelsahar/ar_res_reviews
- **Text cleaning:**
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (إ, أ, آ → ا; ة → ه).
  - Balanced the positive and negative sentiment distribution.
- **Tokenization:**
  - Used the AraBERT tokenizer (aubmindlab/bert-base-arabertv02).
- **Train–test split:**
  - 80% training | 20% testing.

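The cleaning and normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the card's exact code: `normalize_arabic` and `clean_review` are hypothetical helper names, and the real regex rules may differ.

```python
import re

def normalize_arabic(text: str) -> str:
    """Normalize common Arabic letter variants, as described above."""
    text = re.sub(r"[إأآ]", "ا", text)  # hamza variants of alif -> bare alif
    text = text.replace("ة", "ه")       # ta marbuta -> ha
    return text

def clean_review(text: str) -> str:
    """Keep Arabic characters and whitespace only, then collapse spaces."""
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # drop non-Arabic chars
    text = re.sub(r"\s+", " ", text).strip()         # collapse extra spaces
    return normalize_arabic(text)
```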
πŸ‹οΈ Training & Performance

The model was fully fine-tuned with Hugging Face Transformers using the training configuration shown below.

### 📊 Final Model Results

| Metric     | Score  |
|------------|--------|
| Train Loss | 0.470  |
| Eval Loss  | 0.373  |
| Accuracy   | 86.41% |
| Precision  | 87.01% |
| Recall     | 86.49% |
| F1-score   | 86.75% |
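Metrics like those in the table are typically computed from model logits by a `compute_metrics` callback passed to the `Trainer`. The card does not show the implementation used, so this is a NumPy-only sketch of binary accuracy, precision, recall, and F1:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Binary classification metrics for a Hugging Face Trainer callback."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class per example
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    accuracy = float(np.mean(preds == labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

It would be wired in as `Trainer(..., compute_metrics=compute_metrics)`.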

βš™οΈ Training Configuration

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```