---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model: aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- arabic
- restaurant-reviews
model-index:
- name: ArabReview-Sentiment
  results:
  - task:
      type: text-classification
    dataset:
      name: hadyelsahar/ar_res_reviews
      type: hadyelsahar/ar_res_reviews
    metrics:
    - name: Accuracy
      type: accuracy
      value: 86.41
    - name: Precision
      type: precision
      value: 87.01
    - name: Recall
      type: recall
      value: 86.49
    - name: F1 Score
      type: f1
      value: 86.75
library_name: transformers
---

# 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

## 📌 Overview
This project fine-tunes **AraBERT** to classify the sentiment of **Arabic restaurant reviews**. Training uses **Hugging Face's `transformers` library**, and the model is deployed as an **interactive pipeline**.

## 📥 Dataset
The dataset used for fine-tuning is from:
[📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)

It contains restaurant reviews labeled as **Positive** or **Negative**.

## 🔄 Preprocessing
- **Cleaning & Normalization**:
  - Removed **non-Arabic** text, special characters, and extra spaces.
  - **Normalized Arabic characters** (e.g., `إ, أ, آ → ا`, `ة → ه`); a sketch of this step appears after the training parameters below.
- **Tokenization**:
  - Used the **AraBERT tokenizer** matching the base model.
- **Data Balancing**:
  - 2,418 **Positive** | 2,418 **Negative** (balanced dataset); see the loading sketch below.
- **Train-Test Split**:
  - **80% training** | **20% testing**.

## 🏋️ Fine-Tuning Details
We performed full fine-tuning of **`aubmindlab/bert-base-arabertv02`** (all weights updated; no parameter-efficient methods such as LoRA).

### 📊 Model Performance

| Metric        | Score    |
|---------------|----------|
| **Train Loss**| `0.470`  |
| **Eval Loss** | `0.373`  |
| **Accuracy**  | `86.41%` |
| **Precision** | `87.01%` |
| **Recall**    | `86.49%` |
| **F1-score**  | `86.75%` |

---

## ⚙️ Training Parameters

```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "aubmindlab/bert-base-arabertv02"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,            # binary sentiment: Positive / Negative
    classifier_dropout=0.5,  # extra dropout on the classification head
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    report_to="none",
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
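
## 🧹 Preprocessing Sketch
For reference, here is a minimal sketch of the cleaning and normalization step described under **🔄 Preprocessing**. The exact regular expressions used during training are not part of this card, so treat these as illustrative assumptions.

```python
import re

def normalize_arabic(text: str) -> str:
    """Illustrative normalization matching the rules listed above."""
    text = re.sub(r"[إأآ]", "ا", text)               # unify alef variants -> ا
    text = text.replace("ة", "ه")                    # ة -> ه
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # drop non-Arabic text / special chars
    return re.sub(r"\s+", " ", text).strip()         # collapse extra spaces

print(normalize_arabic("الأكل  رائع!!! 10/10"))  # -> "الاكل رائع"
```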
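
## ⚖️ Data Loading & Balancing Sketch
The balancing and 80/20 split described above could be reproduced roughly as follows. The column names (`text`, `polarity`) and label encoding are assumptions about the dataset schema; check the dataset card before use.

```python
from datasets import load_dataset, concatenate_datasets

ds = load_dataset("hadyelsahar/ar_res_reviews", split="train")

# Assumed schema: "polarity" holds a binary sentiment label (1 = positive, 0 = negative).
pos = ds.filter(lambda r: r["polarity"] == 1).shuffle(seed=42).select(range(2418))
neg = ds.filter(lambda r: r["polarity"] == 0).shuffle(seed=42).select(range(2418))

# 2,418 positive + 2,418 negative, then an 80/20 train-test split.
balanced = concatenate_datasets([pos, neg]).shuffle(seed=42)
splits = balanced.train_test_split(test_size=0.2, seed=42)
train_raw, test_raw = splits["train"], splits["test"]
```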
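
## 📐 Evaluation & Trainer Sketch
The card reports accuracy, precision, recall, and F1 but does not include the evaluation code. A minimal sketch, assuming `train_dataset` / `eval_dataset` are the tokenized 80/20 splits and that the scores use scikit-learn's binary averaging:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import Trainer

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"  # assumption: positive class = 1
    )
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

trainer = Trainer(
    model=model,                 # model and training_args from the block above
    args=training_args,
    train_dataset=train_dataset, # assumed name for the tokenized 80% split
    eval_dataset=eval_dataset,   # assumed name for the tokenized 20% split
    compute_metrics=compute_metrics,
)
trainer.train()
```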
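
## 🔮 Inference Example
Since the model is deployed as a `transformers` pipeline, inference is a standard `pipeline` call. The repo id below is a placeholder; substitute the path where the fine-tuned weights are actually hosted.

```python
from transformers import pipeline

# Hypothetical repo id; replace with the actual model path.
classifier = pipeline("text-classification", model="<your-username>/ArabReview-Sentiment")

print(classifier("الطعام لذيذ والخدمة ممتازة"))  # "The food is delicious and the service is excellent"
print(classifier("الخدمة سيئة جدا"))             # "The service is very bad"
# Each call returns [{"label": ..., "score": ...}]
```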