Abdulrahman Al-Ghamdi committed on
Commit 6c4df32 · verified · 1 parent: 0f0d9a3

Update README.md

Files changed (1)
  1. README.md +39 -54
README.md CHANGED
@@ -1,6 +1,3 @@
- # Create a Markdown file with the enhanced model card content
-
-
  ---
  license: apache-2.0
  datasets:
@@ -15,75 +12,63 @@ metrics:
  base_model:
  - aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
- tags:
- - arabic
- - sentiment-analysis
- - transformers
- - huggingface
- - bert
- - restaurants
- - fine-tuning
- - nlp
  ---

- # **🍽️ Arabic Restaurant Review Sentiment Analysis 🚀**
-
- ## **📌 Overview**
- This **fine-tuned AraBERT model** classifies **Arabic restaurant reviews** as **Positive** or **Negative**.
- It is based on **aubmindlab/bert-base-arabertv2** and fine-tuned using **Hugging Face Transformers**.
-
- ### **🔥 Why This Model?**
- ✅ **Trained on Real Restaurant Reviews** from the **Hugging Face Dataset**.
- ✅ **Fine-tuned with Full Training** (not LoRA or Adapters).
- ✅ **Balanced Dataset** (2418 Positive vs. 2418 Negative Reviews).
- ✅ **High Accuracy & Performance** for Sentiment Analysis in Arabic.

- ---

- ## **📥 Dataset & Preprocessing**
- - **Dataset Source**: [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
- - **Text Cleaning**:
- - Removed **non-Arabic text**, special characters, and extra spaces.
- - Normalized Arabic characters (`إ, أ, آ → ا`, `ة → ه`).
- - Balanced **Positive & Negative** sentiment distribution.
  - **Tokenization**:
- - Used **AraBERT tokenizer** (`aubmindlab/bert-base-arabertv2`).
  - **Train-Test Split**:
  - **80% Training** | **20% Testing**.

- ---
-
- ## **🏋️ Training & Performance**
- The model was fine-tuned using **Hugging Face Transformers** with the following hyperparameters:

- ### **📊 Final Model Results**
  | Metric | Score |
  |-------------|--------|
- | **Train Loss** | `0.470` |
- | **Eval Loss** | `0.373` |
- | **Accuracy** | `86.41%` |
- | **Precision** | `87.01%` |
- | **Recall** | `86.49%` |
- | **F1-score** | `86.75%` |

- ### **⚙️ Training Configuration**
- ```python
  training_args = TrainingArguments(
  output_dir="./results",
- evaluation_strategy="epoch",
- save_strategy="epoch",
- per_device_train_batch_size=8,
- per_device_eval_batch_size=8,
- num_train_epochs=4,
- weight_decay=1,
- learning_rate=1e-5,
- lr_scheduler_type="cosine",
- warmup_ratio=0.1,
  fp16=True,
  save_total_limit=2,
  gradient_accumulation_steps=2,
  load_best_model_at_end=True,
  max_grad_norm=1.0,
  metric_for_best_model="eval_loss",
  greater_is_better=False,
- )

  ---
  license: apache-2.0
  datasets:

  base_model:
  - aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
  ---

+ # 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀
+ ## 📌 Overview
+ This project fine-tunes an **AraBERT** base model to classify **Arabic restaurant reviews** as **Positive** or **Negative**.
+ We trained it with the Hugging Face Transformers `Trainer` API and deployed the final model as an **interactive Gradio web app**.

+ ## 📥 Data Collection
+ The fine-tuning data comes from the [📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews) (`hadyelsahar/ar_res_reviews`) on the Hugging Face Hub.
+ It contains **Arabic restaurant reviews**, each labeled with its sentiment polarity.

+ ## 🔄 Data Preparation
+ - **Cleaning & Normalization**:
+ - Removed non-Arabic text, special characters, and extra spaces.
+ - Normalized Arabic characters (e.g., إ, أ, آ → ا and ة → ه).
+ - Downsampled positive reviews to balance the dataset.
  - **Tokenization**:
+ - Used the **AraBERT tokenizer** that matches the base model.
  - **Train-Test Split**:
  - **80% Training** | **20% Testing**.
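The cleaning, balancing, and split steps listed above could look roughly like the plain-Python sketch below. It is an illustration under assumptions, not the author's preprocessing code: the helper names (`normalize_arabic`, `balance_by_downsampling`, `split_80_20`) are hypothetical, "non-Arabic text and special characters" is approximated as anything outside the Unicode Arabic block (U+0600 to U+06FF), and the split is a plain shuffle rather than a stratified one.

```python
import random
import re

# Hypothetical helpers; the README does not publish its preprocessing code.
ALEF_VARIANTS = re.compile("[إأآ]")             # normalized to bare alef (ا)
NON_ARABIC = re.compile(r"[^\u0600-\u06FF\s]")  # assumption: outside the Arabic block
EXTRA_SPACES = re.compile(r"\s+")


def normalize_arabic(text: str) -> str:
    """Drop non-Arabic characters, unify alef forms and ta marbuta, squeeze spaces."""
    text = NON_ARABIC.sub(" ", text)
    text = ALEF_VARIANTS.sub("ا", text)
    text = text.replace("ة", "ه")
    return EXTRA_SPACES.sub(" ", text).strip()


def balance_by_downsampling(texts, labels, seed=42):
    """Randomly downsample every class to the size of the smallest one."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    smallest = min(len(items) for items in by_label.values())
    pairs = [(t, label) for label, items in by_label.items() for t in rng.sample(items, smallest)]
    rng.shuffle(pairs)
    return pairs


def split_80_20(pairs, seed=42):
    """Shuffle and cut into 80% train / 20% test (not stratified)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Made-up sample reviews, for illustration only (1 = positive, 0 = negative).
reviews = ["الأكل ممتاز!! great food", "الخدمة سيئة جداً", "المطعم نظيف والوجبة لذيذة"]
labels = [1, 0, 1]
cleaned = [normalize_arabic(r) for r in reviews]
train, test = split_80_20(balance_by_downsampling(cleaned, labels))
```

Seeding both random steps keeps the balanced subset and the split reproducible across runs.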
 
+ ## 🏋️ Fine-Tuning & Results
+ The model was fine-tuned with **Hugging Face Transformers** on the 80% training split of the balanced restaurant-review data.

+ ### 📊 Evaluation Metrics
  | Metric | Score |
  |-------------|--------|
+ | **Train Loss** | 0.470 |
+ | **Eval Loss** | 0.373 |
+ | **Accuracy** | 86.41% |
+ | **Precision** | 87.01% |
+ | **Recall** | 86.49% |
+ | **F1-score** | 86.75% |
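For reference, the four classification metrics in the table relate as in this sketch of a `compute_metrics` callback for the Transformers `Trainer`. It assumes binary precision, recall, and F1 with label `1` (Positive) as the positive class; the README does not say whether the reported scores are binary or macro-averaged, and the function names here are illustrative. As a consistency check, the harmonic mean of the reported precision (87.01%) and recall (86.49%) is about 86.75%, which matches the F1 row.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus binary precision/recall/F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def compute_metrics(eval_pred):
    """Trainer callback: unpack (logits, labels), take the argmax per row, score it."""
    logits, labels = eval_pred
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    return binary_metrics(list(labels), preds)
```

Passing `compute_metrics=compute_metrics` to `Trainer` makes these values appear as `eval_accuracy`, `eval_precision`, and so on in the evaluation logs.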
+
+ ## ⚙️ Training Parameters
+ ```python
+ from transformers import AutoModelForSequenceClassification, TrainingArguments
+
+ device = "cuda"  # fp16=True below expects a CUDA GPU
+ model_name = "aubmindlab/bert-base-arabertv2"
+ # Two output labels (Negative and Positive); classifier_dropout sets the dropout before the classification head.
+ model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, classifier_dropout=0.5).to(device)

  training_args = TrainingArguments(
  output_dir="./results",
+ evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
+ save_strategy="epoch",
+ per_device_train_batch_size=8,
+ per_device_eval_batch_size=8,
+ num_train_epochs=4,
+ weight_decay=1,
+ learning_rate=1e-5,
+ lr_scheduler_type="cosine",
+ warmup_ratio=0.1,
  fp16=True,
+ report_to="none",
  save_total_limit=2,
  gradient_accumulation_steps=2,
  load_best_model_at_end=True,
  max_grad_norm=1.0,
  metric_for_best_model="eval_loss",
  greater_is_better=False,
+ )
+ ```
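To make the training parameters concrete, this sketch works out the optimizer schedule they imply. The inputs are assumptions: 2,418 reviews per class after balancing (the figure given in the previous revision of this README), the test side of the 80/20 split rounded up, and a single GPU. Under those assumptions, gradient accumulation gives an effective batch of 16 reviews, and `warmup_ratio=0.1` covers the first 97 of 968 optimizer steps before the cosine decay.

```python
import math

# Assumptions (not stated in the current README): 2,418 reviews per class,
# an 80/20 split with the test side rounded up, and a single training device.
PER_CLASS = 2418
total = 2 * PER_CLASS                                   # 4,836 balanced reviews
n_test = math.ceil(total * 0.2)                         # 968 held-out reviews
n_train = total - n_test                                # 3,868 training reviews

# Values copied from the TrainingArguments above.
batch_size = 8                                          # per_device_train_batch_size
grad_accum = 2                                          # gradient_accumulation_steps
epochs = 4                                              # num_train_epochs
warmup_ratio = 0.1                                      # warmup_ratio

effective_batch = batch_size * grad_accum               # 16 reviews per optimizer step
batches_per_epoch = math.ceil(n_train / batch_size)     # 484 (the last batch is partial)
updates_per_epoch = batches_per_epoch // grad_accum     # 242 optimizer steps
total_updates = updates_per_epoch * epochs              # 968
warmup_steps = math.ceil(total_updates * warmup_ratio)  # 97 linear-warmup steps, then cosine decay

print(f"effective batch={effective_batch}, steps={total_updates}, warmup={warmup_steps}")
```

On several GPUs the effective batch is multiplied by the device count, so the number of optimizer steps per epoch shrinks proportionally.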