Abdulrahman Al-Ghamdi committed on
Commit 7eb3e22 · verified · 1 Parent(s): 190bd83

Update README.md

Files changed (1)
  1. README.md +53 -67
README.md CHANGED
@@ -1,3 +1,6 @@
  ---
  license: apache-2.0
  datasets:
@@ -9,95 +12,78 @@ metrics:
  - precision
  - recall
  - f1
- base_model: aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
  tags:
- - text-classification
- - sentiment-analysis
  - arabic
- - restaurant-reviews
- model-index:
- - name: ArabReview-Sentiment
- results:
- - task:
- type: text-classification
- dataset:
- name: hadyelsahar/ar_res_reviews
- type: sentiment-analysis
- metrics:
- - name: Accuracy
- type: accuracy
- value: 86.41
- - name: Precision
- type: precision
- value: 87.01
- - name: Recall
- type: recall
- value: 86.49
- - name: F1 Score
- type: f1
- value: 86.75
- library_name: transformers
  ---

- # 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀
- ## 📌 Overview
- This project fine-tunes **AraBERT** to analyze sentiment in **Arabic restaurant reviews**.
- We leveraged **Hugging Face’s `transformers` library** for training and deployed the model as an **interactive pipeline**.

- ## 📥 Dataset
- The dataset used for fine-tuning is from:
- [📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
- It contains restaurant reviews labeled as **Positive** or **Negative**.

- ## 🔄 Preprocessing
- - **Cleaning & Normalization**:
- - Removed **non-Arabic** text, special characters, and extra spaces.
- - **Normalized Arabic characters** (e.g., `إ, أ, آ → ا`, `ة → ه`).
  - **Tokenization**:
- - Used **AraBERT tokenizer** for efficient processing.
- - **Data Balancing**:
- - 2,418 **Positive** | 2,418 **Negative** (Balanced Dataset).
  - **Train-Test Split**:
  - **80% Training** | **20% Testing**.
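
The cleaning, balancing, and split steps listed above can be sketched in plain Python. This is a minimal illustration, not the author's script: the regular expression that defines "non-Arabic", the use of downsampling for balancing, the `(text, label)` row layout, the seed, and the names `normalize_review` and `balance_and_split` are all assumptions. Only the character mappings and the 80/20 ratio come from the card.

```python
import random
import re

# Character mappings listed in the card.
CHAR_MAP = str.maketrans({
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0629": "\u0647",  # teh marbuta -> heh
})

# Assumption: "non-Arabic" means anything outside the basic Arabic
# letter range U+0621..U+064A that is not whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def normalize_review(text: str) -> str:
    """Normalize letters, drop non-Arabic characters, collapse spaces."""
    text = text.translate(CHAR_MAP)
    text = NON_ARABIC.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def balance_and_split(rows, test_fraction=0.2, seed=42):
    """Downsample the larger class, shuffle, and split into train/test.

    rows is a list of (text, label) pairs with label 1 (positive)
    or 0 (negative); this layout is hypothetical.
    """
    rng = random.Random(seed)
    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] == 0]
    per_class = min(len(positives), len(negatives))
    balanced = rng.sample(positives, per_class) + rng.sample(negatives, per_class)
    rng.shuffle(balanced)
    test_size = round(len(balanced) * test_fraction)
    cut = len(balanced) - test_size
    return balanced[:cut], balanced[cut:]
```

With 2,418 reviews per class, a split like this would leave roughly 3,869 rows for training and 967 for testing, although the exact counts depend on how the author rounded.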

- ## 🏋️ Fine-Tuning Details
- We fine-tuned **`aubmindlab/bert-base-arabertv2`** using full fine-tuning (not LoRA).

- ### **📊 Model Performance**
  | Metric | Score |
  |-------------|--------|
- | **Train Loss**| `0.470` |
- | **Eval Loss** | `0.373` |
- | **Accuracy** | `86.41%` |
- | **Precision** | `87.01%` |
- | **Recall** | `86.49%` |
- | **F1-score** | `86.75%` |
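
As a quick sanity check on the table, F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes the reported precision and recall are single overall scores (the card does not say how they were averaged across classes); the function name is illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the results table, in percent.
reported_precision = 87.01
reported_recall = 86.49
reported_f1 = 86.75

recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"recomputed F1: {recomputed_f1:.2f}%")  # prints "recomputed F1: 86.75%"
```

Under that assumption the recomputed value agrees with the reported F1-score to two decimal places.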
76
 
77
- ---
78
-
79
- ## βš™οΈ Training Parameters
80
  ```python
81
- model_name = "aubmindlab/bert-base-arabertv2"
82
- model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, classifier_dropout=0.5).to(device)
83
-
84
  training_args = TrainingArguments(
85
  output_dir="./results",
86
- evaluation_strategy="epoch",
87
- save_strategy="epoch",
88
- per_device_train_batch_size=8,
89
- per_device_eval_batch_size=8,
90
- num_train_epochs=4,
91
- weight_decay=1,
92
- learning_rate=1e-5,
93
- lr_scheduler_type="cosine",
94
- warmup_ratio=0.1,
95
  fp16=True,
96
- report_to="none",
97
  save_total_limit=2,
98
  gradient_accumulation_steps=2,
99
  load_best_model_at_end=True,
100
  max_grad_norm=1.0,
101
  metric_for_best_model="eval_loss",
102
  greater_is_better=False,
103
- )
 
+ # Create a Markdown file with the enhanced model card content
+
+ model_card_content = """\
  ---
  license: apache-2.0
  datasets:

  - precision
  - recall
  - f1
+ base_model:
+ - aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
  tags:
  - arabic
+ - sentiment-analysis
+ - transformers
+ - huggingface
+ - bert
+ - restaurants
+ - fine-tuning
+ - nlp
  ---

+ # **🍽️ Arabic Restaurant Review Sentiment Analysis 🚀**

+ ## **📌 Overview**
+ This **fine-tuned AraBERT model** classifies **Arabic restaurant reviews** as **Positive** or **Negative**.
+ It is based on **aubmindlab/bert-base-arabertv2** and fine-tuned using **Hugging Face Transformers**.
+
+ ### **🔥 Why This Model?**
+ ✅ **Trained on Real Restaurant Reviews** from the **Hugging Face Dataset**.
+ ✅ **Fine-tuned with Full Training** (not LoRA or Adapters).
+ ✅ **Balanced Dataset** (2418 Positive vs. 2418 Negative Reviews).
+ ✅ **High Accuracy & Performance** for Sentiment Analysis in Arabic.
+
+ ---

+ ## **📥 Dataset & Preprocessing**
+ - **Dataset Source**: [`hadyelsahar/ar_res_reviews`](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
+ - **Text Cleaning**:
+ - Removed **non-Arabic text**, special characters, and extra spaces.
+ - Normalized Arabic characters (`إ, أ, آ → ا`, `ة → ه`).
+ - Balanced **Positive & Negative** sentiment distribution.
  - **Tokenization**:
+ - Used **AraBERT tokenizer** (`aubmindlab/bert-base-arabertv2`).
  - **Train-Test Split**:
  - **80% Training** | **20% Testing**.

+ ---
+
+ ## **🏋️ Training & Performance**
+ The model was fine-tuned using **Hugging Face Transformers** with the following hyperparameters:

+ ### **📊 Final Model Results**
  | Metric | Score |
  |-------------|--------|
+ | **Train Loss** | `0.470` |
+ | **Eval Loss** | `0.373` |
+ | **Accuracy** | `86.41%` |
+ | **Precision** | `87.01%` |
+ | **Recall** | `86.49%` |
+ | **F1-score** | `86.75%` |

+ ### **⚙️ Training Configuration**
  ```python
  training_args = TrainingArguments(
  output_dir="./results",
+ evaluation_strategy="epoch",
+ save_strategy="epoch",
+ per_device_train_batch_size=8,
+ per_device_eval_batch_size=8,
+ num_train_epochs=4,
+ weight_decay=1,
+ learning_rate=1e-5,
+ lr_scheduler_type="cosine",
+ warmup_ratio=0.1,
  fp16=True,
  save_total_limit=2,
  gradient_accumulation_steps=2,
  load_best_model_at_end=True,
  max_grad_norm=1.0,
  metric_for_best_model="eval_loss",
  greater_is_better=False,
+ )
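
To make the scheduler settings in this configuration concrete, the following sketch traces the learning rate that a linear warmup followed by cosine decay would produce for `learning_rate=1e-5` and `warmup_ratio=0.1`. It illustrates the general shape only and is not the library's own code: the helper `lr_at_step`, the ceiling used for the warmup step count, and the total of 1000 optimizer steps are assumptions, since the commit does not state the real step count.

```python
import math


def lr_at_step(step: int, total_steps: int,
               base_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate under linear warmup followed by cosine decay to zero."""
    # Assumption: warmup length is the ceiling of ratio * total steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# From the configuration: 8 samples per device, accumulated over 2 steps.
effective_batch_size_per_device = 8 * 2

# Hypothetical number of optimizer steps, chosen only for illustration.
total_steps = 1000
for step in (0, 50, 250, 500, 750, 1000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

The rate rises from zero during the warmup phase, peaks near `1e-5`, and then decays smoothly to zero by the final step.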