Abdulrahman Al-Ghamdi committed on
Commit 64f6506 · verified · 1 Parent(s): 064615c

Update README.md

Files changed (1)
  1. README.md +50 -22
README.md CHANGED
@@ -1,52 +1,80 @@
  ---
  license: apache-2.0
  datasets:
- - hadyelsahar/ar_res_reviews
  language:
- - ar
  metrics:
- - accuracy
- - precision
- - recall
- - f1
- base_model:
- - aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
  ---

  # 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀
  ## 📌 Overview
- This project fine-tunes a **transformer-based model** to analyze sentiment in **Arabic restaurant reviews**.
- We utilized **Hugging Face’s model training pipeline** and deployed the final model as an **interactive Gradio web app**.

- ## 📥 Data Collection
- The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:
  [📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
- It contains **restaurant reviews in Arabic** labeled with sentiment polarity.
 
- ## 🔄 Data Preparation
  - **Cleaning & Normalization**:
- - Removed non-Arabic text, special characters, and extra spaces.
- - Normalized Arabic characters (e.g., `إ, أ, آ → ا`, `ة → ه`).
- - Downsampled positive reviews to balance the dataset.
  - **Tokenization**:
- - Used **AraBERT tokenizer** for efficient text processing.
  - **Train-Test Split**:
  - **80% Training** | **20% Testing**.
 
- ## 🏋️ Fine-Tuning & Results
- The model was fine-tuned using **Hugging Face Transformers** on a dataset of restaurant reviews.

- ### **📊 Evaluation Metrics**
  | Metric | Score |
  |-------------|--------|
- | **Train Loss**| `0.470`|
  | **Eval Loss** | `0.373` |
  | **Accuracy** | `86.41%` |
  | **Precision** | `87.01%` |
  | **Recall** | `86.49%` |
  | **F1-score** | `86.75%` |

  ## ⚙️ Training Parameters
  ```python
  model_name = "aubmindlab/bert-base-arabertv2"
 
  ---
  license: apache-2.0
  datasets:
+ - hadyelsahar/ar_res_reviews
  language:
+ - ar
  metrics:
+ - accuracy
+ - precision
+ - recall
+ - f1
+ base_model: aubmindlab/bert-base-arabertv02
  pipeline_tag: text-classification
+ tags:
+ - text-classification
+ - sentiment-analysis
+ - arabic
+ - restaurant-reviews
+ model-index:
+ - name: ArabReview-Sentiment
+   results:
+   - task:
+       type: text-classification
+     dataset:
+       name: hadyelsahar/ar_res_reviews
+       type: sentiment-analysis
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 86.41
+     - name: Precision
+       type: precision
+       value: 87.01
+     - name: Recall
+       type: recall
+       value: 86.49
+     - name: F1 Score
+       type: f1
+       value: 86.75
  ---

  # 🍽️ Arabic Restaurant Review Sentiment Analysis 🚀
  ## 📌 Overview
+ This project fine-tunes **AraBERT** to analyze sentiment in **Arabic restaurant reviews**.
+ We leveraged **Hugging Face’s `transformers` library** for training and deployed the model as an **interactive pipeline**.

+ ## 📥 Dataset
+ The dataset used for fine-tuning is from:
  [📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
+ It contains restaurant reviews labeled as **Positive** or **Negative**.
 
+ ## 🔄 Preprocessing
  - **Cleaning & Normalization**:
+ - Removed **non-Arabic** text, special characters, and extra spaces.
+ - **Normalized Arabic characters** (e.g., `إ, أ, آ → ا`, `ة → ه`).
  - **Tokenization**:
+ - Used **AraBERT tokenizer** for efficient processing.
+ - **Data Balancing**:
+ - 2,418 **Positive** | 2,418 **Negative** (Balanced Dataset).
  - **Train-Test Split**:
  - **80% Training** | **20% Testing**.
 
+ ## 🏋️ Fine-Tuning Details
+ We fine-tuned **`aubmindlab/bert-base-arabertv2`** using full fine-tuning (not LoRA).

+ ### **📊 Model Performance**
  | Metric | Score |
  |-------------|--------|
+ | **Train Loss**| `0.470` |
  | **Eval Loss** | `0.373` |
  | **Accuracy** | `86.41%` |
  | **Precision** | `87.01%` |
  | **Recall** | `86.49%` |
  | **F1-score** | `86.75%` |
 
+ ---
+
  ## ⚙️ Training Parameters
  ```python
  model_name = "aubmindlab/bert-base-arabertv2"