Abdulrahman Al-Ghamdi
committed on
Update README.md
README.md
CHANGED
@@ -13,3 +13,53 @@ base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
---

# 🍽️ Arabic Restaurant Review Sentiment Analysis

## Overview
This project fine-tunes a **transformer-based model** (AraBERT, `aubmindlab/bert-base-arabertv02`) to classify sentiment in **Arabic restaurant reviews**.
We used the **Hugging Face model training pipeline** and deployed the final model as an **interactive Gradio web app**.

## Data Collection
The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:
[Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
It contains **restaurant reviews in Arabic** labeled with sentiment polarity.

## Data Preparation
- **Cleaning & Normalization**:
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (e.g., `إ, أ, آ → ا`, `ة → ه`).
  - Downsampled positive reviews to balance the dataset.
- **Tokenization**:
  - Used the **AraBERT tokenizer** for efficient text processing.
- **Train-Test Split**:
  - **80% Training** | **20% Testing**.

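The cleaning, balancing, and splitting steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code: the function names, the `label` field with `1` for positive and `0` for negative, the fixed seed, and the choice of `U+0621` to `U+064A` as the "Arabic letter" range are all assumptions made for the example. Note that this range also strips diacritics.

```python
import random
import re

# Assumed normalization table, based on the examples given in the list above.
_CHAR_MAP = str.maketrans({"إ": "ا", "أ": "ا", "آ": "ا", "ة": "ه"})
# Matches anything that is neither whitespace nor in U+0621..U+064A (assumed range).
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def clean_review(text: str) -> str:
    """Drop non-Arabic characters, normalize letters, collapse whitespace."""
    text = _NON_ARABIC.sub(" ", text)
    text = text.translate(_CHAR_MAP)
    return re.sub(r"\s+", " ", text).strip()


def downsample_positive(rows, seed=42):
    """Randomly keep at most as many positive rows as there are negative rows."""
    pos = [r for r in rows if r["label"] == 1]  # assumed: 1 = positive
    neg = [r for r in rows if r["label"] == 0]  # assumed: 0 = negative
    rng = random.Random(seed)
    if len(pos) > len(neg):
        pos = rng.sample(pos, len(neg))
    balanced = pos + neg
    rng.shuffle(balanced)
    return balanced


def train_test_split(rows, test_fraction=0.2):
    """Split already-shuffled rows into (train, test) portions."""
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]
```

With these definitions, `clean_review("الأكل رائع!! 10/10")` returns `"الاكل رائع"`: the punctuation and digits are dropped and `أ` is normalized to `ا`.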
## Fine-Tuning & Results
The model was fine-tuned using **Hugging Face Transformers** on a dataset of restaurant reviews.

### **Evaluation Metrics**
| Metric | Score |
|-------------|--------|
| **Eval Loss** | `0.5665` |
| **Accuracy** | `70.37%` |
| **Precision** | `70.36%` |
| **Recall** | `70.37%` |
| **F1-score** | `69.75%` |
| **Eval Runtime** | `11.5 sec` |

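The source does not state how precision, recall, and F1 were averaged across classes. Identical accuracy and recall values (both `70.37%`) are what support-weighted averaging would produce, since weighted recall equals accuracy by construction. Taking that as an assumption, the sketch below shows how such metrics can be computed in plain Python; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (not the project's data).
metrics = weighted_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

On any input, the `recall` and `accuracy` entries of the returned dictionary coincide, which is why the two rows in a table like the one above can carry the same value.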
## Training Parameters
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints are written
    evaluation_strategy="steps",     # evaluate on a step schedule; newer transformers releases name this `eval_strategy`
    eval_steps=200,                  # run evaluation every 200 training steps
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=5,
    weight_decay=0.01,
    learning_rate=3e-5,
    logging_steps=100,               # log training loss every 100 steps
    fp16=True,                       # mixed precision; requires a CUDA-capable GPU
    report_to="none",                # disable external experiment trackers
)
```