Abdulrahman Al-Ghamdi
---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
---
# ๐Ÿฝ๏ธ Arabic Restaurant Review Sentiment Analysis ๐Ÿš€
**Model Is Under Development**
## 📌 Overview
This project fine-tunes a **transformer-based model** to analyze sentiment in **Arabic restaurant reviews**.
We used **Hugging Face’s model training pipeline** and deployed the final model as an **interactive Gradio web app**.
## 📥 Data Collection
The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:
[📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
It contains **restaurant reviews in Arabic** labeled with sentiment polarity.
## 🔄 Data Preparation
- **Cleaning & Normalization**:
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (e.g., `إ, أ, آ → ا`, `ة → ه`).
  - Downsampled positive reviews to balance the dataset.
- **Tokenization**:
  - Used the **AraBERT tokenizer** for efficient text processing.
- **Train-Test Split**:
  - **80% Training** | **20% Testing**.
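The preparation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual preprocessing code: the function names `normalize` and `balance_and_split`, the `(text, label)` row layout with `1` meaning positive, the fixed seed, and the removal of diacritics are all hypothetical choices made for the example.

```python
import random
import re

# Assumption: diacritics (tashkeel) are dropped; the steps above do not say so.
DIACRITICS = re.compile("[\u064B-\u0652]")
# Anything that is not a basic Arabic letter or whitespace.
NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")
# Alef variants: إ, أ, آ
ALEF_VARIANTS = re.compile("[\u0625\u0623\u0622]")


def normalize(text: str) -> str:
    """Strip non-Arabic characters and apply the two normalizations listed above."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    text = ALEF_VARIANTS.sub("\u0627", text)   # إ, أ, آ → ا
    text = text.replace("\u0629", "\u0647")    # ة → ه
    return " ".join(text.split())              # collapse extra spaces


def balance_and_split(rows, test_fraction=0.2, seed=42):
    """Downsample positive rows to the negative count, then split 80/20.

    `rows` is a list of (text, label) pairs; label 1 = positive (assumed).
    """
    rng = random.Random(seed)
    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] != 1]
    if len(positives) > len(negatives):
        positives = rng.sample(positives, len(negatives))
    balanced = positives + negatives
    rng.shuffle(balanced)
    n_test = round(len(balanced) * test_fraction)
    cut = len(balanced) - n_test
    return balanced[:cut], balanced[cut:]
```

Calling `normalize` on each review before `balance_and_split` would reproduce the order of the list above; whether the original pipeline split before or after balancing is not stated.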
## ๐Ÿ‹๏ธ Fine-Tuning & Results
The model was fine-tuned using **Hugging Face Transformers** on a dataset of restaurant reviews.
### **๐Ÿ“Š Evaluation Metrics**
| Metric | Score |
|-------------|--------|
| **Eval Loss** | `****` |
| **Accuracy** | `88.71%` |
| **Precision** | `91.07%` |
| **Recall** | `93.31%` |
| **F1-score** | `92.17%` |
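For reference, the four classification metrics in the table can be computed from true and predicted labels as in the sketch below. It assumes binary labels with `1` as the positive class and no averaging across classes; the model card does not state which convention was used, and `binary_metrics` is a hypothetical helper, not part of the training code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

As a consistency check, the harmonic mean of the reported precision (91.07%) and recall (93.31%) comes out at roughly 92.2%, which agrees with the reported F1-score up to rounding.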
## โš™๏ธ Training Parameters
```python
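from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",           # checkpoints and logs are written here
    evaluation_strategy="epoch",      # evaluate once per epoch (named eval_strategy in newer transformers releases)
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    weight_decay=0.01,
    learning_rate=3e-5,
    fp16=True,                        # mixed-precision training; requires a CUDA GPU
    report_to="none",                 # disable external experiment trackers
)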