|
--- |
|
library_name: transformers |
|
base_model: mrm8488/electricidad-base-discriminator |
|
tags: |
|
- classification |
|
- generated_from_trainer |
|
metrics: |
|
- accuracy |
|
model-index: |
|
- name: clasificador-tweets |
|
results: [] |
|
--- |
|
|
|
|
|
# clasificador-tweets |
|
|
|
This model is a fine-tuned version of [mrm8488/electricidad-base-discriminator](https://huggingface.co/mrm8488/electricidad-base-discriminator) on the [somosnlp-hackathon-2022/es_tweets_laboral](https://huggingface.co/datasets/somosnlp-hackathon-2022/es_tweets_laboral) dataset.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.9373 |
|
- Accuracy: 0.7234 |
|
|
|
## Model description |
|
|
|
This model has been trained to classify tweets into 7 labor-related categories (an illustrative label mapping follows the list):
|
|
|
- **Low salary** |
|
- **Labor rights** |
|
- **Labor exploitation** |
|
- **Workplace harassment** |
|
- **Abuse of authority** |
|
- **Workplace negligence** |
|
- **Job opportunities** |
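
For illustration, this is how such a 7-way label set is typically wired into a Transformers config. The label strings below are just the English names above; the actual labels come from the dataset's `intent` column:

```python
# Hypothetical label mapping: the real label strings and their order come
# from the dataset's `intent` column, not from this hand-written list.
labels = [
    "low salary", "labor rights", "labor exploitation",
    "workplace harassment", "abuse of authority",
    "workplace negligence", "job opportunities",
]
id2label = {i: name for i, name in enumerate(labels)}
label2id = {name: i for i, name in id2label.items()}
```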
|
|
|
The model was trained on the `somosnlp-hackathon-2022/es_tweets_laboral` dataset, which contains Spanish tweets labeled with the 7 categories above.
|
The dataset has the following characteristics: |
|
|
|
- **Training set**: 184 tweets. |
|
- **Test set**: 47 tweets. |
|
|
|
- Columns:
  - `text`: the tweet's text.
  - `intent`: the tweet's category (one of the 7 labels above).
  - `entities`: additional information about the entities identified in the tweet.
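
A minimal sketch of loading the dataset with the 🤗 Datasets library (the `train`/`test` split names are an assumption based on the counts above):

```python
from datasets import load_dataset

ds = load_dataset("somosnlp-hackathon-2022/es_tweets_laboral")

# Assumed splits: 'train' (184 tweets) and 'test' (47 tweets).
print(ds["train"].num_rows, ds["test"].num_rows)
print(ds["train"][0]["text"], "->", ds["train"][0]["intent"])
```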
|
|
|
The tokenizer from `mrm8488/electricidad-base-discriminator` was used for tokenization.
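
A short tokenization sketch, reusing `ds` from the loading example above; leaving padding to a data collator at batch time is an assumption about the original setup:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/electricidad-base-discriminator")

def tokenize(batch):
    # Truncate long tweets; dynamic padding can be applied later by a collator.
    return tokenizer(batch["text"], truncation=True)

tokenized = ds.map(tokenize, batched=True)
```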
|
|
|
|
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for classifying Spanish-language tweets on labor-related topics into the 7 categories above.

Its accuracy on the held-out test set is approximately 72%.

The training set is small (184 tweets), which may limit how well the model generalizes.
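
A hedged inference sketch with the `pipeline` API; the repo id is a placeholder for wherever this model is hosted, and both the example tweet and the printed label are illustrative:

```python
from transformers import pipeline

# Placeholder repo id; replace with the actual Hub id of this model.
clf = pipeline("text-classification", model="your-username/clasificador-tweets")

tweet = "No me pagan las horas extra y me obligan a trabajar los domingos."
print(clf(tweet))
# e.g. [{'label': 'labor exploitation', 'score': 0.87}]  (illustrative output)
```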
|
|
|
|
|
|
|
## Training and evaluation data |
|
|
|
The model was trained for **10 epochs** using accuracy as the evaluation metric. The results on the test set were as follows: |
|
|
|
- **Loss**: 0.9373

- **Accuracy**: 72.34%
|
|
|
Note that these results may vary across runs due to the randomness inherent in model training.
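
Since accuracy is the only metric reported, here is a sketch of the kind of `compute_metrics` function presumably passed to the `Trainer` (the exact implementation used for this run is not shown in this card):

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```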
|
|
|
## Training procedure |
|
|
|
Training used the Hugging Face Transformers library via the `Trainer` API.
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
|
- learning_rate: 5e-05 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
|
- lr_scheduler_type: linear |
|
- num_epochs: 10 |
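
A sketch of how these hyperparameters map onto `TrainingArguments`; `output_dir` and per-epoch evaluation are assumptions, and `id2label`, `label2id`, `tokenized`, `tokenizer`, and `compute_metrics` come from the sketches earlier in this card:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "mrm8488/electricidad-base-discriminator",
    num_labels=7, id2label=id2label, label2id=label2id,
)

training_args = TrainingArguments(
    output_dir="clasificador-tweets",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",  # per-epoch evaluation, matching the results table below
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
)
# trainer.train()
```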
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | |
|
|:-------------:|:-----:|:----:|:---------------:|:--------:| |
|
| No log | 1.0 | 23 | 1.7598 | 0.3404 | |
|
| No log | 2.0 | 46 | 1.5505 | 0.5106 | |
|
| No log | 3.0 | 69 | 1.3208 | 0.6170 | |
|
| No log | 4.0 | 92 | 1.1691 | 0.6383 | |
|
| No log | 5.0 | 115 | 1.1357 | 0.6383 | |
|
| No log | 6.0 | 138 | 0.9936 | 0.7447 | |
|
| No log | 7.0 | 161 | 1.0371 | 0.6596 | |
|
| No log | 8.0 | 184 | 0.9330 | 0.7021 | |
|
| No log | 9.0 | 207 | 0.9195 | 0.7234 | |
|
| No log | 10.0 | 230 | 0.9373 | 0.7234 | |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.47.0 |
|
- Pytorch 2.5.1+cu121 |
|
- Datasets 3.2.0 |
|
- Tokenizers 0.21.0 |
|
|