rebego committed on
Commit
c833d9e
verified
1 Parent(s): 8dd0965

Update README.md

Files changed (1)
  1. README.md +40 -12
README.md CHANGED
@@ -11,8 +11,6 @@ model-index:
11
  results: []
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
 
17
  # clasificador-tweets
18
 
@@ -23,25 +21,55 @@ It achieves the following results on the evaluation set:
23
 
24
  ## Model description
25
 
26
- Este modelo ha sido entrenado para clasificar tweets en 7 categorías relacionadas con el ámbito laboral:
27
- - **Salario precario**
28
- - **Derechos laborales**
29
- - **Explotación laboral**
30
- - **Acoso laboral**
31
- - **Abuso de autoridad**
32
- - **Negligencia laboral**
33
- - **Oportunidad de empleo**
34
 
35
  ## Intended uses & limitations
36
 
37
- More information needed
38
 
39
  ## Training and evaluation data
40
 
41
- More information needed
42
 
43
  ## Training procedure
44
 
45
  ### Training hyperparameters
46
 
47
  The following hyperparameters were used during training:
 
11
  results: []
12
  ---
13
 
14
 
15
  # clasificador-tweets
16
 
 
21
 
22
  ## Model description
23
 
24
+ This model has been trained to classify tweets into 7 labor-related categories:
25
+
26
+ - **Low salary**
27
+ - **Labor rights**
28
+ - **Labor exploitation**
29
+ - **Workplace harassment**
30
+ - **Abuse of authority**
31
+ - **Workplace negligence**
32
+ - **Job opportunities**
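As a rough illustration, the seven categories listed above can be mapped to integer ids, which is the form a sequence-classification head typically expects. The English label names and their order below are assumptions for illustration only; the label strings actually stored in the dataset and the id order used by the trained model are not stated here.

```python
# Hypothetical label order: the real id mapping used by the model is not documented here.
LABELS = [
    "Low salary",
    "Labor rights",
    "Labor exploitation",
    "Workplace harassment",
    "Abuse of authority",
    "Workplace negligence",
    "Job opportunities",
]

# Forward and reverse lookups between category names and class ids.
label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}


def decode_prediction(class_id: int) -> str:
    """Turn a predicted class id back into its category name."""
    return id2label[class_id]
```

Keeping both dictionaries side by side makes it easy to encode labels before training and to decode predicted ids afterwards.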
33
+
34
+ The model was trained on the dataset "somosnlp-hackathon-2022/es_tweets_laboral", which contains Spanish tweets labeled with the 7 categories listed above.
35
+ The dataset has the following characteristics:
36
+
37
+ - **Training set**: 184 tweets.
38
+ - **Test set**: 47 tweets.
39
+
40
+ - **Columns**:
41
+
42
+   - `text`: The tweet's text.
43
+   - `intent`: The tweet's category.
44
+   - `entities`: Additional information about the entities identified in the tweets.
45
+
46
+ The tokenizer from "mrm8488/electricidad-base-discriminator" was used for tokenization.
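A minimal sketch of that tokenization step, assuming the Hugging Face `transformers` package is installed and the checkpoint can be downloaded. The helper names, `max_length=128`, and the padding and truncation settings are illustrative assumptions, not the settings used to train this model.

```python
from typing import Any, List


def load_tokenizer(name: str = "mrm8488/electricidad-base-discriminator") -> Any:
    """Load the tokenizer named in the model card (needs network access on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name)


def tokenize_tweets(tokenizer: Any, texts: List[str], max_length: int = 128) -> Any:
    """Tokenize a batch of tweets; truncation, padding and max_length are assumed settings."""
    return tokenizer(texts, truncation=True, padding=True, max_length=max_length)
```

Calling `tokenize_tweets(load_tokenizer(), tweets)` on a list of strings would return an encoding that includes `input_ids` and `attention_mask`, ready to pass to a classification model.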
47
+
48
+
49
 
50
  ## Intended uses & limitations
51
 
52
+ The model is intended for classifying tweets related to labor topics.
53
+ The model's accuracy on the test set is approximately 72%.
54
+ It is designed to classify tweets in Spanish.
55
+ The dataset is small (184 tweets for training), which may limit the model's generalization.
56
+
57
+
58
 
59
  ## Training and evaluation data
60
 
61
+ The model was trained for **10 epochs** using accuracy as the evaluation metric. The results on the test set were as follows:
62
+
63
+ - **Loss**: 0.937
64
+ - **Accuracy**: 72.34%
65
+
66
+ These results may vary between runs because model training is inherently random.
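As a sanity check on the figures above, and assuming the reported accuracy was computed on the 47-tweet test set described earlier, 72.34% corresponds to 34 correct predictions. The sketch below recovers that count and shows how much a single tweet moves the metric; the function and constant names are illustrative.

```python
TEST_SIZE = 47             # test-set size stated in the model card
REPORTED_ACCURACY = 72.34  # percent, as reported in the model card


def correct_predictions(accuracy_pct: float, n_examples: int) -> int:
    """Recover the number of correct predictions implied by an accuracy figure."""
    return round(accuracy_pct / 100 * n_examples)


def accuracy_percent(n_correct: int, n_examples: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * n_correct / n_examples, 2)


n_correct = correct_predictions(REPORTED_ACCURACY, TEST_SIZE)
print(n_correct)                               # 34
print(accuracy_percent(n_correct, TEST_SIZE))  # 72.34
print(round(100 / TEST_SIZE, 2))               # 2.13 percentage points per tweet
```

With only 47 test tweets, each prediction is worth about 2.13 percentage points, so one or two tweets changing class between runs shifts the reported accuracy noticeably.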
67
 
68
  ## Training procedure
69
 
70
+ Training used the Transformers library by Hugging Face.
71
+
72
+
73
  ### Training hyperparameters
74
 
75
  The following hyperparameters were used during training: