Update README.md
README.md
CHANGED
@@ -80,4 +80,142 @@ response
```
# No geral, os LLMs estão se tornando cada vez mais importantes à medida que a tecnologia continua a
# avançar. À medida que continuamos a usar LLMs em nossas vidas diárias, podemos esperar ver ainda
# mais desenvolvimentos interessantes no futuro.
```

```md
## Overall Results

| Task                            | Metric   | Value  | StdErr |
|---------------------------------|----------|--------|--------|
| ASSIN2 RTE                      | F1 Macro | 0.4486 | 0.0067 |
| ASSIN2 RTE                      | Accuracy | 0.5560 | 0.0071 |
| ASSIN2 STS                      | Pearson  | 0.4091 | 0.0104 |
| ASSIN2 STS                      | MSE      | 5.6395 | N/A    |
| BluEX                           | Accuracy | 0.2503 | 0.0094 |
| ENEM Challenge                  | Accuracy | 0.3128 | 0.0071 |
| FAQUAD NLI                      | F1 Macro | 0.4611 | 0.0094 |
| FAQUAD NLI                      | Accuracy | 0.7877 | 0.0113 |
| HateBR Offensive (Binary)       | F1 Macro | 0.3439 | 0.0049 |
| HateBR Offensive (Binary)       | Accuracy | 0.4857 | 0.0095 |
| OAB Exams                       | Accuracy | 0.3062 | 0.0057 |
| Portuguese Hate Speech (Binary) | F1 Macro | 0.4119 | 0.0038 |
| Portuguese Hate Speech (Binary) | Accuracy | 0.7004 | 0.0111 |
| TweetSentBR                     | F1 Macro | 0.5055 | 0.0078 |
| TweetSentBR                     | Accuracy | 0.5697 | 0.0078 |

## Detailed Results by Task

### ASSIN2 RTE

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4486 | 0.0067 |
| Accuracy | 0.5560 | 0.0071 |

### ASSIN2 STS

| Metric  | Value  | StdErr |
|---------|--------|--------|
| Pearson | 0.4091 | 0.0104 |
| MSE     | 5.6395 | N/A    |

### BluEX

| Exam ID        | Metric   | Value  | StdErr |
|----------------|----------|--------|--------|
| All            | Accuracy | 0.2503 | 0.0094 |
| USP_2018       | Accuracy | 0.2037 | 0.0315 |
| UNICAMP_2018   | Accuracy | 0.1852 | 0.0306 |
| UNICAMP_2021_1 | Accuracy | 0.0870 | 0.0240 |
| USP_2020       | Accuracy | 0.2143 | 0.0317 |
| USP_2023       | Accuracy | 0.2045 | 0.0350 |
| UNICAMP_2019   | Accuracy | 0.2600 | 0.0358 |
| USP_2019       | Accuracy | 0.1500 | 0.0326 |
| UNICAMP_2020   | Accuracy | 0.2182 | 0.0321 |
| UNICAMP_2021_2 | Accuracy | 0.2941 | 0.0367 |
| UNICAMP_2023   | Accuracy | 0.4186 | 0.0433 |
| UNICAMP_2024   | Accuracy | 0.3111 | 0.0398 |
| USP_2024       | Accuracy | 0.2683 | 0.0398 |
| USP_2021       | Accuracy | 0.3269 | 0.0375 |
| UNICAMP_2022   | Accuracy | 0.3590 | 0.0444 |
| USP_2022       | Accuracy | 0.2857 | 0.0370 |

### ENEM Challenge

| Exam ID | Metric   | Value  | StdErr |
|---------|----------|--------|--------|
| All     | Accuracy | 0.3128 | 0.0071 |
| 2017    | Accuracy | 0.2845 | 0.0241 |
| 2016    | Accuracy | 0.2479 | 0.0226 |
| 2016_2  | Accuracy | 0.2846 | 0.0235 |
| 2022    | Accuracy | 0.3534 | 0.0240 |
| 2012    | Accuracy | 0.3362 | 0.0253 |
| 2011    | Accuracy | 0.3333 | 0.0251 |
| 2010    | Accuracy | 0.3846 | 0.0260 |
| 2014    | Accuracy | 0.3211 | 0.0259 |
| 2009    | Accuracy | 0.2696 | 0.0239 |
| 2015    | Accuracy | 0.2521 | 0.0229 |
| 2023    | Accuracy | 0.3481 | 0.0236 |
| 2013    | Accuracy | 0.3333 | 0.0261 |

### FAQUAD NLI

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4611 | 0.0094 |
| Accuracy | 0.7877 | 0.0113 |

### HateBR Offensive (Binary)

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.3439 | 0.0049 |
| Accuracy | 0.4857 | 0.0095 |

### OAB Exams

| Exam ID  | Metric   | Value  | StdErr |
|----------|----------|--------|--------|
| All      | Accuracy | 0.3062 | 0.0057 |
| 2011-05  | Accuracy | 0.3375 | 0.0304 |
| 2012-06a | Accuracy | 0.2625 | 0.0285 |
| 2010-02  | Accuracy | 0.3700 | 0.0279 |
| 2017-22  | Accuracy | 0.3500 | 0.0309 |
| 2016-20  | Accuracy | 0.3125 | 0.0300 |
| 2011-03  | Accuracy | 0.2626 | 0.0255 |
| 2015-17  | Accuracy | 0.3205 | 0.0304 |
| 2017-23  | Accuracy | 0.2875 | 0.0292 |
| 2018-25  | Accuracy | 0.3625 | 0.0311 |
| 2016-19  | Accuracy | 0.2436 | 0.0281 |
| 2017-24  | Accuracy | 0.1625 | 0.0238 |
| 2015-16  | Accuracy | 0.3125 | 0.0300 |
| 2011-04  | Accuracy | 0.3250 | 0.0301 |
| 2012-07  | Accuracy | 0.3500 | 0.0307 |
| 2012-06  | Accuracy | 0.1875 | 0.0253 |
| 2012-09  | Accuracy | 0.2468 | 0.0284 |
| 2013-12  | Accuracy | 0.3625 | 0.0311 |
| 2013-11  | Accuracy | 0.3000 | 0.0295 |
| 2010-01  | Accuracy | 0.3412 | 0.0296 |
| 2015-18  | Accuracy | 0.2875 | 0.0292 |
| 2014-13  | Accuracy | 0.3500 | 0.0308 |
| 2013-10  | Accuracy | 0.3125 | 0.0300 |
| 2016-20a | Accuracy | 0.2500 | 0.0279 |
| 2014-14  | Accuracy | 0.3125 | 0.0301 |
| 2012-08  | Accuracy | 0.3000 | 0.0296 |
| 2016-21  | Accuracy | 0.3375 | 0.0304 |
| 2014-15  | Accuracy | 0.4103 | 0.0321 |

### Portuguese Hate Speech (Binary)

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4119 | 0.0038 |
| Accuracy | 0.7004 | 0.0111 |

### TweetSentBR

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.5055 | 0.0078 |
| Accuracy | 0.5697 | 0.0078 |
```
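
The `Value` and `StdErr` columns can be combined into rough confidence intervals when comparing tasks. The sketch below is illustrative only: it assumes `StdErr` is the standard error of the reported metric and that a normal approximation is adequate, neither of which the README states. The helper name `confidence_interval` is hypothetical, and the numbers are copied from the Overall Results table.

```python
# Values and standard errors copied from the Overall Results table.
RESULTS = {
    ("ASSIN2 RTE", "Accuracy"): (0.5560, 0.0071),
    ("BluEX", "Accuracy"): (0.2503, 0.0094),
    ("ENEM Challenge", "Accuracy"): (0.3128, 0.0071),
    ("OAB Exams", "Accuracy"): (0.3062, 0.0057),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) for a normal-approximation interval.

    Assumption: stderr is the standard error of `value`.
    """
    margin = z * stderr
    return value - margin, value + margin


if __name__ == "__main__":
    for (task, metric), (value, stderr) in RESULTS.items():
        low, high = confidence_interval(value, stderr)
        print(f"{task} {metric}: {value:.4f} [{low:.4f}, {high:.4f}]")
```

Under these assumptions, two scores whose intervals overlap heavily should not be read as meaningfully different.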