Update README.md
README.md CHANGED
```diff
@@ -51,7 +51,7 @@ model-index:
         num_few_shot: 4
     metrics:
     - type: exact_match
-      value:
+      value: 54.23
       name: exact match
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Goekdeniz-Guelmez/Josiefied-Qwen2.5-14B-Instruct-abliterated-v4
```
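This fills the previously empty `exact_match` value; 54.23 is the MATH Lvl 5 (4-shot) score that also appears in the results table further down. As a quick sanity check that the metadata parses after the change, here is a sketch using `huggingface_hub`'s model-card API (the repo ID comes from the source URL above; the attribute names assume the current `ModelCard`/`EvalResult` interface):

```python
from huggingface_hub import ModelCard

# Load the card from the Hub and inspect the parsed model-index block.
card = ModelCard.load("Goekdeniz-Guelmez/Josiefied-Qwen2.5-14B-Instruct-abliterated-v4")
for result in card.data.eval_results:
    if result.metric_type == "exact_match":
        # Expect 54.23 once this commit is live.
        print(result.dataset_name, result.metric_value)
```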
````diff
@@ -227,40 +227,6 @@ Use at your own risk!
 
 ---
 
-
-# Qwen2.5-14B-Instruct
-
-## Introduction
-
-Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
-
-- Significantly **more knowledge** and has greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
-- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g, tables), and **generating structured outputs** especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
-- **Long-context Support** up to 128K tokens and can generate up to 8K tokens.
-- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
-
-**This repo contains the instruction-tuned 14B Qwen2.5 model**, which has the following features:
-- Type: Causal Language Models
-- Training Stage: Pretraining & Post-training
-- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-- Number of Parameters: 14.7B
-- Number of Paramaters (Non-Embedding): 13.1B
-- Number of Layers: 48
-- Number of Attention Heads (GQA): 40 for Q and 8 for KV
-- Context Length: Full 131,072 tokens and generation 8192 tokens
-- Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
-
-For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
-
-## Requirements
-
-The code of Qwen2.5 has been in the latest Hugging face `transformers` and we advise you to use the latest version of `transformers`.
-
-With `transformers<4.37.0`, you will encounter the following error:
-```
-KeyError: 'qwen2'
-```
-
 ## Quickstart
 
 Here is a code snippet showing how to load the tokenizer and model with `apply_chat_template` and how to generate content.
````
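The snippet the Quickstart refers to lies outside this hunk's context window. For reference, a minimal sketch of the usual `transformers` chat-template pattern (the prompt is illustrative; only the `apply_chat_template` usage is taken from the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/Josiefied-Qwen2.5-14B-Instruct-abliterated-v4"

# Load model and tokenizer; device_map="auto" spreads weights across available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat prompt using the model's own chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens from the output before decoding.
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```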
```diff
@@ -353,10 +319,10 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 
 | Metric             |Value|
 |-------------------|----:|
-|Avg.               |
+|Avg.               |42.55|
 |IFEval (0-Shot)    |82.92|
 |BBH (3-Shot)       |48.05|
-|MATH Lvl 5 (4-Shot)|
+|MATH Lvl 5 (4-Shot)|54.23|
 |GPQA (0-shot)      |12.30|
 |MuSR (0-shot)      |13.15|
 |MMLU-PRO (5-shot)  |44.65|
```
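With both cells filled, the table is internally consistent: 42.55 is the unweighted mean of the six benchmark scores (assuming the leaderboard's Avg. is a plain mean, which the numbers bear out):

```python
scores = [82.92, 48.05, 54.23, 12.30, 13.15, 44.65]
print(round(sum(scores) / len(scores), 2))  # 42.55, matching the new Avg. row
```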