Goekdeniz-Guelmez committed on
Commit 13eacd3 · verified · 1 Parent(s): bca034e

Update README.md

Files changed (1):
  1. README.md +3 -37
README.md CHANGED
@@ -51,7 +51,7 @@ model-index:
       num_few_shot: 4
     metrics:
     - type: exact_match
-      value: 0.0
+      value: 54.23
       name: exact match
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Goekdeniz-Guelmez/Josiefied-Qwen2.5-14B-Instruct-abliterated-v4
@@ -227,40 +227,6 @@ Use at you rown risk!
 
 ---
 
-
-# Qwen2.5-14B-Instruct
-
-## Introduction
-
-Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
-
-- Significantly **more knowledge** and has greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
-- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g, tables), and **generating structured outputs** especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
-- **Long-context Support** up to 128K tokens and can generate up to 8K tokens.
-- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
-
-**This repo contains the instruction-tuned 14B Qwen2.5 model**, which has the following features:
-- Type: Causal Language Models
-- Training Stage: Pretraining & Post-training
-- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-- Number of Parameters: 14.7B
-- Number of Paramaters (Non-Embedding): 13.1B
-- Number of Layers: 48
-- Number of Attention Heads (GQA): 40 for Q and 8 for KV
-- Context Length: Full 131,072 tokens and generation 8192 tokens
-- Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
-
-For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
-
-## Requirements
-
-The code of Qwen2.5 has been in the latest Hugging face `transformers` and we advise you to use the latest version of `transformers`.
-
-With `transformers<4.37.0`, you will encounter the following error:
-```
-KeyError: 'qwen2'
-```
-
 ## Quickstart
 
 Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents.
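
The Quickstart snippet itself falls outside this diff's context. For reference, here is a minimal sketch of the `apply_chat_template` flow the context line describes, using the standard `transformers` API; the prompt and generation settings are illustrative assumptions, not quoted from the README.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/Josiefied-Qwen2.5-14B-Instruct-abliterated-v4"

# Requires transformers >= 4.37.0; older versions do not know the "qwen2"
# architecture and fail with KeyError: 'qwen2' (see the removed Requirements note).
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# Render the chat into the model's prompt format, then tokenize.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a reply and strip the prompt tokens before decoding.
generated = model.generate(**inputs, max_new_tokens=512)
output_ids = generated[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```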
@@ -353,10 +319,10 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 
 | Metric |Value|
 |-------------------|----:|
-|Avg. |33.51|
+|Avg. |42.55|
 |IFEval (0-Shot) |82.92|
 |BBH (3-Shot) |48.05|
-|MATH Lvl 5 (4-Shot)| 0.00|
+|MATH Lvl 5 (4-Shot)|54.23|
 |GPQA (0-shot) |12.30|
 |MuSR (0-shot) |13.15|
 |MMLU-PRO (5-shot) |44.65|