---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
library_name: transformers
tags:
- mergekit
- prune
- dpo
- instruct
datasets:
- mlabonne/orpo-dpo-mix-40k
license: llama3
pipeline_tag: text-generation

model-index:
  - name: llama3-5.4b-instruct
    results:
      - task:
          type: text-generation
        dataset:
          name: truthfulqa_mc2
          type: truthfulqa_mc2
        metrics:
          - name: TruthfulQA (0-Shot)
            type: TruthfulQA (0-Shot)
            value: 0.517686926475562
      - task:
          type: text-generation
        dataset:
          name: ai2_arc
          type: ai2_arc
        metrics:
          - name: AI2 Reasoning Challenge (25-Shot)
            type: AI2 Reasoning Challenge (25-Shot)
            value: 0.360068259385666
      - task:
          type: text-generation
        dataset:
          name: hellaswag
          type: hellaswag
        metrics:
          - name: HellaSwag (10-Shot)
            type: HellaSwag (10-Shot)
            value: 0.503485361481777
      - task:
          type: text-generation
        dataset:
          name: winogrande
          type: winogrande
        metrics:
          - name: Winogrande (5-Shot)
            type: Winogrande (5-Shot)
            value: 0.633780584056827
      - task:
          type: text-generation
        dataset:
          name: mmlu
          type: mmlu
        metrics:
          - name: MMLU (5-Shot)
            type: MMLU (5-Shot)
            value: 0.290912975359635
---
# GGUFs

Quantized versions of this model are available:
- https://huggingface.co/HaileyStorm/llama3-5.4b-instruct-Q8_0-GGUF
- https://huggingface.co/HaileyStorm/llama3-5.4b-instruct-Q6_K-GGUF
- https://huggingface.co/HaileyStorm/llama3-5.4b-instruct-Q5_K_M-GGUF
- https://huggingface.co/HaileyStorm/llama3-5.4b-instruct-Q4_0-GGUF

# Pruned & Tuned

This is a "merge" of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
It is a prune of Meta-Llama-3-8B-Instruct from 32 layers down to 20, or about 5.4B parameters -- roughly 67% the size of the original.
Mostly, this is a test of (significant) pruning & healing an instruct-tuned model.
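
As a quick sanity check on those numbers, here's a back-of-envelope parameter count using the standard Llama-3-8B dimensions (hidden size 4096, MLP intermediate size 14336, vocab 128256, GQA with a 1024-dim KV projection):

```python
# Back-of-envelope parameter count for the prune (norms omitted; they are negligible).
hidden, inter, vocab, kv_dim = 4096, 14336, 128256, 1024

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o projections + GQA k/v projections
mlp = 3 * hidden * inter                          # gate, up, and down projections
per_layer = attn + mlp                            # ~218M parameters per decoder layer
embed = 2 * vocab * hidden                        # input embeddings + (untied) lm_head

full = 32 * per_layer + embed    # ~8.0B
pruned = 20 * per_layer + embed  # ~5.4B
print(f"{pruned/1e9:.1f}B params, {pruned/full:.0%} of the original")
```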

## Healing / Finetune
I healed the model with a full-weight DPO finetune for 139k samples (3.15 epochs), followed by a LoRA finetune (rank r=128, alpha=256) for 73k samples (1.67 epochs). Both used an 8k sequence length.
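
For reference, the general shape of a DPO healing run with TRL looks something like this (a sketch only -- the hyperparameters below are illustrative, not the exact values used for this model):

```python
# Illustrative DPO healing setup with TRL; hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

pruned = "path/to/mergekit-output"  # the freshly pruned (gibberish) model
model = AutoModelForCausalLM.from_pretrained(pruned, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(pruned)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

args = DPOConfig(
    output_dir="llama3-5.4b-dpo",
    max_length=8192,                 # 8k sequence length, as used here
    per_device_train_batch_size=1,   # illustrative
    gradient_accumulation_steps=16,  # illustrative
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
trainer.train()
```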

Prior to healing, the model returned absolute gibberish to any prompt, rarely even two real words together. For example, given "2+2=" it might return "Mahmisan Pannpyout Na RMITa CMI TTi GP BP GP RSi TBi DD PS..."

The results are pretty good! The model has issues, but could have legitimate uses. It can carry on a conversation. It's certainly usable, if not useful.
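
After healing, it responds like a normal chat model when loaded through transformers (the repo id below is assumed to be this one):

```python
# Quick chat sanity check with transformers (repo id assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "HaileyStorm/llama3-5.4b-instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "2+2="}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```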

Truthfulness and commonsense reasoning suffered the least from the prune / were healed the best. Knowledge and complex reasoning suffered the most.
This model has 67% the parameters of the original, and has:
- ~100% the TruthfulQA score of the original
- ~60% the ARC Challenge score
- ~65% the Hellaswag score
- ~85% the Winogrande score
- ~45% the MMLU score

An average of 69% of the benchmark scores for 67% of the parameters -- not bad! (Note: I had issues running the GSM8K and BBH benchmarks.)
I believe it could be much better still: pruning in stages (say, 4 layers at a time) with some healing in between, plus longer healing at the end on a more diverse dataset.

### Benchmarks
![Comparative Benchmarks](benchmarks.png)
*Figure 1: Benchmark results for the pruned model, the original 8B model, and other models of similar size. Truthfulness and commonsense reasoning suffered the least from the prune / were healed the best. Knowledge and complex reasoning suffered the most.*

![Model Size vs Performance](relative.png)
*Figure 2: Model size vs average benchmark performance. Llama3-5.4b-instruct may not be fully healed, but its performance scales linearly with its size.*

## Why 5.4B?

This size should allow for:
- bf16 inference on 24GB VRAM
- Q8 or Q6 inference on 6GB VRAM
- Q5 inference on 4GB VRAM
- Fine-tuning on ... well, with less VRAM than an 8B model
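
Those targets follow from weight memory alone (a rough estimate; the bits-per-weight figures for the GGUF quants are approximate, and activations/KV cache add overhead on top):

```python
# Approximate weight-only memory for 5.4B parameters at common precisions.
params = 5.4e9
for name, bpw in [("bf16", 16), ("Q8_0", 8.5), ("Q6_K", 6.6), ("Q5_K_M", 5.7), ("Q4_0", 4.5)]:
    print(f"{name:7s} ~{params * bpw / 8 / 1e9:.1f} GB")
# bf16 ~10.8 GB, Q8_0 ~5.7 GB, Q6_K ~4.5 GB, Q5_K_M ~3.8 GB, Q4_0 ~3.0 GB
```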

And of course, as stated, it was a test of significant pruning, and of pruning & healing an instruct-tuned model. As a test, I think it's definitely successful.

## Mergekit Details
### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:
* [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 16]
    model: meta-llama/Meta-Llama-3-8B-Instruct
- sources:
  - layer_range: [20, 21]
    model: meta-llama/Meta-Llama-3-8B-Instruct
- sources:
  - layer_range: [29, 32]
    model: meta-llama/Meta-Llama-3-8B-Instruct
```
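
The three slices keep layers 0-15, layer 20, and layers 29-31 (the ranges are half-open), for 20 layers total. After running the config with mergekit's `mergekit-yaml` entry point, the pruned depth can be sanity-checked like so (the output path is a placeholder):

```python
# Verify the pruned depth of the merged model.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("./output-model-directory")
assert cfg.num_hidden_layers == 20  # 16 + 1 + 3 layers kept from the original 32
```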

## Weights & Biases Logs
Here are the logs for the full-weight finetune:
- https://wandb.ai/haileycollet/llama3-5b/runs/ryyqhc97
- https://wandb.ai/haileycollet/llama3-5b/runs/fpj2sct3
- https://wandb.ai/haileycollet/llama3-5b/runs/k9z6n9em
- https://wandb.ai/haileycollet/llama3-5b/runs/r3xqyhm2

And the LoRA logs:
- https://wandb.ai/haileycollet/llama3-5b/runs/rseithn1
- https://wandb.ai/haileycollet/llama3-5b/runs/g26232ei