---
base_model:
- arcee-ai/Virtuoso-Small-v2
- sometimesanotion/Base-Chocolatine-2-14B-Instruct-v2.0b3
- CultriX/Qwen2.5-14B-Hyperionv4
- sometimesanotion/Qwenvergence-14B-v12-Prose-DS
- sthenno-com/miscii-14b-1225
library_name: transformers
tags:
- mergekit
- merge

---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [DARE TIES](https://arxiv.org/abs/2311.03099) merge method, with [sometimesanotion/Base-Chocolatine-2-14B-Instruct-v2.0b3](https://huggingface.co/sometimesanotion/Base-Chocolatine-2-14B-Instruct-v2.0b3) as the base model.
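
In rough terms, DARE TIES forms a task vector (fine-tuned weights minus base weights) for each contributing model, randomly drops entries at each model's density and rescales the survivors, then resolves sign conflicts TIES-style before adding the combined delta back onto the base. The sketch below illustrates that idea on a single tensor; it is a simplified illustration with made-up names, not mergekit's actual implementation, and the densities only echo those in the configuration below for flavor.

```python
import torch

def dare_ties_merge(base, tuned_models, densities, weights):
    """Merge task vectors (tuned - base) with DARE drop-and-rescale,
    then TIES-style sign election. Simplified sketch, not mergekit code."""
    deltas = []
    for tuned, density, weight in zip(tuned_models, densities, weights):
        delta = tuned - base
        # DARE: randomly drop (1 - density) of the delta entries and rescale
        # the survivors by 1/density so the expected delta is preserved.
        keep = (torch.rand_like(delta) < density).to(delta.dtype)
        deltas.append(weight * keep * delta / density)
    stacked = torch.stack(deltas)
    # TIES: elect a per-parameter sign from the summed deltas, then keep only
    # the contributions that agree with the elected sign and average them.
    elected = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == elected).to(stacked.dtype)
    merged_delta = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1.0)
    return base + merged_delta

# Toy check on random tensors standing in for one weight matrix per model.
base = torch.randn(8, 8)
tuned = [base + 0.1 * torch.randn(8, 8) for _ in range(4)]
merged = dare_ties_merge(base, tuned,
                         densities=[0.9, 0.8, 0.8, 0.6],
                         weights=[1.0, 1.0, 1.0, 1.0])
print(merged.shape)
```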

### Models Merged

The following models were included in the merge:
* [arcee-ai/Virtuoso-Small-v2](https://huggingface.co/arcee-ai/Virtuoso-Small-v2)
* [CultriX/Qwen2.5-14B-Hyperionv4](https://huggingface.co/CultriX/Qwen2.5-14B-Hyperionv4)
* [sometimesanotion/Qwenvergence-14B-v12-Prose-DS](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v12-Prose-DS)
* [sthenno-com/miscii-14b-1225](https://huggingface.co/sthenno-com/miscii-14b-1225)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
name:                Enhanced-TIES-Base-v1
# Defines the TIES-merged base model intended as the base of the subsequent SLERP merge (SuperMerge-LayeredTIES-v1).
merge_method:        dare_ties
base_model:          sometimesanotion/Base-Chocolatine-2-14B-Instruct-v2.0b3 # Solid base model
tokenizer_source:    base # Base tokenizer
dtype:               bfloat16 # Efficient dtype
out_dtype:           bfloat16 # Output in bfloat16

parameters:
  normalize:         true # Normalize weights for TIES
  int8_mask:         true  # Int8 mask for TIES
  rescale:           false # No rescaling for TIES
  density:           0.75  # Density for TIES merge

models: # Models for the TIES base merge (same models and densities as Enhanced-LayeredSlerp-v1)
  - model:           arcee-ai/Virtuoso-Small-v2      # IFEval specialist - high density
    parameters:
      weight:        1.0
      density:       0.9
  - model:           sthenno-com/miscii-14b-1225   # BBH and Reasoning - medium density
    parameters:
      weight:        1.0
      density:       0.8
  - model:           sometimesanotion/Qwenvergence-14B-v12-Prose-DS # MATH and general Qwen - medium density
    parameters:
      weight:        1.0
      density:       0.8
  - model:           CultriX/Qwen2.5-14B-Hyperionv4 # General improvement - lower density
    parameters:
      weight:        1.0
      density:       0.6


# Commentary:
# =============================================================================
# SuperMerge-LayeredTIES-v1 Commentary:
#
# This configuration combines the strengths of both Enhanced-LayeredSlerp-v1 and SuperMerge-Enhanced-v1.
# It leverages the robust foundation of a TIES-merged base model (Enhanced-TIES-Base-v1) and applies
# the layer-wise module approach and fine-grained weight control from SuperMerge-Enhanced-v1 in a SLERP merge.
#
# Key Features:
#   - TIES-Merged Base Foundation:  Uses 'Enhanced-TIES-Base-v1' as the base model for the SLERP merge.
#     This TIES base provides a selectively merged and potentially more efficient starting point, incorporating
#     strengths from the merged models (Virtuoso, miscii, Qwenvergence Prose-DS, Hyperion) with density control.
#
#   - Layer-wise Module Integration in SLERP:  Maintains the module-based slice structure from SuperMerge-Enhanced-v1.
#     The SLERP merge now combines the TIES-merged base with specialized modules for Reasoning, IFEval, and MATH/Knowledge
#     at different layer ranges, using explicit weights for fine-grained control.
#
#   - Benchmark-Driven Iterative Weight Tuning:  The configuration is designed to be optimized through a
#     benchmark-driven iterative weight tuning process (as described in the refined SuperMerge-Enhanced-v1 approach).
#     The initial weights provided are starting points and need to be systematically tuned based on benchmark results.
#
# Tuning Process (Same as Refined SuperMerge-Enhanced-v1):
#   1. Initial Benchmarking: Run a full benchmark suite.
#   2. Performance Analysis: Examine per-benchmark scores and compare to source models.
#   3. Targeted Weight Adjustments: Adjust layer weights based on performance analysis (e.g., increase IFEval module weight
#      in early layers if IFEval is weak).
#   4. Iterate: Repeat steps 1-3. Make small, incremental adjustments in each iteration.
#
# Rationale:
#   - By using a TIES-merged base, we aim to create a more robust and potentially efficient foundation for the SLERP merge.
#   - The layer-wise module approach and fine-grained weights in SLERP still allow for precise control over the blending
#     of specialized capabilities at different network depths, building upon the solid TIES base.
#   - The emphasis on a benchmark-driven iterative weight tuning process remains crucial for achieving optimal performance.
#
# Next Steps:
#   - Implement this configuration using MergeKit.
#   - Run initial benchmarks to establish a baseline.
#   - Begin the iterative benchmark-driven weight tuning process to optimize performance.
# =============================================================================
```
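
As a first step in the benchmark-driven tuning loop described in the commentary above, it helps to confirm that the merged checkpoint loads and generates sensibly before launching a full evaluation run. A minimal smoke test with transformers might look like the following sketch; `./merged-model` is a placeholder for whatever output directory mergekit wrote, not a published repository name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "./merged-model"  # placeholder: local directory produced by mergekit

tokenizer = AutoTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(
    merged_path,
    torch_dtype=torch.bfloat16,   # matches the config's bfloat16 out_dtype
    device_map="auto",            # requires the accelerate package
)

# One short instruction-following prompt as a sanity check before benchmarking.
messages = [{"role": "user", "content": "List three prime numbers greater than 100."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```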