# merge

This is a merge of pre-trained language models created using mergekit.

## Merge Details

### Merge Method

This model was merged using the DARE TIES merge method, with CultriX/SeQwence-14Bv1 as the base model.
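
With mergekit installed, the merge can be re-run either through the `mergekit-yaml` CLI or through the Python API documented in mergekit's README. Below is a minimal sketch of the latter, under the assumption that the YAML from the Configuration section is saved as `merge-config.yaml` (a hypothetical filename) and parses as a valid mergekit configuration:

```python
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Parse the YAML from the Configuration section below.
with open("merge-config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Qwen2.5-14B-Emergedv2",  # output directory (hypothetical)
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # run the merge on GPU if available
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
    ),
)
```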

### Models Merged

The following models were included in the merge:

* VAGOsolutions/SauerkrautLM-v2-14b-DPO
* allknowingroger/QwenSlerp6-14B
* CultriX/SeQwence-14B-EvolMerge
* CultriX/Qwen2.5-14B-Wernicke
* allknowingroger/QwenStock3-14B

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
    parameters:
      weight: 0.25    # Prioritize top IFEval
      density: 0.6     # Keep a large portion for strong factual baseline

  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.25    # High weight for MATH and balanced reasoning
      density: 0.6     # Retain robust reasoning capabilities

  - model: CultriX/SeQwence-14B-EvolMerge
    parameters:
      weight: 0.20    # Important for best BBH and near-top MuSR
      density: 0.5     # Moderate density to ensure these strengths blend well

  - model: CultriX/Qwen2.5-14B-Wernicke
    parameters:
      weight: 0.15    # Adds top GPQA performance
      density: 0.5     # Sufficient to preserve QA strengths

  - model: allknowingroger/QwenStock3-14B
    parameters:
      weight: 0.15    # For top MMLU-PRO, enhancing domain knowledge
      density: 0.5     # Balanced integration of diverse subject expertise

base_model: CultriX/SeQwence-14Bv1
merge_method: dare_ties
parameters:
  normalize: true      # Ensures parameter scaling compatibility
  int8_mask: true      # Memory and computational efficiency
dtype: bfloat16
adaptive_merge_parameters:
  task_weights:
    IFEval: 1.2        # Emphasize instruction-following and formatting adherence
    BBH: 1.2           # Maintain strong performance in challenging reasoning tasks
    MATH_Lvl_5: 1.3    # Ensure domain expertise in competitive math problems
    GPQA: 1.3          # Leverage graduate-level knowledge capabilities
    MuSR: 1.1          # Enhance multistep reasoning on complex tasks
    MMLU_PRO: 1.2      # Ensure robust multitask domain understanding
  smoothing_factor: 0.2  # Moderate blending for stable integration
gradient_clipping: 1.0   # Prevent over-contribution from any single model
```
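
In DARE TIES, each model's `density` sets the fraction of its delta against the base (its task vector) that survives random dropping, with survivors rescaled by 1/density, and `weight` scales that model's contribution before TIES sign election resolves conflicting updates (with `normalize: true`, mergekit additionally rescales the combined weights, omitted here for brevity). The following single-tensor sketch illustrates this mechanic; it is a simplified illustration of the method, not mergekit's actual implementation:

```python
import torch

def dare_ties_merge(base, finetuned, weights, densities, seed=0):
    """Single-tensor illustration of DARE-TIES (not mergekit's internals)."""
    torch.manual_seed(seed)
    contributions = []
    for ft, w, d in zip(finetuned, weights, densities):
        delta = ft - base                                 # task vector vs. the base model
        keep = torch.bernoulli(torch.full_like(delta, d))
        delta = delta * keep / d                          # DARE: drop params, rescale survivors
        contributions.append(w * delta)                   # per-model `weight` from the YAML
    stacked = torch.stack(contributions)
    # TIES sign election: per parameter, keep only contributions whose sign
    # agrees with the sign of the summed update, then combine the survivors.
    elected = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == elected).to(stacked.dtype)
    return base + (stacked * agree).sum(dim=0)

# Toy usage with random tensors standing in for one weight matrix:
base = torch.randn(4, 4)
models = [base + 0.1 * torch.randn(4, 4) for _ in range(3)]
merged = dare_ties_merge(base, models, weights=[0.4, 0.3, 0.3], densities=[0.6, 0.6, 0.5])
```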
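
To try the merged checkpoint locally, a standard transformers loading snippet works; the weights are stored in bfloat16 (matching the merge `dtype`), and the prompt here is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-Emergedv2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load in bf16 to match the merge dtype
    device_map="auto",           # requires `accelerate`; shards across available devices
)

prompt = "Briefly explain what a model merge is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```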
