TraceBack 12b Release
TraceBack is what I came up with when I thought, "how can we scale reasoning trace data generation effectively?"
Turns out you do not need to depend on reasoning models (r1, o1, o3, etc.) to create reasoning traces!
It was built with several goals in mind, mainly:
- enabling faster synthetic reasoning dataset generation: the base model here is small (much smaller than r1, etc.), so inference is faster and therefore easier to scale
- distilling on synthetic traces for out-of-domain, non-verifiable problems
- converting the outputs/datasets of any non-reasoning model into a synthetic reasoning dataset when used as input
So far, the current proof of concept checks the boxes for the first and third goals, and I plan on scaling this further, since:
- it only uses Mistral Nemo 12b as the base model
- it was only trained for 2 epochs
- only 200k samples were used for finetuning (QLoRA); dataset at https://huggingface.co/datasets/secemp9/instruction_solution_thought
So there is still plenty of room for improvement.
The model was trained with both the instruction and the solution as input, and the output being a plausible reasoning trace that matches them.
I believe this is the future of reasoning data generation. Stay tuned for an eval release.
Below is an inference example, using a ChatGPT instruction + solution as input:
Inference Example
Here I use a simple example from ChatGPT, passing both the instruction and the solution as input to the model:
Dataset Example
The dataset follows an instruction + solution: reasoning trace pair format. Sample conversation:
{
  "messages": [
    {
      "role": "user",
      "content": "Instruction:\ntext_here\nSolution:\ntext_here"
    },
    {
      "role": "assistant",
      "content": "text_here"
    }
  ]
}
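To inspect the published dataset directly, something like the following should work (a minimal sketch, assuming the default "train" split; the "messages" field name is taken from the sample above):
from datasets import load_dataset
# Load the instruction + solution -> reasoning trace dataset from the Hub
dataset = load_dataset("secemp9/instruction_solution_thought", split="train")
# Print one sample conversation in the messages format shown above
print(dataset[0]["messages"])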
Prompt Format
For the prompt format, I deliberately tried not to overengineer things, though I'm sure there is a better way to format this.
For now it's just: Instruction:
Solution:
The model's output doesn't have any special formatting for now; it's just the reasoning trace.
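As a concrete illustration, building the user message comes down to simple string concatenation (a minimal sketch; the build_prompt helper is just mine, not part of the model):
def build_prompt(instruction: str, solution: str) -> str:
    # Concatenate the instruction and solution in the plain
    # "Instruction: ... Solution: ..." format described above.
    return f"Instruction:\n{instruction}\nSolution:\n{solution}\n"

prompt = build_prompt(
    "how many r in strawberry",
    'There are **three** "r"s in "strawberry."',
)
messages = [{"role": "user", "content": prompt}]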
Code Example
- Using transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the tokenizer and model
model_name = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Move the model to the desired device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
# Define the messages
messages = [
{"role": "user", "content": """Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
"""}
]
# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Step 2: Tokenize the formatted text into a dictionary of tensors
inputs = tokenizer(formatted_text, return_tensors="pt").to(device)
# Generate the response
outputs = model.generate(**inputs, max_new_tokens=32000)
# Decode and print the output
generated_text = tokenizer.decode(outputs[0])
print(generated_text)
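Note that tokenizer.decode(outputs[0]) includes the prompt as well as the generated trace; if you only want the reasoning, an optional tweak is to slice off the prompt tokens first:
# Optional: strip the prompt tokens so only the generated reasoning remains
prompt_length = inputs["input_ids"].shape[1]
reasoning_only = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(reasoning_only)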
- Using unsloth:
from unsloth import FastLanguageModel
# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained("secemp9/TraceBack-12b")
# Define the messages (replace the instruction and solution with your actual input)
messages = [
{"role": "user", "content": """Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
"""}
]
# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Step 2: Tokenize the formatted text into a dictionary of tensors
inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device)
# Generate the response
outputs = model.generate(**inputs, max_new_tokens=32000)
# Decode and print the output
generated_text = tokenizer.decode(outputs[0])
print(generated_text)
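Since the whole point is scaling synthetic trace generation, the same setup can be looped over existing (instruction, solution) pairs to convert a non-reasoning dataset into reasoning traces. This is only a sketch reusing model and tokenizer from either snippet above; the pairs list is a placeholder:
# Sketch: batch-convert existing (instruction, solution) pairs into reasoning traces
pairs = [
    ("how many r in strawberry", 'There are **three** "r"s in "strawberry."'),
]
traces = []
for instruction, solution in pairs:
    msgs = [{"role": "user", "content": f"Instruction:\n{instruction}\nSolution:\n{solution}\n"}]
    text = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    batch = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**batch, max_new_tokens=32000)
    # Keep only the newly generated tokens (the reasoning trace)
    traces.append(tokenizer.decode(out[0][batch["input_ids"].shape[1]:], skip_special_tokens=True))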
Axolotl config
For this, I basically tried to convert my unsloth code into an Axolotl config file. I also used DeepSpeed. Configuration below:
config.yml
# Base model configuration
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
load_in_4bit: true
# Dataset configuration
datasets:
  - path: instruction_solution_to_thought_dataset.jsonl
    type: chat_template
# Chat template
chat_template: chatml
# LoRA adapter configuration
adapter: lora
lora_r: 16
lora_alpha: 16
lora_dropout: 0
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
# Training hyperparameters
max_seq_length: 128000
micro_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 3e-5
num_epochs: 3
warmup_steps: 100
optimizer: adamw_8bit
weight_decay: 0.01
lr_scheduler_type: cosine
max_grad_norm: 1.0
output_dir: ./outputs_solution_to_thought
seed: 3407
merge_lora: true
hf_upload: true
hf_repo: secemp9/TraceBack-12b
xformers_attention:
flash_attention: true
bf16: true # Enable BF16 mixed precision
# Multi-GPU training with DeepSpeed
deepspeed: deepspeed_configs/zero2.json
# Optional: Enable gradient checkpointing
gradient_checkpointing: true
deepspeed_configs/zero2.json
{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 10,
  "wandb": {
    "enabled": true
  }
}
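For reference, the instruction_solution_to_thought_dataset.jsonl file referenced in config.yml is presumably just one conversation per line in the messages format from the dataset example above (matching the chat_template dataset type). A minimal sketch for producing it, with placeholder samples:
import json
# Sketch: write training samples, one JSON object per line, to the path used in config.yml
samples = [
    {
        "messages": [
            {"role": "user", "content": "Instruction:\ntext_here\nSolution:\ntext_here"},
            {"role": "assistant", "content": "text_here"},
        ]
    }
]
with open("instruction_solution_to_thought_dataset.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")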