Thestral v0.1

Thestral is a Mistral fine-tune: a QLoRA adaptation of mistralai/Mistral-7B-v0.1 trained on the Open-Orca/SlimOrca dataset.

The model was fine-tuned on a single H100 GPU using axolotl.
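As a quick-start, the model can be loaded with transformers. The sketch below assumes the merged weights are published under NovusResearch/Thestral7b-v0.1; if the repository only hosts the QLoRA adapter, load mistralai/Mistral-7B-v0.1 first and attach the adapter with peft's `PeftModel.from_pretrained` instead.

```python
# Minimal inference sketch; assumes merged weights at this repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovusResearch/Thestral7b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16, matching the training precision
    device_map="auto",
)

prompt = "Explain QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```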

See the axolotl config used for training:

axolotl version: 0.4.0

```yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: Open-Orca/SlimOrca
    type: sharegpt
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./qlora-out_2

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 128
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: slim_orca
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
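For reference outside axolotl, the QLoRA settings above map roughly onto the following peft/bitsandbytes setup. This is a sketch, not the exact training code; the NF4 and double-quantization options are assumed QLoRA defaults, not values stated in the config.

```python
# Rough peft/bitsandbytes equivalent of the axolotl QLoRA settings above.
# NF4 + double quantization are assumed defaults, not stated in the config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16: auto (on H100)
    bnb_4bit_quant_type="nf4",              # assumed QLoRA default
    bnb_4bit_use_double_quant=True,         # assumed QLoRA default
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # pairs with gradient_checkpointing: true

lora_config = LoraConfig(
    r=128,              # lora_r
    lora_alpha=32,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Note that with `micro_batch_size: 2` and `gradient_accumulation_steps: 4`, each optimizer step sees an effective batch of 8 packed sequences of up to 8192 tokens.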

GPT-4All Benchmark Set

| Tasks         | Version | Filter | n-shot | Metric   | Value  | Stderr   |
|---------------|---------|--------|--------|----------|--------|----------|
| winogrande    | 1       | none   | None   | acc      | 0.7498 | ± 0.0122 |
| piqa          | 1       | none   | None   | acc      | 0.8172 | ± 0.0090 |
|               |         | none   | None   | acc_norm | 0.8286 | ± 0.0088 |
| openbookqa    | 1       | none   | None   | acc      | 0.3380 | ± 0.0212 |
|               |         | none   | None   | acc_norm | 0.4420 | ± 0.0222 |
| hellaswag     | 1       | none   | None   | acc      | 0.6254 | ± 0.0048 |
|               |         | none   | None   | acc_norm | 0.8061 | ± 0.0039 |
| boolq         | 2       | none   | None   | acc      | 0.8740 | ± 0.0058 |
| arc_easy      | 1       | none   | None   | acc      | 0.8199 | ± 0.0079 |
|               |         | none   | None   | acc_norm | 0.7891 | ± 0.0084 |
| arc_challenge | 1       | none   | None   | acc      | 0.5145 | ± 0.0146 |
|               |         | none   | None   | acc_norm | 0.5461 | ± 0.0145 |
Average: 71.93
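These results follow the lm-evaluation-harness output format; the 71.93 average is consistent with taking acc_norm where reported and acc otherwise, averaged over the seven tasks. A reproduction sketch using the harness's Python API is below (assuming lm-eval v0.4.x and default task settings):

```python
# Hedged reproduction sketch with EleutherAI lm-evaluation-harness (v0.4.x API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NovusResearch/Thestral7b-v0.1,dtype=bfloat16",
    tasks=[
        "winogrande", "piqa", "openbookqa",
        "hellaswag", "boolq", "arc_easy", "arc_challenge",
    ],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```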

🤖 Additional information about training

The model was fine-tuned for 1.0 epoch.

Loss graph: (training loss curve image omitted)

Thanks to the axolotl project for the framework used to train this model.

Built with Axolotl
