A slightly modified mpt-30b, with some updates (gradient checkpointing, etc.) to make it compatible with qlora training code.

Original model: https://huggingface.co/mosaicml/mpt-30b

My fork of qlora with mpt-30b support: https://github.com/jondurbin/qlora
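
If you don't already have an environment set up, something along these lines should work (the venv path and requirements file name are assumptions, not something this repo dictates):

git clone https://github.com/jondurbin/qlora
cd qlora
python -m venv /workspace/venv        # any path works; this one matches the example below
source /workspace/venv/bin/activate
pip install -r requirements.txt       # assumes the fork keeps upstream qlora's requirements file
# also place this model (or your local copy of it) at ./mpt-30b, as referenced by the command below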

Differences in the qlora scripts:

  • requires adding --mpt True for mpt-based models
  • uses --num_train_epochs instead of --max_steps
  • uses airoboros prompt format (mostly 1:1 with vicuna) rather than alpaca, and expects an input file in JSONL format with "instruction" and "response"

I think there's a bug in gradient accumulation, so if you try this, consider setting gradient accumulation steps to 1.

My first attempts used batch size 6 with gradient accumulation steps 16, but the results after three epochs with gradient accumulation were quite a bit worse than without it.

5 epochs seemed to achieve the best results, but YMMV.

Full example of tuning (used for airoboros-mpt-30b-gpt4-1.4):

source /workspace/venv/bin/activate
export PYTHONPATH=./mpt-30b
export WANDB_API_KEY=[redacted]
export WANDB_PROJECT=airoboros-mpt-30b-gpt4-1.4

python qlora.py \
    --model_name_or_path ./mpt-30b \
    --output_dir ./$WANDB_PROJECT-checkpoints \
    --num_train_epochs 5 \
    --logging_steps 1 \
    --save_strategy steps \
    --data_seed 11422 \
    --save_steps 100 \
    --save_total_limit 3 \
    --evaluation_strategy "no" \
    --eval_dataset_size 2 \
    --max_new_tokens 8192 \
    --dataloader_num_workers 3 \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --dataset ./instructions.jsonl \
    --dataset_format airoboros \
    --model_max_len 8192 \
    --gradient_checkpointing \
    --per_device_train_batch_size 6 \
    --gradient_accumulation_steps 1 \
    --learning_rate 0.0001 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.05 \
    --weight_decay 0.0 \
    --seed 11422 \
    --trust_remote_code \
    --mpt True \
    --report_to wandb

Merged model

Run the merge_weights.py script in the qlora repo: https://github.com/jondurbin/qlora/blob/main/merge_weights.py

Then, copy all of the original python files from the mpt-30b repo into your output directory: https://huggingface.co/mosaicml/mpt-30b/tree/main
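For example, something like this (the merged output directory name is just a placeholder; use whatever path merge_weights.py wrote to):

# copy the custom model code (modeling, configuration, etc.) from the base repo
# so the merged checkpoint can be loaded with trust_remote_code=True
cp ./mpt-30b/*.py ./airoboros-mpt-30b-gpt4-1.4-merged/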
