b2710936-8f82-4f80-8c58-c9da69a16314

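This model is a fine-tuned version of echarlaix/tiny-random-mistral on an unspecified dataset (the training run recorded the dataset name as None). It achieves the following results on the evaluation set:

Loss: 10.3074 (validation loss at step 500, the final training step)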

Model description

More information needed

Intended uses and limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.000206
train_batch_size: 4
eval_batch_size: 4
seed: 60
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: OptimizerNames.ADAMW_BNB with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
training_steps: 500
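
As a rough illustration of how these values fit together, the sketch below recomputes the total train batch size and the learning rate at a given step. It assumes a single training device (consistent with 4 × 2 = 8) and the usual linear-warmup, half-cosine-decay shape for a cosine schedule; the function names are hypothetical and this is not the training code itself.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.000206
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 50
TRAINING_STEPS = 500

# Assumption: one device, which is consistent with 4 * 2 = 8.
NUM_DEVICES = 1


def total_train_batch_size() -> int:
    """Samples contributing to each optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def cosine_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size())
    for step in (0, 25, 50, 275, 500):
        print(step, f"{cosine_lr(step):.6f}")
```

Under these assumptions the learning rate peaks at 0.000206 when warmup ends at step 50 and decays to zero by step 500.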

Training Loss	Epoch	Step	Validation Loss
No log	0.0005	1	10.3792
20.6445	0.0268	50	10.3278
20.6159	0.0537	100	10.3176
20.593	0.0805	150	10.3142
20.5966	0.1074	200	10.3124
20.5907	0.1342	250	10.3120
20.5993	0.1610	300	10.3105
20.597	0.1879	350	10.3089
20.5849	0.2147	400	10.3079
20.5961	0.2415	450	10.3079
20.5829	0.2684	500	10.3074
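
To make the trend in the table easier to read, here is a small sketch that summarizes the validation-loss column. The rows are copied from the table above; the helper names are hypothetical and chosen only for this illustration.

```python
# Rows copied from the results table: (step, epoch, validation_loss).
EVAL_LOG = [
    (1, 0.0005, 10.3792),
    (50, 0.0268, 10.3278),
    (100, 0.0537, 10.3176),
    (150, 0.0805, 10.3142),
    (200, 0.1074, 10.3124),
    (250, 0.1342, 10.3120),
    (300, 0.1610, 10.3105),
    (350, 0.1879, 10.3089),
    (400, 0.2147, 10.3079),
    (450, 0.2415, 10.3079),
    (500, 0.2684, 10.3074),
]


def total_drop(log):
    """Validation loss at the first evaluation minus the last one."""
    return log[0][2] - log[-1][2]


def best_step(log):
    """Step with the lowest validation loss."""
    return min(log, key=lambda row: row[2])[0]


def early_share(log, upto_step=50):
    """Fraction of the total drop already achieved by `upto_step`."""
    loss_at = {step: loss for step, _, loss in log}
    return (log[0][2] - loss_at[upto_step]) / total_drop(log)


if __name__ == "__main__":
    print(f"total drop: {total_drop(EVAL_LOG):.4f}")
    print(f"best step: {best_step(EVAL_LOG)}")
    print(f"share of drop by step 50: {early_share(EVAL_LOG):.1%}")
```

On these rows the validation loss falls by about 0.07 in total, most of it within the first 50 steps, and the lowest value is reached at the final step.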