ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1
This model is a fine-tuned version of MIT/ast-finetuned-audioset-10-10-0.4593 on a subset of the ashraq/esc50 dataset. It achieves the following results on the evaluation set (a minimal inference sketch follows the list):
- Loss: 0.7391
- Accuracy: 0.9286
- Precision: 0.9449
- Recall: 0.9286
- F1: 0.9244
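
The model can be loaded with the Transformers audio-classification pipeline. This is a minimal sketch, not taken from the original card: the repo id is a placeholder and should be replaced with the actual Hub path of this checkpoint, and a dummy 16 kHz waveform stands in for a real recording.

```python
import numpy as np
from transformers import pipeline

# Placeholder repo id; replace with the actual Hub path of this checkpoint.
classifier = pipeline(
    "audio-classification",
    model="<username>/ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1",
)

# Any mono waveform resampled to 16 kHz works; a 5-second dummy signal is used here.
waveform = np.random.uniform(-1.0, 1.0, 16000 * 5).astype(np.float32)
print(classifier(waveform, top_k=5))
```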
Training and evaluation data
Training and evaluation data were augmented with the audiomentations library (GitHub: iver56/audiomentations). The following augmentation methods were applied, based on previous experiments (Elliott et al.: Tiny transformers for audio classification at the edge); a pipeline sketch follows this list:
Gain
- each audio sample is amplified or attenuated by a random factor between 0.5 and 1.5 with a probability of 0.3
Noise
- a random amount of Gaussian noise with a relative amplitude between 0.001 and 0.015 is added to each audio sample with a probability of 0.5
Speed adjust
- the duration of each audio sample is stretched or compressed by a random factor between 0.5 and 1.5 with a probability of 0.3
Pitch shift
- the pitch of each audio sample is shifted by a random number of semitones selected from the closed interval [-4, 4] with a probability of 0.3
Time masking
- a random fraction of the length of each audio sample, in the range (0, 0.02], is erased with a probability of 0.3
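
A minimal sketch of how such a pipeline could be assembled with audiomentations. The transform and parameter names assume a recent audiomentations release and may differ in older versions; the gain range is converted from the card's linear 0.5-1.5 factor to decibels, which is an approximation.

```python
import numpy as np
from audiomentations import (
    AddGaussianNoise, Compose, Gain, PitchShift, TimeMask, TimeStretch,
)

# Gain in audiomentations is specified in decibels, so the linear
# 0.5-1.5 factor from the card is converted with 20 * log10(factor).
augment = Compose([
    Gain(min_gain_db=20 * np.log10(0.5), max_gain_db=20 * np.log10(1.5), p=0.3),
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.5, max_rate=1.5, p=0.3),        # speed adjust
    PitchShift(min_semitones=-4, max_semitones=4, p=0.3),
    TimeMask(min_band_part=0.0, max_band_part=0.02, p=0.3),  # card: (0, 0.02]
])

# Apply to a mono float32 waveform before feature extraction
# (5 seconds of dummy audio at 16 kHz shown here).
waveform = np.random.uniform(-0.5, 0.5, 16000 * 5).astype(np.float32)
augmented = augment(samples=waveform, sample_rate=16000)
```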
Training hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows this list):
- learning_rate: 2e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
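
For reference, a minimal sketch of how these values map onto Hugging Face TrainingArguments; the output_dir and the evaluation/save strategies are assumptions not stated in this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1",  # assumed
    learning_rate=2e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # 2 x 4 = effective train batch size of 8
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",    # assumption: metrics above are per epoch
    save_strategy="epoch",          # assumption
)
```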
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|---|---|
9.9002 | 1.0 | 28 | 8.5662 | 0.0 | 0.0 | 0.0 | 0.0 |
5.7235 | 2.0 | 56 | 4.3990 | 0.0357 | 0.0238 | 0.0357 | 0.0286 |
2.4076 | 3.0 | 84 | 2.2972 | 0.4643 | 0.7405 | 0.4643 | 0.4684 |
1.4448 | 4.0 | 112 | 1.3975 | 0.7143 | 0.7340 | 0.7143 | 0.6863 |
0.8373 | 5.0 | 140 | 1.0468 | 0.8571 | 0.8524 | 0.8571 | 0.8448 |
0.7239 | 6.0 | 168 | 0.8518 | 0.8929 | 0.9164 | 0.8929 | 0.8766 |
0.6504 | 7.0 | 196 | 0.7391 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
0.535 | 8.0 | 224 | 0.6682 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
0.4237 | 9.0 | 252 | 0.6443 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
0.3709 | 10.0 | 280 | 0.6304 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
Test results
Parameter | Value |
---|---|
test_loss | 0.5829914808273315 |
test_accuracy | 0.9285714285714286 |
test_precision | 0.9446428571428571 |
test_recall | 0.9285714285714286 |
test_f1 | 0.930292723149866 |
test_runtime (s) | 4.1488 |
test_samples_per_second | 6.749 |
test_steps_per_second | 3.374 |
epoch | 10.0 |
Framework versions
- Transformers 4.27.4
- Pytorch 2.0.0
- Datasets 2.10.1
- Tokenizers 0.13.2