using device: cuda
loaded 338025 tokens
1 epoch = 82 batches
Number of model parameters: 124439808
Epoch 1/70: 100% 82/82 [01:38<00:00, 1.20s/it]
Epoch 1/70, Loss: 6.169636
Checkpoint saved to checkpoint.pt
Epoch 2/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 2/70, Loss: 5.720689
Checkpoint saved to checkpoint.pt
Epoch 3/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 3/70, Loss: 5.390238
Checkpoint saved to checkpoint.pt
Epoch 4/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 4/70, Loss: 5.164030
Checkpoint saved to checkpoint.pt
Epoch 5/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 5/70, Loss: 5.051653
Checkpoint saved to checkpoint.pt
Epoch 6/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 6/70, Loss: 4.947546
Checkpoint saved to checkpoint.pt
Epoch 7/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 7/70, Loss: 4.893464
Checkpoint saved to checkpoint.pt
Epoch 8/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 8/70, Loss: 4.785249
Checkpoint saved to checkpoint.pt
Epoch 9/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 9/70, Loss: 4.773346
Checkpoint saved to checkpoint.pt
Epoch 10/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 10/70, Loss: 4.669469
Checkpoint saved to checkpoint.pt
Epoch 11/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 11/70, Loss: 4.617172
Checkpoint saved to checkpoint.pt
Epoch 12/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 12/70, Loss: 4.594382
Checkpoint saved to checkpoint.pt
Epoch 13/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 13/70, Loss: 4.554847
Checkpoint saved to checkpoint.pt
Epoch 14/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 14/70, Loss: 4.506260
Checkpoint saved to checkpoint.pt
Epoch 15/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 15/70, Loss: 4.416086
Checkpoint saved to checkpoint.pt
Epoch 16/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 16/70, Loss: 4.370214
Checkpoint saved to checkpoint.pt
Epoch 17/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 17/70, Loss: 4.278370
Checkpoint saved to checkpoint.pt
Epoch 18/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 18/70, Loss: 4.304771
Checkpoint saved to checkpoint.pt
Epoch 19/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 19/70, Loss: 4.209321
Checkpoint saved to checkpoint.pt
Epoch 20/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 20/70, Loss: 4.175936
Checkpoint saved to checkpoint.pt
Epoch 21/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 21/70, Loss: 4.071361
Checkpoint saved to checkpoint.pt
Epoch 22/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 22/70, Loss: 4.071530
Checkpoint saved to checkpoint.pt
Epoch 23/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 23/70, Loss: 4.053171
Checkpoint saved to checkpoint.pt
Epoch 24/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 24/70, Loss: 3.923664
Checkpoint saved to checkpoint.pt
Epoch 25/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 25/70, Loss: 3.827437
Checkpoint saved to checkpoint.pt
Epoch 26/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 26/70, Loss: 3.767063
Checkpoint saved to checkpoint.pt
Epoch 27/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 27/70, Loss: 3.711340
Checkpoint saved to checkpoint.pt
Epoch 28/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 28/70, Loss: 3.622302
Checkpoint saved to checkpoint.pt
Epoch 29/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 29/70, Loss: 3.583114
Checkpoint saved to checkpoint.pt
Epoch 30/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 30/70, Loss: 3.517573
Checkpoint saved to checkpoint.pt
Epoch 31/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 31/70, Loss: 3.445611
Checkpoint saved to checkpoint.pt
Epoch 32/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 32/70, Loss: 3.410571
Checkpoint saved to checkpoint.pt
Epoch 33/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 33/70, Loss: 3.282128
Checkpoint saved to checkpoint.pt
Epoch 34/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 34/70, Loss: 3.307455
Checkpoint saved to checkpoint.pt
Epoch 35/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 35/70, Loss: 3.126928
Checkpoint saved to checkpoint.pt
Epoch 36/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 36/70, Loss: 3.057953
Checkpoint saved to checkpoint.pt
Epoch 37/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 37/70, Loss: 3.082567
Checkpoint saved to checkpoint.pt
Epoch 38/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 38/70, Loss: 3.066772
Checkpoint saved to checkpoint.pt
Epoch 39/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 39/70, Loss: 2.943954
Checkpoint saved to checkpoint.pt
Epoch 40/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 40/70, Loss: 2.874876
Checkpoint saved to checkpoint.pt
Epoch 41/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 41/70, Loss: 2.781206
Checkpoint saved to checkpoint.pt
Epoch 42/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 42/70, Loss: 2.729423
Checkpoint saved to checkpoint.pt
Epoch 43/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 43/70, Loss: 2.656427
Checkpoint saved to checkpoint.pt
Epoch 44/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 44/70, Loss: 2.641519
Checkpoint saved to checkpoint.pt
Epoch 45/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 45/70, Loss: 2.593380
Checkpoint saved to checkpoint.pt
Epoch 46/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 46/70, Loss: 2.504074
Checkpoint saved to checkpoint.pt
Epoch 47/70: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 47/70, Loss: 2.510426
Checkpoint saved to checkpoint.pt
Epoch 48/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 48/70, Loss: 2.465840
Checkpoint saved to checkpoint.pt
Epoch 49/70: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 49/70, Loss: 2.339541
Checkpoint saved to checkpoint.pt
Epoch 50/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 50/70, Loss: 2.288784
Checkpoint saved to checkpoint.pt
Epoch 51/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 51/70, Loss: 2.272939
Checkpoint saved to checkpoint.pt
Epoch 52/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 52/70, Loss: 2.150897
Checkpoint saved to checkpoint.pt
Epoch 53/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 53/70, Loss: 2.096288
Checkpoint saved to checkpoint.pt
Epoch 54/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 54/70, Loss: 2.057416
Checkpoint saved to checkpoint.pt
Epoch 55/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 55/70, Loss: 1.962530
Checkpoint saved to checkpoint.pt
Epoch 56/70: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 56/70, Loss: 1.930993
Checkpoint saved to checkpoint.pt
Epoch 57/70: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 57/70, Loss: 1.854412
Checkpoint saved to checkpoint.pt
Epoch 58/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 58/70, Loss: 1.818957
Checkpoint saved to checkpoint.pt
Epoch 59/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 59/70, Loss: 1.764919
Checkpoint saved to checkpoint.pt
Epoch 60/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 60/70, Loss: 1.741000
Checkpoint saved to checkpoint.pt
Epoch 61/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 61/70, Loss: 1.694582
Checkpoint saved to checkpoint.pt
Epoch 62/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 62/70, Loss: 1.751990
Checkpoint saved to checkpoint.pt
Epoch 63/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 63/70, Loss: 1.664971
Checkpoint saved to checkpoint.pt
Epoch 64/70: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 64/70, Loss: 1.557876
Checkpoint saved to checkpoint.pt
Epoch 65/70: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 65/70, Loss: 1.543549
Checkpoint saved to checkpoint.pt
Epoch 66/70: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 66/70, Loss: 1.436256
Checkpoint saved to checkpoint.pt
Epoch 67/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 67/70, Loss: 1.352293
Checkpoint saved to checkpoint.pt
Epoch 68/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 68/70, Loss: 1.361581
Checkpoint saved to checkpoint.pt
Epoch 69/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 69/70, Loss: 1.308131
Checkpoint saved to checkpoint.pt
Epoch 70/70: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 70/70, Loss: 1.287876
Checkpoint saved to checkpoint.pt
Total training time: 127 minutes and 37 seconds
Model saved to trained_model_quantized.pt with quantization and compression.
==================================================
Increased epochs from 70 to 91 to reach loss < 0.099999
==================================================
using device: cuda
loaded 338025 tokens
1 epoch = 82 batches
Number of model parameters: 124439808
Loading checkpoint from checkpoint.pt
/content/erav3-s12-transformer-model/erav3-s12-transformer-model/transformer.py:262: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_file)
Epoch 71/91: 100% 82/82 [01:36<00:00, 1.18s/it]
Epoch 71/91, Loss: 1.453567
Checkpoint saved to checkpoint.pt
Epoch 72/91: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 72/91, Loss: 1.162141
Checkpoint saved to checkpoint.pt
Epoch 73/91: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 73/91, Loss: 1.174683
Checkpoint saved to checkpoint.pt
Epoch 74/91: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 74/91, Loss: 1.089287
Checkpoint saved to checkpoint.pt
Epoch 75/91: 100% 82/82 [01:42<00:00, 1.25s/it]
Epoch 75/91, Loss: 1.010704
Checkpoint saved to checkpoint.pt
Epoch 76/91: 100% 82/82 [01:42<00:00, 1.24s/it]
Epoch 76/91, Loss: 0.979691
Checkpoint saved to checkpoint.pt
Epoch 77/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 77/91, Loss: 0.918769
Checkpoint saved to checkpoint.pt
Epoch 78/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 78/91, Loss: 0.904894
Checkpoint saved to checkpoint.pt
Epoch 79/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 79/91, Loss: 0.851253
Checkpoint saved to checkpoint.pt
Epoch 80/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 80/91, Loss: 0.810432
Checkpoint saved to checkpoint.pt
Epoch 81/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 81/91, Loss: 0.730137
Checkpoint saved to checkpoint.pt
Epoch 82/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 82/91, Loss: 0.677209
Checkpoint saved to checkpoint.pt
Epoch 83/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 83/91, Loss: 0.618384
Checkpoint saved to checkpoint.pt
Epoch 84/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 84/91, Loss: 0.570543
Checkpoint saved to checkpoint.pt
Epoch 85/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 85/91, Loss: 0.516322
Checkpoint saved to checkpoint.pt
Epoch 86/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 86/91, Loss: 0.432109
Checkpoint saved to checkpoint.pt
Epoch 87/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 87/91, Loss: 0.320471
Checkpoint saved to checkpoint.pt
Epoch 88/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 88/91, Loss: 0.271299
Checkpoint saved to checkpoint.pt
Epoch 89/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 89/91, Loss: 0.218522
Checkpoint saved to checkpoint.pt
Epoch 90/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 90/91, Loss: 0.121739
Checkpoint saved to checkpoint.pt
Epoch 91/91: 100% 82/82 [01:41<00:00, 1.24s/it]
Epoch 91/91, Loss: 0.089421
Checkpoint saved to checkpoint.pt
Total training time: 34 minutes and 28 seconds
Model saved to trained_model_quantized.pt with quantization and compression.
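
Note on the startup lines: the log reports 338025 tokens and "1 epoch = 82 batches", but it does not state the batch size or sequence length. The sketch below shows one way those two numbers can be consistent, assuming each batch consumes B * T tokens and partial batches are dropped. B = 4 and T = 1024 are hypothetical values chosen for illustration; they are not taken from the log or from the training script.

```python
# Token count and batches per epoch are copied from the log above.
TOKENS = 338025
LOGGED_BATCHES = 82

# ASSUMPTION: batch size and sequence length are not stated in the log.
B = 4      # hypothetical number of sequences per batch
T = 1024   # hypothetical number of tokens per sequence


def batches_per_epoch(num_tokens: int, batch_size: int, seq_len: int) -> int:
    """Number of full, non-overlapping batches in one pass over the tokens."""
    return num_tokens // (batch_size * seq_len)


def compatible_tokens_per_batch(num_tokens: int, num_batches: int) -> range:
    """All tokens-per-batch values k with num_tokens // k == num_batches."""
    low = num_tokens // (num_batches + 1) + 1
    high = num_tokens // num_batches
    return range(low, high + 1)


if __name__ == "__main__":
    print(batches_per_epoch(TOKENS, B, T))
    candidates = compatible_tokens_per_batch(TOKENS, LOGGED_BATCHES)
    print(candidates.start, candidates[-1])
```

Under these assumptions the first call gives 82, matching the log. Any tokens-per-batch value from 4073 to 4122 would also give 82 batches, so the log alone does not pin down B and T; 4096 is simply the only power of two in that range.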