using device: cuda
loaded 338025 tokens
1 epoch = 82 batches
Number of model parameters: 124439808
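For context, 124,439,808 parameters matches a GPT-2-small configuration (12 layers, 12 heads, 768-dim embeddings, 50,257-token vocabulary, 1,024-position context, tied output head), and the 82-batch epoch follows directly from the token count. A minimal sketch of that arithmetic, with `batch_size` and `block_size` as assumptions (the values actually used in transformer.py are not shown in this log):

```python
# Reconstructing "1 epoch = 82 batches" from the log header.
# batch_size and block_size are assumptions: any pair giving 4,096 tokens per
# step (e.g. 4 x 1024 or 32 x 128) reproduces the reported batch count.
tokens_loaded = 338_025
batch_size, block_size = 4, 1024

tokens_per_step = batch_size * block_size          # 4,096
batches_per_epoch = tokens_loaded // tokens_per_step
print(batches_per_epoch)                           # -> 82
```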
Epoch 1/70: 100% 82/82
Epoch 1/70, Loss: 6.169636
Checkpoint saved to checkpoint.pt
Epoch 2/70: 100% 82/82
Epoch 2/70, Loss: 5.720689
Checkpoint saved to checkpoint.pt
Epoch 3/70: 100% 82/82
Epoch 3/70, Loss: 5.390238
Checkpoint saved to checkpoint.pt
Epoch 4/70: 100% 82/82
Epoch 4/70, Loss: 5.164030
Checkpoint saved to checkpoint.pt
Epoch 5/70: 100% 82/82
Epoch 5/70, Loss: 5.051653
Checkpoint saved to checkpoint.pt
Epoch 6/70: 100% 82/82
Epoch 6/70, Loss: 4.947546
Checkpoint saved to checkpoint.pt
Epoch 7/70: 100% 82/82
Epoch 7/70, Loss: 4.893464
Checkpoint saved to checkpoint.pt
Epoch 8/70: 100% 82/82
Epoch 8/70, Loss: 4.785249
Checkpoint saved to checkpoint.pt
Epoch 9/70: 100% 82/82
Epoch 9/70, Loss: 4.773346
Checkpoint saved to checkpoint.pt
Epoch 10/70: 100% 82/82
Epoch 10/70, Loss: 4.669469
Checkpoint saved to checkpoint.pt
Epoch 11/70: 100% 82/82
Epoch 11/70, Loss: 4.617172
Checkpoint saved to checkpoint.pt
Epoch 12/70: 100% 82/82
Epoch 12/70, Loss: 4.594382
Checkpoint saved to checkpoint.pt
Epoch 13/70: 100% 82/82
Epoch 13/70, Loss: 4.554847
Checkpoint saved to checkpoint.pt
Epoch 14/70: 100% 82/82
Epoch 14/70, Loss: 4.506260
Checkpoint saved to checkpoint.pt
Epoch 15/70: 100% 82/82
Epoch 15/70, Loss: 4.416086
Checkpoint saved to checkpoint.pt
Epoch 16/70: 100% 82/82
Epoch 16/70, Loss: 4.370214
Checkpoint saved to checkpoint.pt
Epoch 17/70: 100% 82/82
Epoch 17/70, Loss: 4.278370
Checkpoint saved to checkpoint.pt
Epoch 18/70: 100% 82/82
Epoch 18/70, Loss: 4.304771
Checkpoint saved to checkpoint.pt
Epoch 19/70: 100% 82/82
Epoch 19/70, Loss: 4.209321
Checkpoint saved to checkpoint.pt
Epoch 20/70: 100% 82/82
Epoch 20/70, Loss: 4.175936
Checkpoint saved to checkpoint.pt
Epoch 21/70: 100% 82/82
Epoch 21/70, Loss: 4.071361
Checkpoint saved to checkpoint.pt
Epoch 22/70: 100% 82/82
Epoch 22/70, Loss: 4.071530
Checkpoint saved to checkpoint.pt
Epoch 23/70: 100% 82/82
Epoch 23/70, Loss: 4.053171
Checkpoint saved to checkpoint.pt
Epoch 24/70: 100% 82/82
Epoch 24/70, Loss: 3.923664
Checkpoint saved to checkpoint.pt
Epoch 25/70: 100% 82/82
Epoch 25/70, Loss: 3.827437
Checkpoint saved to checkpoint.pt
Epoch 26/70: 100% 82/82
Epoch 26/70, Loss: 3.767063
Checkpoint saved to checkpoint.pt
Epoch 27/70: 100% 82/82
Epoch 27/70, Loss: 3.711340
Checkpoint saved to checkpoint.pt
Epoch 28/70: 100% 82/82
Epoch 28/70, Loss: 3.622302
Checkpoint saved to checkpoint.pt
Epoch 29/70: 100% 82/82
Epoch 29/70, Loss: 3.583114
Checkpoint saved to checkpoint.pt
Epoch 30/70: 100% 82/82
Epoch 30/70, Loss: 3.517573
Checkpoint saved to checkpoint.pt
Epoch 31/70: 100% 82/82
Epoch 31/70, Loss: 3.445611
Checkpoint saved to checkpoint.pt
Epoch 32/70: 100% 82/82
Epoch 32/70, Loss: 3.410571
Checkpoint saved to checkpoint.pt
Epoch 33/70: 100% 82/82
Epoch 33/70, Loss: 3.282128
Checkpoint saved to checkpoint.pt
Epoch 34/70: 100% 82/82
Epoch 34/70, Loss: 3.307455
Checkpoint saved to checkpoint.pt
Epoch 35/70: 100% 82/82
Epoch 35/70, Loss: 3.126928
Checkpoint saved to checkpoint.pt
Epoch 36/70: 100% 82/82
Epoch 36/70, Loss: 3.057953
Checkpoint saved to checkpoint.pt
Epoch 37/70: 100% 82/82
Epoch 37/70, Loss: 3.082567
Checkpoint saved to checkpoint.pt
Epoch 38/70: 100% 82/82
Epoch 38/70, Loss: 3.066772
Checkpoint saved to checkpoint.pt
Epoch 39/70: 100% 82/82
Epoch 39/70, Loss: 2.943954
Checkpoint saved to checkpoint.pt
Epoch 40/70: 100% 82/82
Epoch 40/70, Loss: 2.874876
Checkpoint saved to checkpoint.pt
Epoch 41/70: 100% 82/82
Epoch 41/70, Loss: 2.781206
Checkpoint saved to checkpoint.pt
Epoch 42/70: 100% 82/82
Epoch 42/70, Loss: 2.729423
Checkpoint saved to checkpoint.pt
Epoch 43/70: 100% 82/82
Epoch 43/70, Loss: 2.656427
Checkpoint saved to checkpoint.pt
Epoch 44/70: 100% 82/82
Epoch 44/70, Loss: 2.641519
Checkpoint saved to checkpoint.pt
Epoch 45/70: 100% 82/82
Epoch 45/70, Loss: 2.593380
Checkpoint saved to checkpoint.pt
Epoch 46/70: 100% 82/82
Epoch 46/70, Loss: 2.504074
Checkpoint saved to checkpoint.pt
Epoch 47/70: 100% 82/82
Epoch 47/70, Loss: 2.510426
Checkpoint saved to checkpoint.pt
Epoch 48/70: 100% 82/82
Epoch 48/70, Loss: 2.465840
Checkpoint saved to checkpoint.pt
Epoch 49/70: 100% 82/82
Epoch 49/70, Loss: 2.339541
Checkpoint saved to checkpoint.pt
Epoch 50/70: 100% 82/82
Epoch 50/70, Loss: 2.288784
Checkpoint saved to checkpoint.pt
Epoch 51/70: 100% 82/82
Epoch 51/70, Loss: 2.272939
Checkpoint saved to checkpoint.pt
Epoch 52/70: 100% 82/82
Epoch 52/70, Loss: 2.150897
Checkpoint saved to checkpoint.pt
Epoch 53/70: 100% 82/82
Epoch 53/70, Loss: 2.096288
Checkpoint saved to checkpoint.pt
Epoch 54/70: 100% 82/82
Epoch 54/70, Loss: 2.057416
Checkpoint saved to checkpoint.pt
Epoch 55/70: 100% 82/82
Epoch 55/70, Loss: 1.962530
Checkpoint saved to checkpoint.pt
Epoch 56/70: 100% 82/82
Epoch 56/70, Loss: 1.930993
Checkpoint saved to checkpoint.pt
Epoch 57/70: 100% 82/82
Epoch 57/70, Loss: 1.854412
Checkpoint saved to checkpoint.pt
Epoch 58/70: 100% 82/82
Epoch 58/70, Loss: 1.818957
Checkpoint saved to checkpoint.pt
Epoch 59/70: 100% 82/82
Epoch 59/70, Loss: 1.764919
Checkpoint saved to checkpoint.pt
Epoch 60/70: 100% 82/82
Epoch 60/70, Loss: 1.741000
Checkpoint saved to checkpoint.pt
Epoch 61/70: 100% 82/82
Epoch 61/70, Loss: 1.694582
Checkpoint saved to checkpoint.pt
Epoch 62/70: 100% 82/82
Epoch 62/70, Loss: 1.751990
Checkpoint saved to checkpoint.pt
Epoch 63/70: 100% 82/82
Epoch 63/70, Loss: 1.664971
Checkpoint saved to checkpoint.pt
Epoch 64/70: 100% 82/82
Epoch 64/70, Loss: 1.557876
Checkpoint saved to checkpoint.pt
Epoch 65/70: 100% 82/82
Epoch 65/70, Loss: 1.543549
Checkpoint saved to checkpoint.pt
Epoch 66/70: 100% 82/82
Epoch 66/70, Loss: 1.436256
Checkpoint saved to checkpoint.pt
Epoch 67/70: 100% 82/82
Epoch 67/70, Loss: 1.352293
Checkpoint saved to checkpoint.pt
Epoch 68/70: 100% 82/82
Epoch 68/70, Loss: 1.361581
Checkpoint saved to checkpoint.pt
Epoch 69/70: 100% 82/82
Epoch 69/70, Loss: 1.308131
Checkpoint saved to checkpoint.pt
Epoch 70/70: 100% 82/82
Epoch 70/70, Loss: 1.287876
Checkpoint saved to checkpoint.pt
Total training time: 127 minutes and 37 seconds
Model saved to trained_model_quantized.pt with quantization and compression.
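The save step itself is not reproduced in this log, so the following is only a minimal sketch of what "saved ... with quantization and compression" might mean, assuming float16 quantization of the state dict and gzip compression; the actual transformer.py may do something different:

```python
import gzip
import io

import torch

def save_quantized_compressed(model, path="trained_model_quantized.pt"):
    # Assumption: "quantization" = cast floating-point weights to float16,
    # "compression" = gzip the serialized state dict.
    state = {k: (v.half() if v.is_floating_point() else v)
             for k, v in model.state_dict().items()}
    buf = io.BytesIO()
    torch.save(state, buf)
    with gzip.open(path, "wb") as f:
        f.write(buf.getvalue())
```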
==================================================
Increased the number of epochs to 91 to reach loss < 0.099999
==================================================
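The resume logic is not shown in the log either; a rough sketch of continuing from checkpoint.pt up to 91 epochs and stopping once the loss drops below the 0.099999 threshold could look like this (the checkpoint keys and the train_one_epoch helper are assumptions, not the actual transformer.py code):

```python
import torch

def resume_training(model, optimizer, loader, train_one_epoch,
                    target_epochs=91, loss_threshold=0.099999,
                    checkpoint_file="checkpoint.pt", device="cuda"):
    # Assumed checkpoint layout; the real keys used by transformer.py are not shown.
    ckpt = torch.load(checkpoint_file, map_location=device)
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    start_epoch = ckpt["epoch"] + 1                       # e.g. resume at epoch 71

    for epoch in range(start_epoch, target_epochs + 1):
        loss = train_one_epoch(model, optimizer, loader)  # hypothetical helper
        torch.save({"model_state_dict": model.state_dict(),
                    "optimizer_state_dict": optimizer.state_dict(),
                    "epoch": epoch}, checkpoint_file)
        print(f"Epoch {epoch}/{target_epochs}, Loss: {loss:.6f}")
        if loss < loss_threshold:                         # stop once loss < 0.099999
            break
```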
using device: cuda
loaded 338025 tokens
1 epoch = 82 batches
Number of model parameters: 124439808
Loading checkpoint from checkpoint.pt
/content/erav3-s12-transformer-model/erav3-s12-transformer-model/transformer.py:262: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_file)
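The FutureWarning above is triggered by calling torch.load without weights_only. If the checkpoint holds only tensors and plain Python containers (state dicts plus an epoch counter), it can be loaded with the safer flag the warning itself recommends:

```python
# Drop-in replacement for the call at transformer.py:262, assuming the checkpoint
# contains only tensors, dicts, and other plain types.
checkpoint = torch.load(checkpoint_file, weights_only=True)
```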
Epoch 71/91: 100% 82/82
Epoch 71/91, Loss: 1.453567
Checkpoint saved to checkpoint.pt
Epoch 72/91: 100% 82/82
Epoch 72/91, Loss: 1.162141
Checkpoint saved to checkpoint.pt
Epoch 73/91: 100% 82/82
Epoch 73/91, Loss: 1.174683
Checkpoint saved to checkpoint.pt
Epoch 74/91: 100% 82/82
Epoch 74/91, Loss: 1.089287
Checkpoint saved to checkpoint.pt
Epoch 75/91: 100% 82/82
Epoch 75/91, Loss: 1.010704
Checkpoint saved to checkpoint.pt
Epoch 76/91: 100% 82/82
Epoch 76/91, Loss: 0.979691
Checkpoint saved to checkpoint.pt
Epoch 77/91: 100% 82/82
Epoch 77/91, Loss: 0.918769
Checkpoint saved to checkpoint.pt
Epoch 78/91: 100% 82/82
Epoch 78/91, Loss: 0.904894
Checkpoint saved to checkpoint.pt
Epoch 79/91: 100% 82/82
Epoch 79/91, Loss: 0.851253
Checkpoint saved to checkpoint.pt
Epoch 80/91: 100% 82/82
Epoch 80/91, Loss: 0.810432
Checkpoint saved to checkpoint.pt
Epoch 81/91: 100% 82/82
Epoch 81/91, Loss: 0.730137
Checkpoint saved to checkpoint.pt
Epoch 82/91: 100% 82/82
Epoch 82/91, Loss: 0.677209
Checkpoint saved to checkpoint.pt
Epoch 83/91: 100% 82/82
Epoch 83/91, Loss: 0.618384
Checkpoint saved to checkpoint.pt
Epoch 84/91: 100% 82/82
Epoch 84/91, Loss: 0.570543
Checkpoint saved to checkpoint.pt
Epoch 85/91: 100% 82/82
Epoch 85/91, Loss: 0.516322
Checkpoint saved to checkpoint.pt
Epoch 86/91: 100% 82/82
Epoch 86/91, Loss: 0.432109
Checkpoint saved to checkpoint.pt
Epoch 87/91: 100% 82/82
Epoch 87/91, Loss: 0.320471
Checkpoint saved to checkpoint.pt
Epoch 88/91: 100% 82/82
Epoch 88/91, Loss: 0.271299
Checkpoint saved to checkpoint.pt
Epoch 89/91: 100% 82/82
Epoch 89/91, Loss: 0.218522
Checkpoint saved to checkpoint.pt
Epoch 90/91: 100% 82/82
Epoch 90/91, Loss: 0.121739
Checkpoint saved to checkpoint.pt
Epoch 91/91: 100% 82/82
Epoch 91/91, Loss: 0.089421
Checkpoint saved to checkpoint.pt
Total training time: 34 minutes and 28 seconds
Model saved to trained_model_quantized.pt with quantization and compression.
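To load the compressed, quantized file back for inference, a mirror of the save sketch above might look like this, again under the same float16 + gzip assumption:

```python
import gzip
import io

import torch

def load_quantized_compressed(path="trained_model_quantized.pt", device="cpu"):
    # Reverse of the assumed save format: unzip, then torch.load the buffer.
    with gzip.open(path, "rb") as f:
        buf = io.BytesIO(f.read())
    state = torch.load(buf, map_location=device, weights_only=True)
    # Cast back to float32 before loading into a model built in full precision.
    return {k: (v.float() if v.is_floating_point() else v) for k, v in state.items()}
```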