Commit ac5a860
Parent(s): b7ca7fe
Adding changes in README

Files changed:
- README.md +22 -5
- train.py +1 -1
- training.log +57 -19
README.md
CHANGED
@@ -23,6 +23,7 @@ This project implements a transformer-based language model using PyTorch. The mo
 - [Actual Training](#actual-training)
 - [Checkpointing](#checkpointing)
 - [Model Compression](#model-compression)
+- [Working Demo](#working-demo)
 - [License](#license)
 - [Acknowledgments](#acknowledgments)
 
@@ -48,17 +49,18 @@ This project implements a transformer-based language model using PyTorch. The mo
 git clone https://github.com/yourusername/transformer-model-training.git
 cd transformer-model-training
 ```
-
+
+2. Install the required packages:
 ```bash
-
+pip install -r requirements.txt
 ```
 
 ## Usage
 1. Prepare your text data in a file named `input.txt`. The model will read this file to load tokens for training.
 
-2.
+2. To train the model, run the training script:
 ```bash
-python
+python train.py
 ```
 
 3. The model will save checkpoints after each epoch in `checkpoint.pt` and the final model in `trained_model_quantized.pt`.
@@ -75,9 +77,17 @@ This project implements a transformer-based language model using PyTorch. The mo
 - The training loop includes loss calculation, backpropagation, and optimizer steps.
 - The loss is monitored, and checkpoints are saved to allow for resuming training.
 - The training process is logged in `training.log`, which contains detailed statistics for each epoch, including loss values and checkpointing information.
+- The training process supports early stopping if the loss falls below a specified threshold (0.099999), which can help prevent overfitting and reduce training time.
 
 ## Actual Training
-The model was trained for a total of **
+The model was trained for a total of **91 epochs**. The training process involved the following steps:
+- **Data Preparation**: The model reads and encodes text data from `input.txt`, loading a total of **338,025 tokens**.
+- **Batch Processing**: Each epoch consists of **82 batches**, with each batch containing sequences of tokens for training.
+- **Loss Monitoring**: The loss is calculated at each step, and the model's performance is tracked throughout the training process.
+- **Checkpointing**: The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`) after each epoch, allowing for recovery in case of interruptions.
+- **Final Model**: After training, the model is saved with quantization and compression as `trained_model_quantized.pt`, reducing the file size for easier deployment. The final loss achieved at the end of training was approximately 0.089421.
+
+The training log file contains detailed statistics for each epoch, including loss values and checkpointing information. You can find the log file named `training.log` in the project directory.
 
 ## Checkpointing
 - The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`).
@@ -86,6 +96,13 @@ The model was trained for a total of **78 epochs**. The final loss achieved at t
 ## Model Compression
 - The final model is saved with compression to reduce file size. The model file will be saved as `trained_model_quantized.pt`.
 
+## Working Demo
+You can try out the working demo of the model on Hugging Face Spaces:
+
+
+
+[Play with the Demo Here](https://huggingface.co/spaces/yourusername/your-demo)
+
 ## License
 This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
 
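For context, here is a minimal sketch of the epoch loop the updated README describes: per-epoch checkpointing to `checkpoint.pt` plus early stopping once the loss drops below the 0.099999 threshold. It is an illustration only; the helper names, the optimizer choice, and the `model(x, targets=y)` signature are assumptions, not the repository's actual `train.py`.

```python
# Sketch (assumed, not the repository's train.py): per-epoch checkpointing and
# early stopping at the loss threshold mentioned in the README.
import torch

LOSS_THRESHOLD = 0.099999          # threshold quoted in the README
CHECKPOINT_PATH = "checkpoint.pt"

def save_checkpoint(model, epoch, path=CHECKPOINT_PATH):
    # Model state and current epoch go into a single checkpoint file.
    torch.save({"epoch": epoch, "model_state_dict": model.state_dict()}, path)

def train(model, data_loader, num_epochs, start_epoch=0, lr=3e-4, device="cuda"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # optimizer choice is an assumption
    model.to(device).train()
    for epoch in range(start_epoch, num_epochs):
        epoch_loss = 0.0
        for x, y in data_loader:                  # one epoch = 82 batches in the logged run
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            _, loss = model(x, targets=y)         # model returning (logits, loss) is an assumption
            loss.backward()
            optimizer.step()
            epoch_loss = loss.item()              # track the most recent batch loss
        print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {epoch_loss:.6f}")
        save_checkpoint(model, epoch + 1)
        if epoch_loss < LOSS_THRESHOLD:           # early stopping condition from the README
            print("Loss threshold reached, stopping early.")
            break
```

Under these assumptions the loop would stop at epoch 91 of the logged run, where the loss first falls below the threshold (0.089421).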
train.py
CHANGED
@@ -26,7 +26,7 @@ def load_latest_checkpoint(model):
 start_epoch = load_latest_checkpoint(model)
 
 # Training loop
-num_epochs =
+num_epochs = 91
 
 # Start time tracking
 start_time = time.time()
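The hunk above only changes the epoch budget to 91; `load_latest_checkpoint` itself is not shown in this diff. A plausible sketch of how it pairs with `start_epoch` and `num_epochs` when resuming is given below; the checkpoint keys (`epoch`, `model_state_dict`) and the `weights_only=True` argument are assumptions rather than the repository's code.

```python
# Assumed shape of load_latest_checkpoint: restore the saved weights and return
# the epoch to resume from, so the loop runs only the remaining epochs up to 91.
import os
import torch

def load_latest_checkpoint(model, checkpoint_file="checkpoint.pt"):
    if not os.path.exists(checkpoint_file):
        return 0                                   # no checkpoint yet: start from epoch 0
    print(f"Loading checkpoint from {checkpoint_file}")
    # weights_only=True avoids the FutureWarning visible in training.log and is
    # safe as long as the checkpoint holds only tensors and plain Python values.
    checkpoint = torch.load(checkpoint_file, map_location="cpu", weights_only=True)
    model.load_state_dict(checkpoint["model_state_dict"])
    return checkpoint["epoch"]

# Resuming then mirrors the train.py lines above:
#   start_epoch = load_latest_checkpoint(model)
#   num_epochs = 91
#   for epoch in range(start_epoch, num_epochs):
#       ...
```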
training.log
CHANGED
@@ -215,7 +215,7 @@ Checkpoint saved to checkpoint.pt
 Total training time: 127 minutes and 37 seconds
 Model saved to trained_model_quantized.pt with quantization and compression.
 ==================================================
-Increased epoch to
+Increased epoch to 91 to reach loss < 0.099999
 ==================================================
 using device: cuda
 loaded 338025 tokens
@@ -224,30 +224,68 @@ Number of model parameters: 124439808
 Loading checkpoint from checkpoint.pt
 /content/erav3-s12-transformer-model/erav3-s12-transformer-model/transformer.py:262: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
 checkpoint = torch.load(checkpoint_file)
-Epoch 71/
-Epoch 71/
+Epoch 71/91: 100% 82/82 [01:36<00:00, 1.18s/it]
+Epoch 71/91, Loss: 1.453567
 Checkpoint saved to checkpoint.pt
-Epoch 72/
-Epoch 72/
+Epoch 72/91: 100% 82/82 [01:42<00:00, 1.25s/it]
+Epoch 72/91, Loss: 1.162141
 Checkpoint saved to checkpoint.pt
-Epoch 73/
-Epoch 73/
+Epoch 73/91: 100% 82/82 [01:42<00:00, 1.24s/it]
+Epoch 73/91, Loss: 1.174683
 Checkpoint saved to checkpoint.pt
-Epoch 74/
-Epoch 74/
+Epoch 74/91: 100% 82/82 [01:42<00:00, 1.25s/it]
+Epoch 74/91, Loss: 1.089287
 Checkpoint saved to checkpoint.pt
-Epoch 75/
-Epoch 75/
+Epoch 75/91: 100% 82/82 [01:42<00:00, 1.25s/it]
+Epoch 75/91, Loss: 1.010704
 Checkpoint saved to checkpoint.pt
-Epoch 76/
-Epoch 76/
+Epoch 76/91: 100% 82/82 [01:42<00:00, 1.24s/it]
+Epoch 76/91, Loss: 0.979691
 Checkpoint saved to checkpoint.pt
-Epoch 77/
-Epoch 77/
+Epoch 77/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 77/91, Loss: 0.918769
 Checkpoint saved to checkpoint.pt
-Epoch 78/
-Epoch 78/
+Epoch 78/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 78/91, Loss: 0.904894
 Checkpoint saved to checkpoint.pt
-
+Epoch 79/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 79/91, Loss: 0.851253
+Checkpoint saved to checkpoint.pt
+Epoch 80/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 80/91, Loss: 0.810432
+Checkpoint saved to checkpoint.pt
+Epoch 81/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 81/91, Loss: 0.730137
+Checkpoint saved to checkpoint.pt
+Epoch 82/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 82/91, Loss: 0.677209
+Checkpoint saved to checkpoint.pt
+Epoch 83/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 83/91, Loss: 0.618384
+Checkpoint saved to checkpoint.pt
+Epoch 84/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 84/91, Loss: 0.570543
+Checkpoint saved to checkpoint.pt
+Epoch 85/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 85/91, Loss: 0.516322
+Checkpoint saved to checkpoint.pt
+Epoch 86/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 86/91, Loss: 0.432109
+Checkpoint saved to checkpoint.pt
+Epoch 87/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 87/91, Loss: 0.320471
+Checkpoint saved to checkpoint.pt
+Epoch 88/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 88/91, Loss: 0.271299
+Checkpoint saved to checkpoint.pt
+Epoch 89/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 89/91, Loss: 0.218522
+Checkpoint saved to checkpoint.pt
+Epoch 90/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 90/91, Loss: 0.121739
+Checkpoint saved to checkpoint.pt
+Epoch 91/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 91/91, Loss: 0.089421
+Checkpoint saved to checkpoint.pt
+Total training time: 34 minutes and 28 seconds
 Model saved to trained_model_quantized.pt with quantization and compression.
-
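The log's final line reports the model being saved "with quantization and compression", but the diff does not show how that is done. One plausible way to produce `trained_model_quantized.pt`, assuming dynamic int8 quantization of the `nn.Linear` layers and a gzip-wrapped `torch.save`, is sketched below; the repository's actual saving code may differ.

```python
# Plausible sketch (assumed, not shown in this diff): dynamic int8 quantization
# of the Linear layers, then a gzip-compressed torch.save of the state dict.
import gzip
import torch
import torch.nn as nn

def save_quantized_compressed(model, path="trained_model_quantized.pt"):
    model.cpu().eval()
    # Dynamic quantization converts Linear weights to int8, shrinking the checkpoint.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
    # Wrapping torch.save in gzip supplies the "compression" part of the log message.
    with gzip.open(path, "wb") as f:
        torch.save(quantized.state_dict(), f)
    print(f"Model saved to {path} with quantization and compression.")
```

A file saved this way has to be reopened with `gzip.open` before passing it to `torch.load`.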