Commit ac5a860
Parent(s): b7ca7fe
Adding changes in README

Files changed:
- README.md +22 -5
- train.py +1 -1
- training.log +57 -19
README.md
CHANGED
@@ -23,6 +23,7 @@ This project implements a transformer-based language model using PyTorch. The mo
 - [Actual Training](#actual-training)
 - [Checkpointing](#checkpointing)
 - [Model Compression](#model-compression)
+- [Working Demo](#working-demo)
 - [License](#license)
 - [Acknowledgments](#acknowledgments)
 
@@ -48,17 +49,18 @@ This project implements a transformer-based language model using PyTorch. The mo
 git clone https://github.com/yourusername/transformer-model-training.git
 cd transformer-model-training
 ```
-
+
+2. Install the required packages:
 ```bash
-
+pip install -r requirements.txt
 ```
 
 ## Usage
 1. Prepare your text data in a file named `input.txt`. The model will read this file to load tokens for training.
 
-2.
+2. To train the model, run the training script:
 ```bash
-python
+python train.py
 ```
 
 3. The model will save checkpoints after each epoch in `checkpoint.pt` and the final model in `trained_model_quantized.pt`.
@@ -75,9 +77,17 @@ This project implements a transformer-based language model using PyTorch. The mo
 - The training loop includes loss calculation, backpropagation, and optimizer steps.
 - The loss is monitored, and checkpoints are saved to allow for resuming training.
 - The training process is logged in `training.log`, which contains detailed statistics for each epoch, including loss values and checkpointing information.
+- The training process supports early stopping if the loss falls below a specified threshold (0.099999), which can help prevent overfitting and reduce training time.
 
 ## Actual Training
-The model was trained for a total of **
+The model was trained for a total of **91 epochs**. The training process involved the following steps:
+- **Data Preparation**: The model reads and encodes text data from `input.txt`, loading a total of **338,025 tokens**.
+- **Batch Processing**: Each epoch consists of **82 batches**, with each batch containing sequences of tokens for training.
+- **Loss Monitoring**: The loss is calculated at each step, and the model's performance is tracked throughout the training process.
+- **Checkpointing**: The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`) after each epoch, allowing for recovery in case of interruptions.
+- **Final Model**: After training, the model is saved with quantization and compression as `trained_model_quantized.pt`, reducing the file size for easier deployment. The final loss achieved at the end of training was approximately 0.089421.
+
+The training log file contains detailed statistics for each epoch, including loss values and checkpointing information. You can find the log file named `training.log` in the project directory.
 
 ## Checkpointing
 - The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`).
@@ -86,6 +96,13 @@ The model was trained for a total of **78 epochs**. The final loss achieved at t
 ## Model Compression
 - The final model is saved with compression to reduce file size. The model file will be saved as `trained_model_quantized.pt`.
 
+## Working Demo
+You can try out the working demo of the model on Hugging Face Spaces:
+
+
+
+[Play with the Demo Here](https://huggingface.co/spaces/yourusername/your-demo)
+
 ## License
 This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
 
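For context, here is a minimal sketch of the epoch loop the updated README describes: per-epoch checkpointing to `checkpoint.pt` plus early stopping once the loss drops below the 0.099999 threshold. It is an illustration only; the helper names, the optimizer choice, and the `model(x, targets=y)` signature are assumptions, not the repository's actual `train.py`.

```python
# Sketch (assumed, not the repository's train.py): per-epoch checkpointing and
# early stopping at the loss threshold mentioned in the README.
import torch

LOSS_THRESHOLD = 0.099999          # threshold quoted in the README
CHECKPOINT_PATH = "checkpoint.pt"

def save_checkpoint(model, epoch, path=CHECKPOINT_PATH):
    # Model state and current epoch go into a single checkpoint file.
    torch.save({"epoch": epoch, "model_state_dict": model.state_dict()}, path)

def train(model, data_loader, num_epochs, start_epoch=0, lr=3e-4, device="cuda"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # optimizer choice is an assumption
    model.to(device).train()
    for epoch in range(start_epoch, num_epochs):
        epoch_loss = 0.0
        for x, y in data_loader:                  # one epoch = 82 batches in the logged run
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            _, loss = model(x, targets=y)         # model returning (logits, loss) is an assumption
            loss.backward()
            optimizer.step()
            epoch_loss = loss.item()              # track the most recent batch loss
        print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {epoch_loss:.6f}")
        save_checkpoint(model, epoch + 1)
        if epoch_loss < LOSS_THRESHOLD:           # early stopping condition from the README
            print("Loss threshold reached, stopping early.")
            break
```

Under these assumptions the loop would stop at epoch 91 of the logged run, where the loss first falls below the threshold (0.089421).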
train.py
CHANGED
@@ -26,7 +26,7 @@ def load_latest_checkpoint(model):
 start_epoch = load_latest_checkpoint(model)
 
 # Training loop
-num_epochs =
+num_epochs = 91
 
 # Start time tracking
 start_time = time.time()
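The hunk above only changes the epoch budget to 91; `load_latest_checkpoint` itself is not shown in this diff. A plausible sketch of how it pairs with `start_epoch` and `num_epochs` when resuming is given below; the checkpoint keys (`epoch`, `model_state_dict`) and the `weights_only=True` argument are assumptions rather than the repository's code.

```python
# Assumed shape of load_latest_checkpoint: restore the saved weights and return
# the epoch to resume from, so the loop runs only the remaining epochs up to 91.
import os
import torch

def load_latest_checkpoint(model, checkpoint_file="checkpoint.pt"):
    if not os.path.exists(checkpoint_file):
        return 0                                   # no checkpoint yet: start from epoch 0
    print(f"Loading checkpoint from {checkpoint_file}")
    # weights_only=True avoids the FutureWarning visible in training.log and is
    # safe as long as the checkpoint holds only tensors and plain Python values.
    checkpoint = torch.load(checkpoint_file, map_location="cpu", weights_only=True)
    model.load_state_dict(checkpoint["model_state_dict"])
    return checkpoint["epoch"]

# Resuming then mirrors the train.py lines above:
#   start_epoch = load_latest_checkpoint(model)
#   num_epochs = 91
#   for epoch in range(start_epoch, num_epochs):
#       ...
```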
training.log
CHANGED
@@ -215,7 +215,7 @@ Checkpoint saved to checkpoint.pt
 Total training time: 127 minutes and 37 seconds
 Model saved to trained_model_quantized.pt with quantization and compression.
 ==================================================
-Increased epoch to
+Increased epoch to 91 to reach loss < 0.099999
 ==================================================
 using device: cuda
 loaded 338025 tokens
@@ -224,30 +224,68 @@ Number of model parameters: 124439808
 Loading checkpoint from checkpoint.pt
 /content/erav3-s12-transformer-model/erav3-s12-transformer-model/transformer.py:262: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
 checkpoint = torch.load(checkpoint_file)
-Epoch 71/
-Epoch 71/
+Epoch 71/91: 100% 82/82 [01:36<00:00, 1.18s/it]
+Epoch 71/91, Loss: 1.453567
 Checkpoint saved to checkpoint.pt
-Epoch 72/
-Epoch 72/
+Epoch 72/91: 100% 82/82 [01:42<00:00, 1.25s/it]
+Epoch 72/91, Loss: 1.162141
 Checkpoint saved to checkpoint.pt
-Epoch 73/
-Epoch 73/
+Epoch 73/91: 100% 82/82 [01:42<00:00, 1.24s/it]
+Epoch 73/91, Loss: 1.174683
 Checkpoint saved to checkpoint.pt
-Epoch 74/
-Epoch 74/
+Epoch 74/91: 100% 82/82 [01:42<00:00, 1.25s/it]
+Epoch 74/91, Loss: 1.089287
 Checkpoint saved to checkpoint.pt
-Epoch 75/
-Epoch 75/
+Epoch 75/91: 100% 82/82 [01:42<00:00, 1.25s/it]
+Epoch 75/91, Loss: 1.010704
 Checkpoint saved to checkpoint.pt
-Epoch 76/
-Epoch 76/
+Epoch 76/91: 100% 82/82 [01:42<00:00, 1.24s/it]
+Epoch 76/91, Loss: 0.979691
 Checkpoint saved to checkpoint.pt
-Epoch 77/
-Epoch 77/
+Epoch 77/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 77/91, Loss: 0.918769
 Checkpoint saved to checkpoint.pt
-Epoch 78/
-Epoch 78/
+Epoch 78/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 78/91, Loss: 0.904894
 Checkpoint saved to checkpoint.pt
-
+Epoch 79/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 79/91, Loss: 0.851253
+Checkpoint saved to checkpoint.pt
+Epoch 80/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 80/91, Loss: 0.810432
+Checkpoint saved to checkpoint.pt
+Epoch 81/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 81/91, Loss: 0.730137
+Checkpoint saved to checkpoint.pt
+Epoch 82/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 82/91, Loss: 0.677209
+Checkpoint saved to checkpoint.pt
+Epoch 83/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 83/91, Loss: 0.618384
+Checkpoint saved to checkpoint.pt
+Epoch 84/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 84/91, Loss: 0.570543
+Checkpoint saved to checkpoint.pt
+Epoch 85/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 85/91, Loss: 0.516322
+Checkpoint saved to checkpoint.pt
+Epoch 86/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 86/91, Loss: 0.432109
+Checkpoint saved to checkpoint.pt
+Epoch 87/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 87/91, Loss: 0.320471
+Checkpoint saved to checkpoint.pt
+Epoch 88/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 88/91, Loss: 0.271299
+Checkpoint saved to checkpoint.pt
+Epoch 89/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 89/91, Loss: 0.218522
+Checkpoint saved to checkpoint.pt
+Epoch 90/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 90/91, Loss: 0.121739
+Checkpoint saved to checkpoint.pt
+Epoch 91/91: 100% 82/82 [01:41<00:00, 1.24s/it]
+Epoch 91/91, Loss: 0.089421
+Checkpoint saved to checkpoint.pt
+Total training time: 34 minutes and 28 seconds
 Model saved to trained_model_quantized.pt with quantization and compression.
-
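The log's final line reports the model being saved "with quantization and compression", but the diff does not show how that is done. One plausible way to produce `trained_model_quantized.pt`, assuming dynamic int8 quantization of the `nn.Linear` layers and a gzip-wrapped `torch.save`, is sketched below; the repository's actual saving code may differ.

```python
# Plausible sketch (assumed, not shown in this diff): dynamic int8 quantization
# of the Linear layers, then a gzip-compressed torch.save of the state dict.
import gzip
import torch
import torch.nn as nn

def save_quantized_compressed(model, path="trained_model_quantized.pt"):
    model.cpu().eval()
    # Dynamic quantization converts Linear weights to int8, shrinking the checkpoint.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
    # Wrapping torch.save in gzip supplies the "compression" part of the log message.
    with gzip.open(path, "wb") as f:
        torch.save(quantized.state_dict(), f)
    print(f"Model saved to {path} with quantization and compression.")
```

A file saved this way has to be reopened with `gzip.open` before passing it to `torch.load`.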