---
title: Simple Transformer
emoji: 🔥
colorFrom: indigo
colorTo: gray
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
short_description: Transformer trained on Shakespeare play dataset
---
# Transformer Model Training
This project implements a transformer-based language model using PyTorch. The model learns next-token prediction from a plain-text corpus (here, a Shakespeare play dataset), can be retrained on other text, and powers a Streamlit app for text generation.
## Table of Contents
- Features
- Requirements
- Installation
- Usage
- Training
- Actual Training
- Checkpointing
- Model Compression
- Working Demo
- License
- Acknowledgments
## Features
- Transformer architecture with causal self-attention and feedforward layers (a minimal sketch follows this list).
- Efficient data loading and batching.
- Checkpointing to resume training.
- Support for multiple devices (CPU, CUDA, MPS).
- Model compression for reduced file size.
- Streamlit application for text generation using the trained model.
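The model internals are defined in the training code; purely as an illustration of the first feature above, a transformer block with causal self-attention and a feedforward layer can be sketched in PyTorch like this (module names and sizes are hypothetical, not the project's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (illustrative sizes)."""
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # project to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # Lower-triangular mask so each position only attends to earlier positions.
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (batch, heads, time, head_dim).
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    """Transformer block: attention + feedforward, each with a residual connection."""
    def __init__(self, n_embd=256, n_head=8, block_size=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ff = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.ff(self.ln2(x))
        return x
```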
## Requirements
- Python 3.6 or higher
- PyTorch 1.7 or higher
- tqdm
- tiktoken
- streamlit
- transformers
## Installation

Clone the repository:

```bash
git clone https://github.com/yourusername/transformer-model-training.git
cd transformer-model-training
```

Install the required packages:

```bash
pip install -r requirements.txt
```
## Usage
Prepare your text data in a file named `input.txt`. The model reads this file to load tokens for training.

To train the model, run the training script:

```bash
python train.py
```

The model saves checkpoints after each epoch in `checkpoint.pt` and the final model in `trained_model_quantized.pt`.

To generate text using the trained model, run the Streamlit application:

```bash
streamlit run app.py
```

Enter your text and specify the length of additional text to generate in the Streamlit interface.
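`app.py` is the authoritative implementation; as a rough sketch of how such a Streamlit front end can be wired up (the tokenizer choice, checkpoint format, and model output shape are assumptions, not confirmed details of this project):

```python
# Hypothetical sketch of a Streamlit text-generation front end; app.py is the real one.
import streamlit as st
import tiktoken
import torch

@st.cache_resource
def load_model():
    # Assumption: the quantized checkpoint contains the full model object.
    model = torch.load("trained_model_quantized.pt", map_location="cpu")
    model.eval()
    return model

enc = tiktoken.get_encoding("gpt2")  # assumed tokenizer; must match whatever train.py used
model = load_model()

prompt = st.text_area("Enter your text")
length = st.number_input("Additional tokens to generate", min_value=1, max_value=500, value=100)

if st.button("Generate") and prompt:
    idx = torch.tensor([enc.encode(prompt)], dtype=torch.long)
    with torch.no_grad():
        for _ in range(int(length)):
            logits = model(idx)                       # assumed to return (batch, time, vocab) logits
            probs = torch.softmax(logits[:, -1, :], dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, next_id], dim=1)
    st.write(enc.decode(idx[0].tolist()))
```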
## Training
- The model is trained with a batch size of 4 and a learning rate of 3e-4.
- The training loop includes loss calculation, backpropagation, and optimizer steps (see the sketch after this list).
- The loss is monitored, and checkpoints are saved to allow training to resume.
- The training process is logged in `training.log`, which contains detailed statistics for each epoch, including loss values and checkpointing information.
- Training supports early stopping if the loss falls below a specified threshold (0.099999), which can help prevent overfitting and reduce training time.
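The authoritative loop lives in `train.py`; the following is only a minimal, self-contained sketch of the steps listed above (batch size 4, learning rate 3e-4, per-epoch checkpointing, early stopping below 0.099999), using a stand-in model and random token data rather than the project's actual modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model and data for illustration only; train.py defines the real transformer and corpus.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Embedding(50304, 64), nn.Linear(64, 50304)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)    # learning rate from this README

tokens = torch.randint(0, 50304, (4096,))                     # placeholder for the encoded corpus
block_size, batch_size = 64, 4                                # batch size from this README

for epoch in range(3):                                        # the real run trained for 91 epochs
    for _ in range(82):                                       # the real run had 82 batches per epoch
        ix = torch.randint(0, len(tokens) - block_size - 1, (batch_size,))
        xb = torch.stack([tokens[i:i + block_size] for i in ix]).to(device)
        yb = torch.stack([tokens[i + 1:i + 1 + block_size] for i in ix]).to(device)

        logits = model(xb)                                    # (batch, block, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()

    # Save a checkpoint after every epoch so training can resume after interruptions.
    torch.save({"epoch": epoch, "model_state_dict": model.state_dict()}, "checkpoint.pt")

    # Early stopping once the loss drops below the threshold.
    if loss.item() < 0.099999:
        break
```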
## Actual Training
The model was trained for a total of 91 epochs. The training process involved the following steps:
- **Data Preparation**: The model reads and encodes text data from `input.txt`, loading a total of 338,025 tokens (see the sketch at the end of this section).
- **Batch Processing**: Each epoch consists of 82 batches, with each batch containing sequences of tokens for training.
- **Loss Monitoring**: The loss is calculated at each step, and the model's performance is tracked throughout training.
- **Checkpointing**: The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`) after each epoch, allowing recovery in case of interruptions.
- **Final Model**: After training, the model is saved with quantization and compression as `trained_model_quantized.pt`, reducing the file size for easier deployment. The final loss at the end of training was approximately 0.089421.
The training log file contains detailed statistics for each epoch, including loss values and checkpointing information. You can find it as `training.log` in the project directory.
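The exact tokenizer setup is defined in `train.py`; assuming it uses `tiktoken` (which is listed in the requirements) with the GPT-2 encoding, the data-preparation and batching steps described above might look roughly like this:

```python
import tiktoken
import torch

# Read the corpus and encode it to token ids (the encoding name is an assumption).
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

enc = tiktoken.get_encoding("gpt2")
tokens = torch.tensor(enc.encode(text), dtype=torch.long)
print(f"loaded {len(tokens):,} tokens")      # this README reports 338,025 tokens for its corpus

# Build (input, target) batches: targets are the inputs shifted by one token.
block_size, batch_size = 64, 4               # block size is illustrative; batch size is from this README

def get_batch():
    ix = torch.randint(0, len(tokens) - block_size - 1, (batch_size,))
    x = torch.stack([tokens[i:i + block_size] for i in ix])
    y = torch.stack([tokens[i + 1:i + 1 + block_size] for i in ix])
    return x, y
```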
## Checkpointing
- The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`).
- To resume training from the last checkpoint, simply run the training script again; it automatically loads the latest checkpoint (see the sketch below).
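The checkpoint layout is whatever `train.py` writes; a common pattern matching the behaviour described above (model weights plus the current epoch in one file, with automatic resume on startup) is sketched here with hypothetical helper names:

```python
import os
import torch

CKPT_PATH = "checkpoint.pt"

def save_checkpoint(model, epoch, path=CKPT_PATH):
    # One file holding the model weights and the epoch just completed.
    torch.save({"epoch": epoch, "model_state_dict": model.state_dict()}, path)

def load_checkpoint(model, path=CKPT_PATH):
    # If a checkpoint exists, restore the weights and report which epoch to resume from.
    start_epoch = 0
    if os.path.exists(path):
        ckpt = torch.load(path, map_location="cpu")
        model.load_state_dict(ckpt["model_state_dict"])
        start_epoch = ckpt["epoch"] + 1
    return start_epoch
```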
## Model Compression
- The final model is saved with quantization and compression to reduce file size. The model file is written as `trained_model_quantized.pt` (one possible approach is sketched below).
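How `train.py` actually produces `trained_model_quantized.pt` is not shown here; one common way to get this kind of size reduction is PyTorch dynamic quantization of the linear layers, sketched purely as an assumption with a stand-in model:

```python
import torch
import torch.nn as nn

# Stand-in for the trained transformer; train.py would quantize its real model instead.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))

# Dynamic quantization stores Linear weights as int8, shrinking the saved file.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized_model, "trained_model_quantized.pt")
```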
## Working Demo
You can try out the working demo of the model on Hugging Face Spaces; this Space hosts the Streamlit app described above.
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments
- This project is inspired by the original GPT architecture and various resources available in the NLP community.