

title: Simple Transformer
emoji: 🔥
colorFrom: indigo
colorTo: gray
sdk: streamlit
sdk_version: 1.41.1
pinned: false
short_description: Transformer trained on Shakespeare play dataset

Transformer Model Training

This project implements a transformer-based language model using PyTorch. The model is designed to learn from a text corpus and can be trained and fine-tuned for various natural language processing tasks.

Features

  • Transformer architecture with causal self-attention and feedforward layers.
  • Efficient data loading and batching.
  • Checkpointing to resume training.
  • Support for multiple devices (CPU, CUDA, MPS).
  • Model compression for reduced file size.
  • Streamlit application for text generation using the trained model.
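To make the causal self-attention feature concrete, here is a minimal pure-Python sketch of the masking step for a single attention head. It is illustrative only: the function names are hypothetical, and the project's own PyTorch implementation is not shown in this README.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Apply a causal mask to a T x T score matrix, then softmax each row.

    Position i may attend only to positions j <= i. Future positions are
    set to -inf so they receive zero weight after the softmax.
    """
    T = len(scores)
    weights = []
    for i in range(T):
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(T)]
        weights.append(softmax(masked))
    return weights

# Toy attention scores for a sequence of three tokens.
scores = [[0.0, 1.0, 2.0],
          [1.0, 0.0, 3.0],
          [2.0, 1.0, 0.0]]
weights = causal_attention_weights(scores)
```

Each row of `weights` sums to 1, and every entry above the diagonal is zero, so a token never sees the tokens that follow it.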


Requirements

  • Python 3.6 or higher
  • PyTorch 1.7 or higher
  • tqdm
  • tiktoken
  • streamlit
  • transformers


Installation

  1. Clone the repository:

    git clone
    cd transformer-model-training
  2. Install the required packages:

    pip install -r requirements.txt


Usage

  1. Prepare your text data in a file named input.txt. The training script reads this file and tokenizes it for training.

  2. To train the model, run the training script.

  3. The training script saves a checkpoint after each epoch and writes the final model when training finishes.

  4. To generate text using the trained model, run the Streamlit application:

    streamlit run
  5. Enter your text and specify the length of additional text to generate in the Streamlit interface.
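The last step, extending a prompt by a requested number of tokens, amounts to an autoregressive loop. The sketch below shows the general shape of such a loop. `next_token_fn` and `block_size` are hypothetical names, and the toy model stands in for the trained transformer plus its sampling step; in the app, the prompt would presumably be encoded to token ids first (the requirements list tiktoken), which is an assumption here.

```python
def generate(prompt_tokens, max_new_tokens, next_token_fn, block_size=8):
    """Append max_new_tokens tokens to the prompt, one at a time.

    next_token_fn is a hypothetical stand-in for the trained model plus a
    sampling step: it maps a list of context token ids to the next token id.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tokens[-block_size:]  # keep only the most recent tokens
        tokens.append(next_token_fn(context))
    return tokens

# Toy "model": the next token is the last token plus one, modulo 50.
def toy_model(context):
    return (context[-1] + 1) % 50

output = generate([10, 11, 12], max_new_tokens=5, next_token_fn=toy_model)
```

Cropping the context to `block_size` reflects that a transformer can only attend over a fixed maximum sequence length.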


Training

  • The model is trained with a batch size of 4 and a learning rate of 3e-4.
  • The training loop includes loss calculation, backpropagation, and optimizer steps.
  • The loss is monitored, and checkpoints are saved to allow for resuming training.
  • The training process is logged in training.log, which contains detailed statistics for each epoch, including loss values and checkpointing information.
  • Training stops early once the loss falls below a fixed threshold (0.099999), which cuts training time and can limit overfitting.
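The control flow described in this list (train an epoch, checkpoint, stop early below the loss threshold) can be sketched as follows. `run_epoch` and `save_checkpoint` are hypothetical callables standing in for the real training step and checkpoint write; only the threshold value comes from this README.

```python
LOSS_THRESHOLD = 0.099999  # early-stopping threshold quoted in this README

def train(run_epoch, max_epochs, save_checkpoint, start_epoch=0):
    """Run epochs until max_epochs or until the loss drops below the threshold.

    run_epoch(epoch) -> float and save_checkpoint(epoch, loss) are
    hypothetical stand-ins for the real training step (forward pass, loss,
    backpropagation, optimizer step) and the checkpoint write.
    """
    loss = None
    for epoch in range(start_epoch, max_epochs):
        loss = run_epoch(epoch)
        save_checkpoint(epoch, loss)  # checkpoint after every epoch
        if loss < LOSS_THRESHOLD:
            return epoch + 1, loss    # epochs completed, final loss
    return max_epochs, loss

# Toy loss curve that halves each epoch, for illustration only.
saved = []
epochs_run, final_loss = train(
    run_epoch=lambda e: 1.0 / (2 ** e),
    max_epochs=100,
    save_checkpoint=lambda e, l: saved.append(e),
)
```

With this toy loss curve the loop stops after five epochs, well before `max_epochs`, because the fifth loss value (0.0625) is the first one below the threshold.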

Actual Training

The model was trained for a total of 91 epochs. The training process involved the following steps:

  • Data Preparation: The model reads and encodes text data from input.txt, loading a total of 338,025 tokens.
  • Batch Processing: Each epoch consists of 82 batches, with each batch containing sequences of tokens for training.
  • Loss Monitoring: The loss is calculated at each step, and the model's performance is tracked throughout the training process.
  • Checkpointing: The model state and current epoch are saved in a single checkpoint file after each epoch, allowing recovery after an interruption.
  • Final Model: After training, the model is saved with quantization and compression, reducing the file size for easier deployment. The final loss at the end of training was approximately 0.089421.
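The batch count can be sanity-checked with simple arithmetic. The sequence length is not stated in this README; 1024 is an assumption, chosen because together with the stated batch size of 4 it reproduces the 82 batches per epoch reported above.

```python
def batches_per_epoch(num_tokens, batch_size, seq_len):
    """Number of full (batch_size x seq_len) batches that fit in the token stream."""
    return num_tokens // (batch_size * seq_len)

NUM_TOKENS = 338_025  # from the training description
BATCH_SIZE = 4        # from the training description
SEQ_LEN = 1024        # assumption: not stated in this README

n = batches_per_epoch(NUM_TOKENS, BATCH_SIZE, SEQ_LEN)  # 338025 // 4096 == 82
```

Each batch consumes 4 × 1024 = 4096 tokens, so 82 batches use 335,872 tokens and the remaining 2,153 tokens do not fill another batch.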

Per-epoch loss values and checkpointing details are recorded in training.log in the project directory.


Checkpointing

  • The model state and current epoch are saved in a single checkpoint file.
  • To resume training from the last checkpoint, simply run the training script again. The model will automatically load the latest checkpoint.
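The save-and-resume pattern can be sketched as below. To keep the example dependency-free, `pickle` stands in for the serialization call; a PyTorch project would typically use `torch.save` and `torch.load` with the model's `state_dict()`, though whether this project does so is an assumption. The file name and dictionary keys are hypothetical.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, model_state, epoch):
    """Write model state and epoch to a single file (pickle stands in for torch.save)."""
    with open(path, "wb") as f:
        pickle.dump({"model_state": model_state, "epoch": epoch}, f)

def load_checkpoint(path):
    """Return (model_state, next_epoch); start from epoch 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return None, 0
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["model_state"], ckpt["epoch"] + 1

# Hypothetical file name; the real checkpoint path is not given here.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")

first_state, start_epoch = load_checkpoint(path)   # first run: nothing to load
save_checkpoint(path, {"w": [0.1, 0.2]}, epoch=3)  # pretend epoch 3 just finished
state, resume_epoch = load_checkpoint(path)        # second run: resume at epoch 4
```

Storing the epoch alongside the weights is what lets a rerun of the training script pick up at the next epoch instead of starting over.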

Model Compression

  • The final model is saved with compression to reduce file size.
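This README does not say which quantization scheme or compression format the project uses. As one plausible illustration, the sketch below combines symmetric int8 quantization with gzip; both choices, and all function names, are assumptions rather than the project's actual method.

```python
import gzip
import struct

def quantize_int8(weights):
    """Symmetric int8 quantization: one float scale plus one small integer per weight."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    q = [round(w / scale) for w in weights]
    return scale, q

def pack_compressed(scale, q):
    """Serialize the scale (float64) and the int8 values, then gzip the bytes."""
    raw = struct.pack(f"<d{len(q)}b", scale, *q)
    return gzip.compress(raw)

def unpack_compressed(blob):
    """Inverse of pack_compressed: returns approximate float weights."""
    raw = gzip.decompress(blob)
    n = len(raw) - 8  # 8 bytes hold the float64 scale
    scale, *q = struct.unpack(f"<d{n}b", raw)
    return [v * scale for v in q]

# Toy weight vector, for illustration only.
weights = [0.4, -1.0, 0.3, 0.0] * 256
scale, q = quantize_int8(weights)
blob = pack_compressed(scale, q)
restored = unpack_compressed(blob)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The restored weights differ from the originals by at most about half the quantization step (`scale / 2`), which is the accuracy traded for the smaller file.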

Working Demo

You can try out the working demo of the model on Hugging Face Spaces:

Hugging Face Spaces Demo

Play with the Demo Here


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

  • This project is inspired by the original GPT architecture and various resources available in the NLP community.