s1lv3rj1nx/ch1 · Hugging Face

This repository holds the trained model checkpoints for Ch1 - Attention Is All You Need. The chapter builds a transformer from scratch for English-to-Hindi translation. Any of the checkpoints can be used for inference.

[Figure: loss graph]

Training specs: trained on an NVIDIA A10 GPU (24 GB) for 12 hours.

return {
    'batch_size': 85,
    'num_samples': 1000000,
    'num_epochs': 10,
    'lr': 10**-4,
    'seq_len': 128,
    'd_model': 512,
    'datasource': "runs",
    'tgt_language': 'hi',
    'model_folder': 'weights',
    'model_basename': 'tmodel_',
    'preload': None,
    'tokenizer_folder': 'tokenizer',
    'vocab_size': 52000,
}
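To put the configuration in perspective, the following is a rough sketch of the training budget it implies. It is not part of the original training code. The helper name `training_budget` is hypothetical, and the numbers rest on three assumptions: all `num_samples` sentence pairs are used for training (no validation split), every sequence is padded or truncated to `seq_len`, and the 12 hours quoted above cover the full 10 epochs.

```python
import math

# Values copied from the configuration above.
config = {
    "batch_size": 85,
    "num_samples": 1_000_000,
    "num_epochs": 10,
    "seq_len": 128,
}

# Assumption: the 12-hour figure covers the whole 10-epoch run.
TRAINING_HOURS = 12


def training_budget(cfg, hours):
    """Estimate step counts and time per step (hypothetical helper).

    Assumes every sample is used for training and that each sequence
    occupies exactly seq_len positions after padding or truncation.
    """
    steps_per_epoch = math.ceil(cfg["num_samples"] / cfg["batch_size"])
    total_steps = steps_per_epoch * cfg["num_epochs"]
    positions_per_batch = cfg["batch_size"] * cfg["seq_len"]
    seconds_per_step = hours * 3600 / total_steps
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "positions_per_batch": positions_per_batch,
        "seconds_per_step": seconds_per_step,
    }


budget = training_budget(config, TRAINING_HOURS)
print(budget["steps_per_epoch"])      # 11765
print(budget["total_steps"])          # 117650
print(budget["positions_per_batch"])  # 10880
print(round(budget["seconds_per_step"], 2))
```

If part of the data was held out for validation, the step counts would be correspondingly lower, so these figures are best read as upper bounds.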
