This is the trained model for Ch1 - Attention Is All You Need. This chapter builds a transformer from scratch for English-to-Hindi translation. Any of the checkpoints can be used for inference.
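The paper this chapter is named after builds the transformer around scaled dot-product attention, softmax(QKᵀ/√d_k)V. The sketch below is a minimal pure-Python illustration of that formula, not the chapter's own implementation. The function names are hypothetical, and a real model would typically use batched tensor operations with multiple heads.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """Hypothetical helper.
    q: list of query vectors (length d_k each)
    k: list of key vectors (length d_k each)
    v: list of value vectors, one per key (length d_v each)
    Returns one output vector per query."""
    d_k = len(k[0])
    out = []
    for qi in q:
        # Dot product of the query with every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors, column by column
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

# Identical keys give equal weights, so the output is the mean of the values
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 4.0]]))
# [[1.0, 2.0]]
```

The decoder of an encoder-decoder translation model additionally masks future positions before the softmax; that step is omitted here to keep the formula visible.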
[Figure: loss graph]
Training specs: trained on an NVIDIA A10 GPU (24 GB) for 12 hours, using the following config:
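```python
def get_config():
    # Hyperparameters used to train the English-to-Hindi transformer
    return {
        'batch_size': 85,
        'num_samples': 1000000,        # number of samples drawn from the dataset
        'num_epochs': 10,
        'lr': 10**-4,                  # learning rate (1e-4)
        'seq_len': 128,                # maximum sequence length in tokens
        'd_model': 512,                # model / embedding dimension
        'datasource': 'runs',
        'tgt_language': 'hi',          # target language: Hindi
        'model_folder': 'weights',     # folder holding saved checkpoints
        'model_basename': 'tmodel_',   # filename prefix for checkpoints
        'preload': None,               # no checkpoint is preloaded
        'tokenizer_folder': 'tokenizer',
        'vocab_size': 52000,
    }
```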
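As a rough sanity check on the training budget, the config values and the 12-hour run time can be turned into step counts. This is a back-of-the-envelope sketch with a hypothetical helper; it assumes all `num_samples` are used for training (no validation hold-out), that the last partial batch is kept, and that the 12 hours are spread evenly across epochs.

```python
import math

def training_budget(batch_size, num_samples, num_epochs, train_hours):
    # Hypothetical helper: optimizer steps per epoch, keeping the last partial batch
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * num_epochs,
        # Average wall-clock minutes per epoch, assuming an even split of the run time
        "minutes_per_epoch": train_hours * 60 / num_epochs,
    }

budget = training_budget(batch_size=85, num_samples=1_000_000, num_epochs=10, train_hours=12)
print(budget)
# {'steps_per_epoch': 11765, 'total_steps': 117650, 'minutes_per_epoch': 72.0}
```

If part of the data is held out for validation, the step counts shrink proportionally.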