This repository contains the trained model checkpoints for Ch1 - Attention Is All You Need. The chapter builds a transformer from scratch for English-to-Hindi translation. Any of the checkpoints can be used for inference. Loss Graph: image.png

Training specs: trained on a single NVIDIA A10 GPU (24 GB) for 12 hours.

return {
    'batch_size': 85,
    'num_samples': 1000000,
    'num_epochs': 10,
    'lr': 10**-4,
    'seq_len': 128,
    'd_model': 512,
    'datasource': "runs",
    'tgt_language': 'hi',
    'model_folder': 'weights',
    'model_basename': 'tmodel_',
    'preload': None,
    'tokenizer_folder': 'tokenizer',
    'vocab_size': 52000,
}
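
To give a feel for the scale of this run, here is a rough sketch that derives a few quantities from the configuration and the 12-hour training time. It assumes all `num_samples` are used for training once per epoch and that the final partial batch is kept; the real split and data loader may differ. The `checkpoint_name` helper and its zero-padded epoch suffix are hypothetical, so check the actual file names in this repository before loading anything.

```python
import math

# Values copied from the configuration above.
config = {
    "batch_size": 85,
    "num_samples": 1_000_000,
    "num_epochs": 10,
    "seq_len": 128,
    "model_folder": "weights",
    "model_basename": "tmodel_",
}

TRAINING_HOURS = 12  # from the training specs


def steps_per_epoch(cfg):
    # Assumption: every sample is seen once per epoch and the last,
    # smaller batch is not dropped.
    return math.ceil(cfg["num_samples"] / cfg["batch_size"])


def checkpoint_name(cfg, epoch):
    # Hypothetical naming scheme: folder/basename + zero-padded epoch + ".pt".
    return f"{cfg['model_folder']}/{cfg['model_basename']}{epoch:02d}.pt"


per_epoch = steps_per_epoch(config)
total_steps = per_epoch * config["num_epochs"]
# Upper bound on token positions per batch (sequences padded to seq_len).
max_tokens_per_batch = config["batch_size"] * config["seq_len"]
# Average optimizer steps per second over the whole run.
steps_per_second = total_steps / (TRAINING_HOURS * 3600)

print(per_epoch, total_steps, max_tokens_per_batch)
print(round(steps_per_second, 2))
print(checkpoint_name(config, 9))
```

Under these assumptions the run is about 11,765 steps per epoch and 117,650 steps in total, which averages to a little under three steps per second on the A10.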