How to set grad checkpointing?
#6
by
mactavish91
- opened
If not in use, the GPU memory usage is too high.
Could you check the guide here? https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one#gradient-checkpointing