How to add Deepspeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning? #18009
Unanswered
rileyhun asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
I'm trying to enable activation checkpointing for a T5-3B model to significantly reduce GPU memory usage. However, from the PTL docs it's not clear how to implement this for a pre-trained LLM from Hugging Face.
Here is the minimal code to reproduce:
Here is the full code to reproduce - https://github.com/rileyhun/llm_finetuning_metaflow/blob/main/pytorch-deepspeed/src/model_training.py
The error I'm getting is as follows:
Any guidance would be greatly appreciated!
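For context, the setup I have in mind looks roughly like the sketch below. It is illustrative only: the class, argument names, and hyperparameters are placeholders rather than the exact code from the linked script, and I'm not sure whether Hugging Face's `gradient_checkpointing_enable()` and the `DeepSpeedStrategy` activation-checkpointing flags (`partition_activations`, `cpu_checkpointing`) are meant to be combined this way. The PTL docs' example wraps custom `nn.Sequential` blocks with `deepspeed.checkpointing.checkpoint`, and it's unclear how that maps onto a pre-trained `T5ForConditionalGeneration`.

```python
# Illustrative sketch only -- not the exact script from the linked repo.
# Assumes pytorch_lightning (2.x) with DeepSpeed installed and Hugging Face transformers.
import pytorch_lightning as pl
import torch
from pytorch_lightning.strategies import DeepSpeedStrategy
from transformers import T5ForConditionalGeneration


class T5FineTuner(pl.LightningModule):
    """Hypothetical wrapper around a pre-trained T5; names are placeholders."""

    def __init__(self, model_name: str = "t5-3b"):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        # Hugging Face models expose their own activation (gradient) checkpointing,
        # which recomputes activations during the backward pass instead of storing them.
        self.model.gradient_checkpointing_enable()

    def training_step(self, batch, batch_idx):
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)


# DeepSpeedStrategy also exposes activation-checkpointing flags
# (partition_activations, cpu_checkpointing); whether they take effect for a model
# that uses Hugging Face's own checkpointing is exactly what I'm unsure about.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    precision="16-mixed",
    strategy=DeepSpeedStrategy(
        stage=3,
        offload_optimizer=True,
        partition_activations=True,
        cpu_checkpointing=True,
    ),
)
# trainer.fit(T5FineTuner(), train_dataloaders=...)  # dataloaders omitted from the sketch
```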