How to add Deepspeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning? #18009
Unanswered
rileyhun asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
I'm trying to enable activation checkpointing for a T5-3B model to significantly reduce GPU memory usage. However, from the PTL docs it's not clear how to implement this for a pre-trained LLM from Hugging Face.
Here is the minimal code to reproduce:
Here is the full code to reproduce - https://github.com/rileyhun/llm_finetuning_metaflow/blob/main/pytorch-deepspeed/src/model_training.py
The error I'm getting is as follows:
Any guidance would be greatly appreciated!
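For context, the setup I have in mind looks roughly like the sketch below. It is illustrative only: the class, argument names, and hyperparameters are placeholders rather than the exact code from the linked script, and I'm not sure whether Hugging Face's `gradient_checkpointing_enable()` and the `DeepSpeedStrategy` activation-checkpointing flags (`partition_activations`, `cpu_checkpointing`) are meant to be combined this way. The PTL docs' example wraps custom `nn.Sequential` blocks with `deepspeed.checkpointing.checkpoint`, and it's unclear how that maps onto a pre-trained `T5ForConditionalGeneration`.

```python
# Illustrative sketch only -- not the exact script from the linked repo.
# Assumes pytorch_lightning (2.x) with DeepSpeed installed and Hugging Face transformers.
import pytorch_lightning as pl
import torch
from pytorch_lightning.strategies import DeepSpeedStrategy
from transformers import T5ForConditionalGeneration


class T5FineTuner(pl.LightningModule):
    """Hypothetical wrapper around a pre-trained T5; names are placeholders."""

    def __init__(self, model_name: str = "t5-3b"):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        # Hugging Face models expose their own activation (gradient) checkpointing,
        # which recomputes activations during the backward pass instead of storing them.
        self.model.gradient_checkpointing_enable()

    def training_step(self, batch, batch_idx):
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)


# DeepSpeedStrategy also exposes activation-checkpointing flags
# (partition_activations, cpu_checkpointing); whether they take effect for a model
# that uses Hugging Face's own checkpointing is exactly what I'm unsure about.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    precision="16-mixed",
    strategy=DeepSpeedStrategy(
        stage=3,
        offload_optimizer=True,
        partition_activations=True,
        cpu_checkpointing=True,
    ),
)
# trainer.fit(T5FineTuner(), train_dataloaders=...)  # dataloaders omitted from the sketch
```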