DDP training: Compute loss only on a single worker #15018
Unanswered
ChristophReich1996 asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi Lightning-AI team,
I hope this is the right place to ask. I have the following implementation question. I would like to train with DDP but compute the loss on a single worker only, since my loss function heavily relies on a large batch size and on batch statistics (similar to Batch Normalization).
My intended approach (in pseudo code): before computing the loss, I would gather the predictions and labels from all workers with `gather_all_tensors`. But how do I ensure that the loss is computed only on the first worker (rank 0)? Additionally, what is the correct way to log the loss in this specific case, and what should the other workers, which do not compute the loss, return? A rough sketch of what I have in mind is included below.

Thanks for the help :)
Christoph
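
For concreteness, here is a minimal sketch of what I have in mind. It is only an illustration, not something I expect to work as-is: `LitModel`, `loss_fn`, and the optimizer choice are placeholders, and `gather_all_tensors` is the torchmetrics utility.

```python
# Minimal sketch of the intended training_step, not a working solution.
# Assumptions: predictions and labels are plain tensors, `gather_all_tensors`
# comes from torchmetrics, and `loss_fn` stands in for my loss that needs
# statistics over the full (global) batch.
import torch
import pytorch_lightning as pl
from torchmetrics.utilities.distributed import gather_all_tensors


class LitModel(pl.LightningModule):
    def __init__(self, model: torch.nn.Module, loss_fn: torch.nn.Module):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn  # depends on statistics over the whole batch

    def training_step(self, batch, batch_idx):
        inputs, labels = batch
        predictions = self.model(inputs)

        # Communicate predictions and labels across all DDP workers.
        all_predictions = torch.cat(gather_all_tensors(predictions), dim=0)
        all_labels = torch.cat(gather_all_tensors(labels), dim=0)

        if self.trainer.is_global_zero:
            # Compute the loss on the gathered (global) batch on rank 0 only.
            loss = self.loss_fn(all_predictions, all_labels)
            self.log("train_loss", loss, rank_zero_only=True)
            return loss

        # Open question: what should the other workers return here so that
        # logging and DDP gradient synchronization still behave correctly?
        return None

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-3)
```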