DDP training: Compute loss only on a single worker #15018
Unanswered
ChristophReich1996 asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi Lightning-AI team,
I hope this is the right place to ask. I have the following implementation question. I would like to train with DDP but compute the loss on a single worker only, since my loss function heavily relies on a large batch size and on batch statistics (similar to Batch Normalization).
My intended approach (in pseudo code): before computing the loss, I would gather the predictions and labels from all workers with `gather_all_tensors`. But how do I ensure that the loss is computed only on the first worker (rank 0)? Additionally, what is the correct way to log the loss in this specific case, and what should the other workers, which do not compute the loss, return? A rough sketch of what I have in mind is included below.

Thanks for the help :)
Christoph
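
For concreteness, here is a minimal sketch of what I have in mind. It is only an illustration, not something I expect to work as-is: `LitModel`, `loss_fn`, and the optimizer choice are placeholders, and `gather_all_tensors` is the torchmetrics utility.

```python
# Minimal sketch of the intended training_step, not a working solution.
# Assumptions: predictions and labels are plain tensors, `gather_all_tensors`
# comes from torchmetrics, and `loss_fn` stands in for my loss that needs
# statistics over the full (global) batch.
import torch
import pytorch_lightning as pl
from torchmetrics.utilities.distributed import gather_all_tensors


class LitModel(pl.LightningModule):
    def __init__(self, model: torch.nn.Module, loss_fn: torch.nn.Module):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn  # depends on statistics over the whole batch

    def training_step(self, batch, batch_idx):
        inputs, labels = batch
        predictions = self.model(inputs)

        # Communicate predictions and labels across all DDP workers.
        all_predictions = torch.cat(gather_all_tensors(predictions), dim=0)
        all_labels = torch.cat(gather_all_tensors(labels), dim=0)

        if self.trainer.is_global_zero:
            # Compute the loss on the gathered (global) batch on rank 0 only.
            loss = self.loss_fn(all_predictions, all_labels)
            self.log("train_loss", loss, rank_zero_only=True)
            return loss

        # Open question: what should the other workers return here so that
        # logging and DDP gradient synchronization still behave correctly?
        return None

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-3)
```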