Distributed Modes Speed Benchmark #3436
Unanswered · briankosw asked this question in DDP / multi-GPU / multi-node · Replies: 0 comments
Is there a benchmark of the training speed of each of the distributed modes? It would be really nice to see how each of these modes performs, especially DDP, DDP_spawn, DDP2, and Horovod. A rough sketch of the kind of timing script I have in mind is below.
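Something like this is what I mean, just a minimal per-mode timing sketch rather than an official benchmark. It assumes a Lightning version where `Trainer` accepts `strategy=` (older releases used `distributed_backend=` instead), two GPUs, and a toy model and dataset that are placeholders, not anything from the docs:

```python
# benchmark_mode.py -- rough per-mode timing sketch, not an official benchmark.
# Assumes a Lightning version whose Trainer accepts `strategy=`; older
# releases used `distributed_backend=`. Model/data here are placeholders.
import sys
import time

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    # One mode per invocation, e.g. "ddp", "ddp_spawn", "horovod" (if installed).
    strategy = sys.argv[1]

    data = DataLoader(
        TensorDataset(torch.randn(4096, 32), torch.randint(0, 2, (4096,))),
        batch_size=64,
    )
    trainer = pl.Trainer(
        max_epochs=3,
        accelerator="gpu",
        devices=2,
        strategy=strategy,
        logger=False,
        enable_checkpointing=False,
    )

    start = time.time()
    trainer.fit(TinyModel(), data)
    if trainer.global_rank == 0:
        print(f"{strategy}: {time.time() - start:.1f} s")
```

I'd run each mode as a separate invocation (`python benchmark_mode.py ddp`, then `python benchmark_mode.py ddp_spawn`, ...) because, as I understand it, the ddp mode re-executes the whole script in the worker processes, so looping over several modes inside one script would interfere with itself.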
Another question: why does DDP perform better than DDP_spawn? It seems that DDP internally launches the same script with different environment variables (analogous to `torch.distributed.launch`), while DDP_spawn spawns a bunch of subprocesses (analogous to `torch.multiprocessing.spawn`). I'm having a hard time understanding why DDP is advantageous compared to DDP_spawn, and I'd like more explanation of the limitations of DDP_spawn listed in the Multi-GPU Training docs. A plain-PyTorch sketch of the two launch styles as I understand them follows.
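For reference, here is roughly what I mean by the two launch styles, written in plain PyTorch using only documented `torch.distributed` / `torch.multiprocessing` calls. The host, port, and world size are placeholder values, and the `gloo` backend is used just so the sketch runs without GPUs:

```python
# Plain-PyTorch sketch of the two launch styles Lightning seems to mirror.
# Placeholder host/port/world-size; "gloo" backend so it runs CPU-only.
import os

import torch.distributed as dist
import torch.multiprocessing as mp


def train(rank, world_size):
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # ... build model, wrap in DistributedDataParallel, train ...
    dist.destroy_process_group()


def launch_style():
    # "ddp"-like: every rank is a fresh run of this script; an external
    # launcher (torch.distributed.launch, or Lightning itself) sets RANK,
    # WORLD_SIZE, MASTER_ADDR, MASTER_PORT in the environment.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    train(rank, world_size)


def spawn_style():
    # "ddp_spawn"-like: one parent process spawns the workers itself, so
    # whatever is handed to them has to survive pickling into child processes.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    mp.spawn(train, args=(2,), nprocs=2)


if __name__ == "__main__":
    spawn_style()
```

My rough understanding is that the spawn style forces everything passed to the workers (model, dataloaders, trainer state) to be picklable and requires results to be copied back to the parent process, which I suspect is where the documented DDP_spawn limitations come from, but I'd appreciate confirmation.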