Multiple Jobs on Multiple GPUs #18799
Unanswered
tommycwh asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
I need to train many independent models, and I would like a single process that works through them automatically. I have multiple GPUs, so I want to train several models at the same time, one per GPU. Right now I do this manually: whenever a GPU is idle, I start a training job on it myself using the Lightning CLI.
Since there are many models to train, I am wondering whether it is possible to start a "master" process that automatically launches a training job whenever a GPU is free. Is this possible with Lightning, or is it not something Lightning is intended to handle?
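To make the idea concrete, here is a minimal sketch of what such a "master" process could look like in plain Python, outside of Lightning itself. It assumes each training job is an independent command line and that pinning a job to a GPU via the `CUDA_VISIBLE_DEVICES` environment variable is acceptable. The function name `run_queue` and its parameters are hypothetical, and "free" here only means "not running a job started by this script": the sketch does not ask the driver whether other users are occupying the GPU.

```python
import os
import subprocess
import time
from collections import deque


def run_queue(commands, gpu_ids, poll_interval=1.0):
    """Run each command on the next free GPU; return (gpu_id, exit_code) pairs.

    Hypothetical helper: a GPU counts as free when no job launched by this
    function is still running on it. Actual GPU utilization is not checked.
    """
    pending = deque(commands)
    free = list(gpu_ids)
    running = {}   # gpu_id -> subprocess.Popen
    finished = []  # (gpu_id, exit_code) in completion order

    while pending or running:
        # Start a pending job on every GPU that is currently free.
        while pending and free:
            gpu = free.pop(0)
            # Each child process sees only its assigned GPU.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            running[gpu] = subprocess.Popen(pending.popleft(), env=env)

        # Collect jobs that have exited and release their GPUs.
        for gpu, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                finished.append((gpu, code))
                del running[gpu]
                free.append(gpu)

        # Wait only if nothing can be launched right now.
        if running and not (pending and free):
            time.sleep(poll_interval)

    return finished
```

With a Lightning CLI entry point, each command might be something like `["python", "main.py", "fit", "--config", "configs/model_0.yaml", "--trainer.devices", "1"]`, where `main.py` and the config path are placeholders for your own files. Because each process sees only the single GPU assigned to it, that GPU appears to the job as device 0.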