Fabric: How to run lit-gpt/finetune/lora.py on multiple nodes #18404
Unanswered
Andcircle asked this question in DDP / multi-GPU / multi-node
Replies: 0
The code is here: https://github.com/Lightning-AI/lit-gpt/blob/main/finetune/lora.py
The environment is 2 nodes x 8 A100 80GB.
I tried the following, running the same command on both the master and the worker node:

```bash
# attempt 1: plain Python
python finetune/lora.py

# attempt 2: the Fabric CLI
lightning run model finetune/lora.py --strategy=fsdp --devices=8 --num-nodes=2 \
  --accelerator=cuda --precision="bf16" \
  --main-address=$MASTER_ADDR --main-port=$MASTER_PORT
```
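From the Fabric CLI help I suspect each machine also needs its own `--node-rank`, so here is a sketch of the per-node launch I believe is intended. The `--node-rank` values are my assumption, not something I have confirmed:

```bash
# On the master node (node rank 0); $MASTER_ADDR/$MASTER_PORT as above
lightning run model finetune/lora.py --strategy=fsdp --devices=8 --num-nodes=2 \
  --accelerator=cuda --precision="bf16" --node-rank=0 \
  --main-address=$MASTER_ADDR --main-port=$MASTER_PORT

# On the worker node (node rank 1)
lightning run model finetune/lora.py --strategy=fsdp --devices=8 --num-nodes=2 \
  --accelerator=cuda --precision="bf16" --node-rank=1 \
  --main-address=$MASTER_ADDR --main-port=$MASTER_PORT
```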
Both attempts always fail with the same error.
(When I use the `lightning run` launcher, do I also need to change the code inside finetune/lora.py?)
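To make that last question concrete, here is a minimal Fabric sketch of the multi-node setup as I understand it from the Lightning 2.x docs. This is not the actual lit-gpt code; the strategy and precision arguments here are my assumptions:

```python
import lightning as L
from lightning.fabric.strategies import FSDPStrategy

# Minimal multi-node Fabric setup (a sketch, not the real finetune/lora.py).
# When the script is started with `lightning run model`, the CLI spawns the
# processes, and these constructor arguments must agree with the CLI flags.
fabric = L.Fabric(
    accelerator="cuda",
    devices=8,                # GPUs per node
    num_nodes=2,
    strategy=FSDPStrategy(),  # lit-gpt configures this further (wrap policy etc.)
    precision="bf16-mixed",   # newer Lightning spells "bf16" as "bf16-mixed"
)
fabric.launch()

# Sanity check that all 16 processes (2 nodes x 8 GPUs) joined the job.
fabric.print(f"world size: {fabric.world_size}")
print(f"global rank {fabric.global_rank}, local rank {fabric.local_rank}")
```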