Only save rank0 with deepspeed_stage_3 #15665

ZeyiLiao · 2022-11-13T04:17:17Z

Hi,
I am using DeepSpeed stage 3 and find that the checkpoint only saves the rank_0 state_dict and optim_dict.

I am not sure whether the rank_0 result alone is enough, or whether the checkpoint should ideally contain the results from all ranks, to which we then apply zero_to_fp32 or convert_zero_checkpoint_to_fp32_state_dict. (BTW, which one should we use? I keep finding conflicting answers online about which one to use.)
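
In case it helps anyone reproduce this, here is a minimal sketch for checking which ranks actually wrote shard files into a checkpoint directory. It uses only the standard library. The file-name pattern (`zero_pp_rank_<N>_mp_rank_<MM>_model_states.pt` / `..._optim_states.pt`) is my assumption about how the shards are named, and `ranks_with_shards` is a hypothetical helper, not part of DeepSpeed or Lightning.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed shard naming (not confirmed in this thread), e.g.
#   zero_pp_rank_0_mp_rank_00_model_states.pt
#   zero_pp_rank_0_mp_rank_00_optim_states.pt
SHARD_RE = re.compile(r"zero_pp_rank_(\d+)_mp_rank_\d+_(model|optim)_states\.pt$")


def ranks_with_shards(checkpoint_dir):
    """Map each data-parallel rank to the kinds of shard files found for it."""
    found = defaultdict(set)
    # Search recursively, since shards may sit in a tagged subdirectory.
    for path in Path(checkpoint_dir).rglob("*.pt"):
        match = SHARD_RE.search(path.name)
        if match:
            found[int(match.group(1))].add(match.group(2))
    return {rank: sorted(kinds) for rank, kinds in sorted(found.items())}
```

Calling `ranks_with_shards("path/to/checkpoint_dir")` (placeholder path) on a run with several GPUs should show whether only rank 0 appears or whether every rank has its own entry.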

Replies: 0 comments
