Memory, tokens per sec, and MFU behavior in train_gpt2cu #503
chinthysl started this conversation in Show and tell
Replies: 1 comment
-
Very cool, thank you for posting! I am really eager to get a multi-node setup of my own sometime soon to run similar things.
-
The following results were gathered to get a better understanding of training on a single node before expanding to multiple nodes. I generated them on a single DGX H100 node using all 8 GPUs. The total batch size is set to 524288 tokens; I varied the model depth from d12 to d48 and the micro batch size from 2 upward until GPU memory maxed out.
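To make the setup concrete, the relationship between the fixed total batch size and the swept micro batch size is just gradient accumulation. A small worked example, assuming the standard GPT-2 sequence length of 1024 (the post only states the totals, so the sequence length is an assumption):

```shell
# Gradient accumulation arithmetic (sketch). seq_len=1024 is an assumed
# GPT-2 context length; total_batch and the GPU count are from the post.
total_batch=524288   # tokens per optimizer step
micro_batch=2        # sequences per GPU per forward/backward pass
seq_len=1024         # tokens per sequence (assumption)
num_gpus=8
grad_accum=$(( total_batch / (micro_batch * seq_len * num_gpus) ))
echo "$grad_accum"   # micro-steps accumulated before each update; prints 32
```

So at micro batch 2 each GPU runs 32 forward/backward passes per optimizer step, and doubling the micro batch halves the accumulation count while keeping the effective batch fixed.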
Major insights:
Sample script I used to sweep the results:
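The original script did not survive here, so below is only a minimal sketch of such a sweep. The flag names (`-e` for the model descriptor, `-b` for micro batch size, `-d` for total batch size in tokens) are assumptions based on llm.c's `train_gpt2cu`; check `./train_gpt2cu -h` on your build before running.

```shell
#!/bin/sh
# Hedged sketch of a depth x micro-batch sweep like the one described above.
# Flag names are assumptions; verify against ./train_gpt2cu -h.
TOTAL_BATCH=524288
CMDS=""
for depth in d12 d24 d36 d48; do
  for mb in 2 4 8 16 32 64; do
    # Collect the commands rather than executing them, so the sweep can
    # be inspected first (or piped to sh on the actual DGX node).
    CMDS="${CMDS}mpirun -np 8 ./train_gpt2cu -e $depth -b $mb -d $TOTAL_BATCH
"
  done
done
printf '%s' "$CMDS"
```

On a real node you would run each printed command directly and stop increasing `-b` for a given depth once the run hits an out-of-memory error.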
Beta Was this translation helpful? Give feedback.
All reactions