Training fails on small vocabulary (V<8192) #778
Unanswered
austinleedavis asked this question in Q&A
Problem:
When training GPT2 with a vocab < 8192 (= 128*64), the process freezes upon entering `fused_classifier_kernel5` at the start of training.

Goal:
I'm trying to pre-train a GPT2 with a custom tokenizer (vocab_size=72) on a custom dataset.
Tests:
I'm running on a single RTX3060 Mobile GPU. I have no problem tokenizing the dataset. However, I encounter the problem stated above when running the code. I've tested combinations of `vocab_size` (`V`) and `padded_vocab_size` (`PV`), starting from the defaults `V=50257; PV=50304`, then subtracting multiples of 128 until I could go no lower, stopping at `V=PV=8192`. Any combination of `V` and `PV` below this threshold causes the process to freeze.
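For reference, here is a minimal sketch (plain C, not code from the repo) of the padding arithmetic I used when picking the test values; `pad_vocab` is my own helper name, and the "trains"/"freezes" labels simply restate the observed threshold above.

```c
#include <stdio.h>

/* Hypothetical helper (not from the repo): round the tokenizer vocab size up
 * to the next multiple of 128, which matches how the padded vocab size in
 * these tests was derived (50257 -> 50304). */
static int pad_vocab(int vocab_size, int multiple) {
    return ((vocab_size + multiple - 1) / multiple) * multiple;
}

int main(void) {
    const int multiple = 128;
    /* defaults, the 64*128 threshold, one step below it, and my custom tokenizer */
    int cases[] = {50257, 8192, 8064, 72};
    for (int i = 0; i < (int)(sizeof(cases) / sizeof(cases[0])); i++) {
        int v = cases[i];
        int pv = pad_vocab(v, multiple);
        /* observed behavior: PV >= 8192 trains, anything smaller freezes */
        printf("V=%5d  PV=%5d  -> %s\n", v, pv, pv >= 64 * multiple ? "trains" : "freezes");
    }
    return 0;
}
```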
System Specs:
Host OS: Ubuntu
Processor: AMD Ryzen 9 5900HS
CUDA Version: 12.6
Display Driver Version: 560.35.03
GPU: NVIDIA GeForce RTX 3060 Laptop GPU