Closed
Hi guys
I've just noticed that since the recent convert.py refactor, the new --pad-vocab feature does not work with SPM vocabs. It does work as expected with HFFT. EDIT: actually there might be a different bug with HFFT as well; see my next post on that.
Example command, converting model: https://huggingface.co/TigerResearch/tigerbot-13b-chat-v5
python3 ./convert.py /workspace/process/tigerresearch_tigerbot-13b-chat-v5/source --outtype f16 --outfile /workspace/process/tigerresearch_tigerbot-13b-chat-v5/gguf/tigerbot-13b-chat-v5.fp16.gguf --pad-vocab
Error message:
Writing /workspace/process/tigerresearch_tigerbot-13b-chat-v5/gguf/tigerbot-13b-chat-v5.fp16.gguf, format 1
Padding vocab with 2 token(s) - <dummy00001> through <dummy00002>
Traceback (most recent call last):
  File "/workspace/git/llama.cpp/./convert.py", line 1658, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
    ^^^^^^^^^^^^^^^^^^
  File "/workspace/git/llama.cpp/./convert.py", line 1643, in main
    OutputFile.write_all(
  File "/workspace/git/llama.cpp/./convert.py", line 1188, in write_all
    check_vocab_size(params, vocab, pad_vocab=pad_vocab)
  File "/workspace/git/llama.cpp/./convert.py", line 1008, in check_vocab_size
    vocab.added_tokens_dict[f"<dummy{i:05}>"] = -1
    ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SentencePieceVocab' object has no attribute 'added_tokens_dict'. Did you mean: 'added_tokens_list'?
As a workaround, I did the conversion with --vocab-type hfft instead, which worked OK.
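For anyone triaging this, here is a minimal, self-contained sketch of the failure mode. The class and function names are simplified stand-ins modeled on the traceback, not the actual llama.cpp source: the padding code path assumes every vocab class exposes an added_tokens_dict, while the SPM vocab class apparently only keeps an added_tokens_list.

```python
class HfVocab:
    """HFFT-style vocab (hypothetical stand-in): tracks added tokens in a dict."""
    def __init__(self):
        self.added_tokens_dict = {}

class SentencePieceVocab:
    """SPM-style vocab (hypothetical stand-in): only keeps a list of added tokens."""
    def __init__(self):
        self.added_tokens_list = []

def pad_vocab(vocab, n_pad):
    # Mirrors the failing line in check_vocab_size(): it writes dummy
    # tokens into vocab.added_tokens_dict, which SentencePieceVocab lacks.
    for i in range(1, n_pad + 1):
        vocab.added_tokens_dict[f"<dummy{i:05}>"] = -1

hf = HfVocab()
pad_vocab(hf, 2)                      # works: adds <dummy00001>, <dummy00002>

try:
    pad_vocab(SentencePieceVocab(), 2)
except AttributeError as e:
    print(e)                          # the attribute error from the traceback
```

Presumably the fix is either to give SentencePieceVocab the same added-tokens attribute the padding code expects, or to have check_vocab_size() handle both attribute shapes.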
Thanks in advance for looking at this.