
convert.py: --pad-vocab not working with SPM, 'SentencePieceVocab' object has no attribute 'added_tokens_dict'. Did you mean: 'added_tokens_list'? #4958

Closed
@TheBloke


Hi guys

I've just noticed that since the recent convert.py refactor, the new --pad-vocab feature does not work with SPM vocabs. It works as expected with HFFT. EDIT: there may actually be a separate bug with HFFT; see the next post for that.

Example command, converting this model: https://huggingface.co/TigerResearch/tigerbot-13b-chat-v5

python3 ./convert.py /workspace/process/tigerresearch_tigerbot-13b-chat-v5/source --outtype f16 --outfile /workspace/process/tigerresearch_tigerbot-13b-chat-v5/gguf/tigerbot-13b-chat-v5.fp16.gguf --pad-vocab

Error message:

Writing /workspace/process/tigerresearch_tigerbot-13b-chat-v5/gguf/tigerbot-13b-chat-v5.fp16.gguf, format 1
Padding vocab with 2 token(s) - <dummy00001> through <dummy00002>
Traceback (most recent call last):
  File "/workspace/git/llama.cpp/./convert.py", line 1658, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
    ^^^^^^^^^^^^^^^^^^
  File "/workspace/git/llama.cpp/./convert.py", line 1643, in main
    OutputFile.write_all(
  File "/workspace/git/llama.cpp/./convert.py", line 1188, in write_all
    check_vocab_size(params, vocab, pad_vocab=pad_vocab)
  File "/workspace/git/llama.cpp/./convert.py", line 1008, in check_vocab_size
    vocab.added_tokens_dict[f"<dummy{i:05}>"] = -1
    ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SentencePieceVocab' object has no attribute 'added_tokens_dict'. Did you mean: 'added_tokens_list'?

For this model, I did the conversion with --vocab-type hfft instead, which worked OK.
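To make the failure mode concrete, here is a minimal Python sketch. The classes `SpmLikeVocab` and `HfLikeVocab` and the helpers `dummy_token_names` and `pad_added_tokens` are hypothetical stand-ins, not the real convert.py code. The only details taken from the report are the `<dummyNNNNN>` naming in the log, the `-1` placeholder on the failing line, and the two attribute names in the error message. The sketch assumes the SPM vocab stores added tokens in a list while the HF vocab uses a dict, which is what the `Did you mean: 'added_tokens_list'?` hint suggests.

```python
# Hypothetical stand-ins for the two vocab types; NOT the real convert.py classes.
# Assumption (from the error hint): the SPM vocab keeps added tokens in a list,
# while the HF vocab keeps them in a dict.


class SpmLikeVocab:
    def __init__(self) -> None:
        self.added_tokens_list: list[str] = []


class HfLikeVocab:
    def __init__(self) -> None:
        self.added_tokens_dict: dict[str, int] = {}


def dummy_token_names(vocab_size: int, target_size: int) -> list[str]:
    """Build padding names in the <dummyNNNNN> style seen in the log, starting at 1."""
    pad_count = target_size - vocab_size
    return [f"<dummy{i:05}>" for i in range(1, pad_count + 1)]


def pad_added_tokens(vocab, names: list[str]) -> None:
    """Add padding tokens to whichever container the vocab actually has."""
    if hasattr(vocab, "added_tokens_dict"):
        # -1 mirrors the placeholder value on the failing line in the traceback.
        for name in names:
            vocab.added_tokens_dict[name] = -1
    elif hasattr(vocab, "added_tokens_list"):
        vocab.added_tokens_list.extend(names)
    else:
        raise TypeError(f"don't know how to pad {type(vocab).__name__}")


if __name__ == "__main__":
    names = dummy_token_names(vocab_size=10, target_size=12)
    print(names)  # ['<dummy00001>', '<dummy00002>']

    spm = SpmLikeVocab()
    try:
        # Unconditional dict access on the list-based vocab reproduces the AttributeError.
        spm.added_tokens_dict[names[0]] = -1
    except AttributeError as e:
        print("AttributeError:", e)

    # Checking which container exists lets padding work for both vocab types.
    pad_added_tokens(spm, names)
    print(spm.added_tokens_list)
```

Checking which container the vocab actually has, or giving every vocab class a shared padding method, avoids hard-coding `added_tokens_dict` in the padding step. This is one possible approach, not necessarily how convert.py was fixed.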

Thanks in advance for looking at this.
