Closed
Hi guys
I've just noticed that since the recent convert.py refactor, the new --pad-vocab feature does not work with SPM vocabs. It does work as expected with HFFT. EDIT: actually there might be a different bug with HFFT as well; see my next post on that.
Example command, converting model: https://huggingface.co/TigerResearch/tigerbot-13b-chat-v5
python3 ./convert.py /workspace/process/tigerresearch_tigerbot-13b-chat-v5/source --outtype f16 --outfile /workspace/process/tigerresearch_tigerbot-13b-chat-v5/gguf/tigerbot-13b-chat-v5.fp16.gguf --pad-vocab
Error message:
Writing /workspace/process/tigerresearch_tigerbot-13b-chat-v5/gguf/tigerbot-13b-chat-v5.fp16.gguf, format 1
Padding vocab with 2 token(s) - <dummy00001> through <dummy00002>
Traceback (most recent call last):
  File "/workspace/git/llama.cpp/./convert.py", line 1658, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
    ^^^^^^^^^^^^^^^^^^
  File "/workspace/git/llama.cpp/./convert.py", line 1643, in main
    OutputFile.write_all(
  File "/workspace/git/llama.cpp/./convert.py", line 1188, in write_all
    check_vocab_size(params, vocab, pad_vocab=pad_vocab)
  File "/workspace/git/llama.cpp/./convert.py", line 1008, in check_vocab_size
    vocab.added_tokens_dict[f"<dummy{i:05}>"] = -1
    ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SentencePieceVocab' object has no attribute 'added_tokens_dict'. Did you mean: 'added_tokens_list'?
As a workaround, I did the conversion with --vocab-type hfft instead, which worked OK.
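For anyone triaging this, here is a minimal, self-contained sketch of the failure mode. The class and function names are simplified stand-ins modeled on the traceback, not the actual llama.cpp source: the padding code path assumes every vocab class exposes an added_tokens_dict, while the SPM vocab class apparently only keeps an added_tokens_list.

```python
class HfVocab:
    """HFFT-style vocab (hypothetical stand-in): tracks added tokens in a dict."""
    def __init__(self):
        self.added_tokens_dict = {}

class SentencePieceVocab:
    """SPM-style vocab (hypothetical stand-in): only keeps a list of added tokens."""
    def __init__(self):
        self.added_tokens_list = []

def pad_vocab(vocab, n_pad):
    # Mirrors the failing line in check_vocab_size(): it writes dummy
    # tokens into vocab.added_tokens_dict, which SentencePieceVocab lacks.
    for i in range(1, n_pad + 1):
        vocab.added_tokens_dict[f"<dummy{i:05}>"] = -1

hf = HfVocab()
pad_vocab(hf, 2)                      # works: adds <dummy00001>, <dummy00002>

try:
    pad_vocab(SentencePieceVocab(), 2)
except AttributeError as e:
    print(e)                          # the attribute error from the traceback
```

Presumably the fix is either to give SentencePieceVocab the same added-tokens attribute the padding code expects, or to have check_vocab_size() handle both attribute shapes.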
Thanks in advance for looking at this.