llama_get_kv_cache_token_count() deprecation hurts debugging - suggested API enhancement

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the [Discussions](https://github.com/ggerganov/llama.cpp/discussions), and have a new bug or useful enhancement to share.

# Feature Description

I would love to have more insight into the state of the KV cache. I see the following deprecation in `llama.h`:

```c
    // Returns the number of tokens in the KV cache
    LLAMA_API DEPRECATED(int llama_get_kv_cache_token_count(const struct llama_context * ctx),
            "avoid using this, it will be removed in the future, "
            "instead - count the tokens in user code");
```
I understand that this function's API is insufficient now that sequence IDs exist: a single global token count says nothing about individual sequences. But telling developers that they must not make any mistakes and should "just count right" feels a bit like mocking them :-)
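For comparison, "count the tokens in user code" means keeping a mirror of the cache next to every decode and removal call. The sketch below is a minimal, hypothetical version of such a mirror for a single, non-negative `seq_id` per call; `kv_tracker`, `TRACK_MAX_SEQ` and `TRACK_MAX_POS` are made-up names, and the `p0 < 0` / `p1 < 0` handling is only assumed to match `llama_kv_cache_seq_rm()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits for this sketch; these are not llama.cpp constants. */
#define TRACK_MAX_SEQ 4
#define TRACK_MAX_POS 512

/* User-side mirror of which positions each sequence occupies in the KV cache. */
typedef struct {
    bool    occupied[TRACK_MAX_SEQ][TRACK_MAX_POS];
    int32_t count[TRACK_MAX_SEQ];
} kv_tracker;

static void tracker_init(kv_tracker *t) {
    memset(t, 0, sizeof(*t));
}

/* Call once for every (pos, seq_id) pair submitted for decoding. */
static void tracker_add(kv_tracker *t, int32_t seq_id, int32_t pos) {
    assert(seq_id >= 0 && seq_id < TRACK_MAX_SEQ);
    assert(pos >= 0 && pos < TRACK_MAX_POS);
    if (!t->occupied[seq_id][pos]) {
        t->occupied[seq_id][pos] = true;
        t->count[seq_id]++;
    }
}

/* Mirror of a seq_rm call on [p0, p1): p0 < 0 means 0, p1 < 0 means "to the end".
   Returns how many positions were dropped from the mirror. */
static int32_t tracker_seq_rm(kv_tracker *t, int32_t seq_id, int32_t p0, int32_t p1) {
    assert(seq_id >= 0 && seq_id < TRACK_MAX_SEQ);
    if (p0 < 0) p0 = 0;
    if (p1 < 0 || p1 > TRACK_MAX_POS) p1 = TRACK_MAX_POS;
    int32_t removed = 0;
    for (int32_t pos = p0; pos < p1; pos++) {
        if (t->occupied[seq_id][pos]) {
            t->occupied[seq_id][pos] = false;
            removed++;
        }
    }
    t->count[seq_id] -= removed;
    return removed;
}
```

Every copy and shift call would need a matching update as well, which is exactly the error-prone duplication of cache state this issue is about.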

This is what I propose, mostly for debugging purposes:

- `llama_get_kv_seq_token_count(ctx, seq_id)` returns the number of tokens in the KV cache that belong to that sequence.
- `llama_get_kv_seq_token_count_in_range(ctx, seq_id, p0, p1)` returns the number of tokens of that sequence with positions in the range `[p0, p1)`.
- `llama_get_kv_seq_min_pos(ctx, seq_id)` returns the minimum position of that sequence.
- `llama_get_kv_seq_max_pos(ctx, seq_id)` returns the maximum position of that sequence.

I know that `llama_get_kv_seq_min_pos()` and `llama_get_kv_seq_max_pos()` cannot detect holes in a sequence, but they would still be immensely useful for debugging. For detecting holes, `llama_get_kv_seq_token_count_in_range()` would help: if a sequence holds fewer than `p1 - p0` tokens in `[p0, p1)`, a position is missing. Even better would be a function that returns all positions of a sequence, but that would be more cumbersome to express as a C API and more complicated to implement.

I would also change the return values of the following functions:
- `int32_t llama_kv_cache_tokens_rm(...)` to return the number of deleted tokens.
- `int32_t llama_kv_cache_seq_rm(...)` to return the number of deleted tokens.
- `int32_t llama_kv_cache_seq_cp(...)` to return the number of copied tokens.
- `int32_t llama_kv_cache_seq_shift(...)` to return the number of shifted tokens.
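A minimal sketch of the proposed return value for the removal case, run against a hypothetical, simplified cell array (`toy_kv_cell`: a position plus a 64-bit sequence mask) rather than llama.cpp's real KV cache structures. The `seq_id < 0`, `p0 < 0` and `p1 < 0` conventions are assumed to follow the comments on `llama_kv_cache_seq_rm()` in `llama.h`; counting one removal per affected cell is an assumption as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified KV cell; llama.cpp stores its cells differently. */
typedef struct {
    int32_t  pos;      /* -1 when the cell is empty */
    uint64_t seq_mask; /* bit s set => the cell belongs to sequence s (s < 64) */
} toy_kv_cell;

/* Like seq_rm, but returns how many cells the sequence(s) were removed from.
   seq_id < 0 matches any sequence; p0 < 0 means 0; p1 < 0 means no upper bound. */
static int32_t toy_kv_seq_rm(toy_kv_cell *cells, int32_t n_cells,
                             int32_t seq_id, int32_t p0, int32_t p1) {
    assert(seq_id < 64);
    if (p0 < 0) p0 = 0;
    if (p1 < 0) p1 = INT32_MAX;
    int32_t removed = 0;
    for (int32_t i = 0; i < n_cells; i++) {
        toy_kv_cell *c = &cells[i];
        /* empty cells (pos == -1) always fall outside [p0, p1) because p0 >= 0 */
        if (c->pos < p0 || c->pos >= p1) continue;
        uint64_t mask = seq_id < 0 ? c->seq_mask
                                   : (c->seq_mask & (UINT64_C(1) << seq_id));
        if (mask == 0) continue;   /* cell does not hold a matching sequence */
        c->seq_mask &= ~mask;
        removed++;
        if (c->seq_mask == 0) {
            c->pos = -1;           /* no sequence left: the cell becomes free */
        }
    }
    return removed;
}
```

A cell shared by several sequences stays occupied after one of them is removed, so "deleted tokens" here means "cells the sequence no longer occupies", not "cells freed". Either definition would be useful for debugging; the choice should be documented.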

# Motivation

As a developer, I make mistakes, especially when I am trying to understand an unfamiliar API. Not being able to check whether the state of the KV cache is what I expect it to be is seriously limiting.

# Possible Implementation

- `llama_get_kv_seq_token_count(ctx, seq_id)`: loop over the cells and count the occupied cells that carry that sequence ID.
- `llama_get_kv_seq_token_count_in_range(ctx, seq_id, p0, p1)`: same as above, but only count cells whose position lies in `[p0, p1)`.
- `llama_get_kv_seq_min_pos`: loop over the cells and find the minimum position among cells with that sequence ID.
- `llama_get_kv_seq_max_pos`: loop over the cells and find the maximum position among cells with that sequence ID.
- Add a counter to `llama_kv_cache_tokens_rm`, `llama_kv_cache_seq_rm`, `llama_kv_cache_seq_cp` and `llama_kv_cache_seq_shift`, and return its value.
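The query loops described above could look roughly like this. The sketch again uses a hypothetical `toy_kv_cell` array (position `-1` marks an empty cell, bit `s` of `seq_mask` marks membership in sequence `s`) instead of llama.cpp's internal cache, and returning `-1` from the min/max functions for a sequence that is not in the cache is an assumption, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified KV cell; llama.cpp stores its cells differently. */
typedef struct {
    int32_t  pos;      /* -1 when the cell is empty */
    uint64_t seq_mask; /* bit s set => the cell belongs to sequence s (s < 64) */
} toy_kv_cell;

static int toy_cell_has_seq(const toy_kv_cell *c, int32_t seq_id) {
    assert(seq_id >= 0 && seq_id < 64);
    return c->pos >= 0 && ((c->seq_mask >> seq_id) & 1u);
}

/* Number of occupied cells that carry seq_id. */
static int32_t toy_kv_seq_token_count(const toy_kv_cell *cells, int32_t n_cells, int32_t seq_id) {
    int32_t n = 0;
    for (int32_t i = 0; i < n_cells; i++) {
        n += toy_cell_has_seq(&cells[i], seq_id);
    }
    return n;
}

/* Same, restricted to positions in [p0, p1). */
static int32_t toy_kv_seq_token_count_in_range(const toy_kv_cell *cells, int32_t n_cells,
                                               int32_t seq_id, int32_t p0, int32_t p1) {
    int32_t n = 0;
    for (int32_t i = 0; i < n_cells; i++) {
        if (toy_cell_has_seq(&cells[i], seq_id) && cells[i].pos >= p0 && cells[i].pos < p1) {
            n++;
        }
    }
    return n;
}

/* Smallest position of seq_id, or -1 if the sequence is not in the cache. */
static int32_t toy_kv_seq_min_pos(const toy_kv_cell *cells, int32_t n_cells, int32_t seq_id) {
    int32_t min = -1;
    for (int32_t i = 0; i < n_cells; i++) {
        if (toy_cell_has_seq(&cells[i], seq_id) && (min < 0 || cells[i].pos < min)) {
            min = cells[i].pos;
        }
    }
    return min;
}

/* Largest position of seq_id, or -1 if the sequence is not in the cache. */
static int32_t toy_kv_seq_max_pos(const toy_kv_cell *cells, int32_t n_cells, int32_t seq_id) {
    int32_t max = -1;
    for (int32_t i = 0; i < n_cells; i++) {
        if (toy_cell_has_seq(&cells[i], seq_id) && cells[i].pos > max) {
            max = cells[i].pos;
        }
    }
    return max;
}
```

All four are single linear passes over the cells, so they are cheap enough to call after every cache operation while debugging.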

llama_get_kv_cache_token_count() deprecation hurts debugging - suggested API enhancement #4035

Prerequisites

Feature Description

Motivation

Possible Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

llama_get_kv_cache_token_count() deprecation hurts debugging - suggested API enhancement #4035

Description

Prerequisites

Feature Description

Motivation

Possible Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions