KV Cache
#1617
Replies: 1 comment
-
@krzysz00 I've written this. I think there would be no early workgroup exits.
Example
Short summary of notes
max_seq_len (say 4096)
n_block_idx * gemm0NperBlock to current_seq_len
For a short-term solution:
current_seq_len.
n_block_idx > current_seq_len
k_block_idx > current_seq_len
NOTE 1: I think the above needs to be understood in a transposed manner, since we compute (Vt x (Kt x Qt))t.
NOTE 2: This is how to do this without touching coordinate transforms.
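To make the notes above concrete, here is a minimal host-side sketch in plain Python, not the actual kernel. It assumes the cache is allocated for `max_seq_len` rows of which only the first `current_seq_len` are valid, and that a block whose start (`n_block_idx * n_per_block`) is at or past `current_seq_len` can be skipped entirely. `n_per_block` is a stand-in for `gemm0NperBlock`; the function names and the exact comparison are assumptions for illustration, and the sketch uses the untransposed formulation rather than the transposed one from NOTE 1.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, k_cache, v_cache, current_seq_len):
    """Plain attention over only the valid prefix of the cache."""
    scores = [dot(q, k_cache[j]) for j in range(current_seq_len)]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    head_dim = len(v_cache[0])
    return [
        sum(w * v_cache[j][d] for j, w in enumerate(weights)) / total
        for d in range(head_dim)
    ]


def attention_blocked(q, k_cache, v_cache, current_seq_len, n_per_block):
    """Walk the cache in blocks, skipping blocks that are entirely padding.

    Hypothetical sketch: assumes current_seq_len >= 1.
    """
    max_seq_len = len(k_cache)
    num_blocks = (max_seq_len + n_per_block - 1) // n_per_block
    scores = []
    visited = 0
    for n_block_idx in range(num_blocks):
        block_start = n_block_idx * n_per_block
        if block_start >= current_seq_len:
            # Whole block lies past the valid prefix: nothing to compute.
            continue
        visited += 1
        block_end = min(block_start + n_per_block, max_seq_len)
        for j in range(block_start, block_end):
            if j < current_seq_len:
                scores.append(dot(q, k_cache[j]))
            # Rows with j >= current_seq_len in a partial block are masked out.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    head_dim = len(v_cache[0])
    out = [
        sum(w * v_cache[j][d] for j, w in enumerate(weights)) / total
        for d in range(head_dim)
    ]
    return out, visited


# Small demo: 16-row cache, 5 valid rows, blocks of 4 rows.
max_seq_len, head_dim, current_seq_len, n_per_block = 16, 4, 5, 4
q = [0.1, 0.2, 0.3, 0.4]
# Padding rows hold large garbage values to show they never influence the result.
k_cache = [
    [math.sin(j + d) for d in range(head_dim)] if j < current_seq_len else [1e6] * head_dim
    for j in range(max_seq_len)
]
v_cache = [
    [math.cos(j + d) for d in range(head_dim)] if j < current_seq_len else [1e6] * head_dim
    for j in range(max_seq_len)
]

out, visited = attention_blocked(q, k_cache, v_cache, current_seq_len, n_per_block)
ref = attention_reference(q, k_cache, v_cache, current_seq_len)
```

In this demo only the blocks starting at rows 0 and 4 are visited. The second block is partial, so its rows at index 5 and above are masked rather than skipped.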