Description
Summary
In https://github.com/intel/sycl-tla/blob/main/media/docs/cpp/xe_rearchitecture.md#subgroup-scope-and-thread-local-data, the VNNI layout of a 4x16 block, as seen from the subgroup view, is described as:
However, the Intel Optimization Guides define the VNNI layout like this (this is for AMX, but it should be similar for XMX):
More details regarding the clarification requested
Assume a contiguous 4x16 array whose values equal their linear indices (i.e. *(a.data_ptr() + x) = x, for x between 0 and 63). The following code then produces a VNNI layout that differs from the visualization in the documentation:
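import torch

# Plain row-major 4x16 block; each value equals its linear index.
a = torch.arange(64).view(4, 16).contiguous()
print("plain format")
print(a)

# VNNI repack: each pair of consecutive rows is interleaved element-wise.
c = torch.zeros(2, 16, 2, dtype=torch.long)
for i in range(4):
    for j in range(16):
        c[i // 2][j][i % 2] = a[i][j]
print("VNNI format")
print(c.view(4, 16))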
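For readers without PyTorch, the same index mapping can be sketched in plain Python. This is only an illustration of the loop above: the helper name `vnni_pack` and its `factor` parameter are hypothetical, and `factor=2` assumes two rows are packed together, as in the loop above.

```python
def vnni_pack(flat, n_rows, n_cols, factor=2):
    """Repack a row-major matrix (given as a flat list) so that groups of
    `factor` consecutive rows are interleaved element-wise.

    Hypothetical helper mirroring c[i // 2][j][i % 2] = a[i][j] above.
    """
    assert len(flat) == n_rows * n_cols
    assert n_rows % factor == 0
    out = [0] * len(flat)
    for i in range(n_rows):
        for j in range(n_cols):
            # Destination offset in the packed buffer of shape
            # (n_rows // factor, n_cols, factor), flattened row-major.
            dst = (i // factor) * (n_cols * factor) + j * factor + (i % factor)
            out[dst] = flat[i * n_cols + j]
    return out


plain = list(range(64))            # 4x16 block, value == linear index
packed = vnni_pack(plain, 4, 16)   # same mapping as the torch loop

# Print the packed buffer as 4 rows of 16, like c.view(4, 16).
for r in range(4):
    print(packed[r * 16:(r + 1) * 16])
```

With this mapping the first printed row begins 0, 16, 1, 17, so elements from rows 0 and 1 of the plain block alternate.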
Please clarify what the layout visualization in the documentation represents, since its spatial arrangement (subgroup view) doesn't seem to match the layout printed by the code above.
It also seems to differ from the SPIR-V documentation on 2D loads.
Thanks!