Low Mixed Precision Performance #296

I am encountering some strange performance behavior on the A770, for example with the CIFAR-10 example in the [documentation](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/examples.html).

Using FP32, I get around 5.75s per epoch and, using BF16, around 6.2s per epoch. I also get exactly the same performance with and without `ipex.optimize()`.
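Per-epoch numbers like these depend on how they are measured, because GPU kernels are queued asynchronously and the clock can stop before the device has finished. Below is a minimal sketch of a timing helper, not the tutorial's code: `time_epochs`, `run_epoch`, and `sync` are hypothetical names introduced here, and the helper assumes you pass in a blocking call such as `torch.xpu.synchronize` (or `torch.cuda.synchronize` on the T4) as `sync`.

```python
import time
from statistics import mean


def time_epochs(run_epoch, n_epochs=3, warmup=1, sync=None):
    """Return wall-clock seconds for each timed epoch.

    run_epoch: zero-argument callable that runs one full epoch (hypothetical).
    sync: optional zero-argument callable that blocks until queued device
          work has finished, e.g. torch.xpu.synchronize (assumption: you
          supply the right one for your device).
    """
    timings = []
    for i in range(warmup + n_epochs):
        if sync is not None:
            sync()  # drain pending work before starting the clock
        start = time.perf_counter()
        run_epoch()
        if sync is not None:
            sync()  # wait for this epoch's kernels before stopping the clock
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard warm-up epochs (first-run compilation, caching)
            timings.append(elapsed)
    return timings


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without a GPU.
    demo = time_epochs(lambda: sum(range(100_000)), n_epochs=3, warmup=1)
    print(f"mean epoch time: {mean(demo):.4f}s over {len(demo)} epochs")
```

Running the same helper for the FP32 and BF16 variants, with and without `ipex.optimize()`, would make the four numbers directly comparable.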

Also, when I compare against a Tesla T4 on Colab, each epoch takes around 1s in FP32 and around 0.25s in FP16. That is way faster, even though the A770 technically has better specs.
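To make the gap concrete, here is a small sketch that turns the approximate timings reported above into speedup ratios. The numbers are the rounded "around" values from this post, not new measurements, and `speedup` is a hypothetical helper name.

```python
# Approximate per-epoch times in seconds, as reported in this post.
timings = {
    "A770": {"fp32": 5.75, "low_precision": 6.2},   # low precision = BF16
    "T4":   {"fp32": 1.0,  "low_precision": 0.25},  # low precision = FP16
}


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline (>1 is faster)."""
    return baseline_s / candidate_s


for device, t in timings.items():
    ratio = speedup(t["fp32"], t["low_precision"])
    print(f"{device}: low precision is {ratio:.2f}x the FP32 speed")

# Cross-device comparison at FP32: how many times faster the T4 is than the A770.
print(f"T4 vs A770 (FP32): {speedup(timings['A770']['fp32'], timings['T4']['fp32']):.2f}x")
```

With these inputs, mixed precision gives roughly a 4x speedup on the T4 but a slight slowdown (about 0.93x) on the A770, and the T4 is about 5.75x faster in FP32.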

Are the XMX engines actually being used on Arc GPUs? #258 seems to say yes.

I don't know if it is related, but I get the following warnings when running the example (**EDIT**: I started going through the code in the repo, and those warnings are not related to the current issue):
```
/home/fred/.local/lib/python3.10/site-packages/intel_extension_for_pytorch/frontend.py:447: UserWarning: For XPU device, the split master weight is unsupported for now, so temp to disable it
  warnings.warn("For XPU device, the split master weight is unsupported for now, so temp to disable it")
/home/fred/.local/lib/python3.10/site-packages/intel_extension_for_pytorch/frontend.py:457: UserWarning: For XPU device to save valuable device memory, temp to do optimization on inplaced model, so                     make inplace to be true
  warnings.warn(
/home/fred/.local/lib/python3.10/site-packages/intel_extension_for_pytorch/frontend.py:464: UserWarning: For XPU, the weight prepack and sample input are disabled. The onednn layout                     is automatically chosen to use
  warnings.warn(
```

Environment: Ubuntu 22.04 with Intel Extension for PyTorch 1.13.10+xpu.
