Updating xarch to utilize EVEX compares and blending where profitable #116983
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
CC. @dotnet/jit-contrib. This should be ready for review. I could split this up into 2 PRs ( This is one of the last major milestones for the embedded masking support and helps ensure that all vector sizes are getting the expected implicit lightup.
The size regressions that are showing are primarily from cases where we decide to fall back to the non-kmask variant and it is comparing against In other words, the few regressions are mainly due to #70182. We might be able to mitigate some of that by finding an existing CSE with a good VN in range or by doing some other backtrack-searching tricks, as we've done in earlier workarounds for the issue, but I don't think we should block this PR on that, particularly since many important cases are improved and the
/azp run runtime-coreclr jitstress-isas-x86, Fuzzlyn, Antigen, runtime-coreclr jitstress, runtime-coreclr jitstressregs
Azure Pipelines successfully started running 5 pipeline(s).
This updates the xarch intrinsic logic to always import nodes as TYP_MASK where supported and to lower them back to the non-mask variants if no other optimization was able to kick in. This allows better overall use of the hardware for existing intrinsic code paths.
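For readers less familiar with the two compare forms, here is a rough scalar model of the difference. It is only an illustration and not the JIT's code: all function names are hypothetical, elements are assumed to be non-negative 32-bit values, and the k-mask is modeled as a plain integer with one bit per element.

```python
ELEM_BITS = 32                      # assumed element width for this sketch
ALL_ONES = (1 << ELEM_BITS) - 1


def compare_eq_mask(a, b):
    """Mask-style compare: one result bit per element (models a k-mask)."""
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            mask |= 1 << i
    return mask


def compare_eq_vector(a, b):
    """Non-mask compare: each element becomes all-ones or all-zeros."""
    return [ALL_ONES if x == y else 0 for x, y in zip(a, b)]


def mask_to_vector(mask, count):
    """Expand a bit mask back into the per-element all-ones/all-zeros form."""
    return [ALL_ONES if (mask >> i) & 1 else 0 for i in range(count)]


def blend_by_mask(mask, if_true, if_false):
    """Per-element select driven by one mask bit per element."""
    return [t if (mask >> i) & 1 else f
            for i, (t, f) in enumerate(zip(if_true, if_false))]


def blend_by_vector(vmask, if_true, if_false):
    """Per-element select driven by a vector mask, via and/andnot/or."""
    return [(t & m) | (f & ~m & ALL_ONES)
            for m, t, f in zip(vmask, if_true, if_false)]
```

Both routes select the same elements. The point of the sketch is only that the mask form carries one bit per element and can feed a masked blend directly, while the vector form needs the full-width and/andnot/or sequence (or a conversion such as `mask_to_vector`) to get the same result.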