[X86][SelectionDAG] - Add support for llvm.canonicalize intrinsic #106370
Changes from 14 commits
@@ -2559,6 +2559,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
  ISD::STRICT_FMA,
  ISD::FMINNUM,
  ISD::FMAXNUM,
  ISD::FCANONICALIZE,
  ISD::SUB,
  ISD::LOAD,
  ISD::LRINT,
@@ -58159,6 +58160,25 @@ static SDValue combineINTRINSIC_VOID(SDNode *N, SelectionDAG &DAG,
  return SDValue();
}

static SDValue combineCanonicalize(SDNode *N, SelectionDAG &DAG) {
Review comment: Add …
Review comment: We prefer to use …
  SDValue Operand = N->getOperand(0);
  EVT VT = Operand.getValueType();
  SDLoc dl(N);

  // Canonicalize scalar variable FP Nodes.
  SDValue One =
      DAG.getNode(ISD::SINT_TO_FP, dl, VT, DAG.getConstant(1, dl, MVT::i32));
Review comment: If you change MVT::i32 to VT.changeTypeToInteger() I think this should work for vectors as well.
Reply: Should I handle it in a following PR, or do you recommend I do it now?
Review comment: In this patch might be simpler, thanks.
Reply: Okay, will do it here. Thanks for the suggestion.
Reply: I tried this suggestion, but I'm running into a crash for f80 scalar input. What I realized while debugging, though, is that changeTypeToInteger may not be required; with the following change (snippet elided) vector inputs are handled pretty seamlessly. Running via gdb, I get a BUILD_VECTOR as such (input/result DAG dumps elided).
Review comment: Why not just emit a regular getConstantFP instead of emitting this as an integer cast? This can also just go as the generic lowering implementation.
Review comment: This also is lowering, it is not a combine. It should not be invoked through PerformDAGCombine.
Reply: Is this the correct place (the conditions under which setOperationAction is placed) and method (Legal, Custom, or Promote?) for handling these data types?
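To make that last question concrete, here is a minimal sketch of the Custom-lowering route the reviewers describe, assuming FCANONICALIZE is registered per type in the X86TargetLowering constructor and dispatched from LowerOperation rather than PerformDAGCombine; the type list and the LowerFCANONICALIZE helper name are illustrative, not part of this revision:

// Hypothetical sketch, not this revision: mark the node Custom for the types
// the target should handle (illustrative type list).
for (MVT VT : {MVT::f32, MVT::f64})
  setOperationAction(ISD::FCANONICALIZE, VT, Custom);

// Then dispatch it from X86TargetLowering::LowerOperation instead of the
// DAG-combine hook:
//   case ISD::FCANONICALIZE:
//     return LowerFCANONICALIZE(Op, DAG);  // hypothetical helper

Whether Legal, Custom, Promote, or Expand is right for each type would normally mirror how the other FP nodes of that type are configured in the same constructor.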

  // TODO: Fix crash for bf16 when generating strict_fmul, as it leads to an
  // error: "SoftPromoteHalfResult #0: t11: bf16,ch = strict_fmul t0,
  // ConstantFP:bf16<APFloat(16256)>, t5  LLVM ERROR: Do not know how to soft
  // promote this operator's result!"
  SDValue Chain = DAG.getEntryNode();
  SDValue StrictFmul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
                                   {Chain, One, Operand});
Review comment: Constant operands canonically should be the RHS.
  return StrictFmul;
  // TODO: Handle vectors.
}
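For comparison, folding in the two suggestions above (a plain FP constant instead of the SINT_TO_FP cast, and the constant operand on the RHS) would make the node construction look roughly as follows; this is a sketch of the suggested direction, not the code in this revision:

  // Sketch: getConstantFP materializes 1.0 directly (splatting it for vector
  // VTs), so no integer-to-FP cast is needed and vectors fall out naturally;
  // the constant operand sits on the RHS.
  SDValue One = DAG.getConstantFP(1.0, dl, VT);
  SDValue StrictFmul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
                                   {Chain, Operand, One});
  return StrictFmul;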

SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
                                             DAGCombinerInfo &DCI) const {
  SelectionDAG &DAG = DCI.DAG;
@@ -58198,6 +58218,7 @@ SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
  case ISD::AND: return combineAnd(N, DAG, DCI, Subtarget);
  case ISD::OR: return combineOr(N, DAG, DCI, Subtarget);
  case ISD::XOR: return combineXor(N, DAG, DCI, Subtarget);
  case ISD::FCANONICALIZE: return combineCanonicalize(N, DAG);
  case ISD::BITREVERSE: return combineBITREVERSE(N, DAG, DCI, Subtarget);
  case ISD::AVGCEILS:
  case ISD::AVGCEILU:
@@ -0,0 +1,273 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march x86_64-unknown-linux-gnu --version 5
; RUN: llc -mattr=+sse2 -mtriple=x86_64 < %s | FileCheck %s -check-prefixes=SSE
; RUN: llc -mattr=+avx -mtriple=x86_64 < %s | FileCheck %s -check-prefixes=AVX1
; RUN: llc -mattr=+avx2 -mtriple=x86_64 < %s | FileCheck %s -check-prefixes=AVX2
; RUN: llc -mattr=+avx512f -mtriple=x86_64 < %s | FileCheck %s -check-prefixes=AVX512F
; RUN: llc -mattr=+avx512bw -mtriple=x86_64 < %s | FileCheck %s -check-prefixes=AVX512BW

define void @v_test_canonicalize__half(half addrspace(1)* %out) nounwind {
; SSE-LABEL: v_test_canonicalize__half:
; SSE: # %bb.0: # %entry
; SSE-NEXT: pushq %rbx
; SSE-NEXT: subq $16, %rsp
; SSE-NEXT: movq %rdi, %rbx
; SSE-NEXT: pinsrw $0, (%rdi), %xmm0
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: movd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Folded Spill
; SSE-NEXT: pinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: mulss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Folded Reload
; SSE-NEXT: callq __truncsfhf2@PLT
; SSE-NEXT: pextrw $0, %xmm0, %eax
; SSE-NEXT: movw %ax, (%rbx)
; SSE-NEXT: addq $16, %rsp
; SSE-NEXT: popq %rbx
; SSE-NEXT: retq
;
; AVX1-LABEL: v_test_canonicalize__half:
; AVX1: # %bb.0: # %entry
; AVX1-NEXT: pushq %rbx
; AVX1-NEXT: subq $16, %rsp
; AVX1-NEXT: movq %rdi, %rbx
; AVX1-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm0
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vmovd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Folded Spill
; AVX1-NEXT: vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vmulss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX1-NEXT: callq __truncsfhf2@PLT
; AVX1-NEXT: vpextrw $0, %xmm0, (%rbx)
; AVX1-NEXT: addq $16, %rsp
; AVX1-NEXT: popq %rbx
; AVX1-NEXT: retq
;
; AVX2-LABEL: v_test_canonicalize__half:
; AVX2: # %bb.0: # %entry
; AVX2-NEXT: pushq %rbx
; AVX2-NEXT: subq $16, %rsp
; AVX2-NEXT: movq %rdi, %rbx
; AVX2-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm0
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vmovd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Folded Spill
; AVX2-NEXT: vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vmulss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX2-NEXT: callq __truncsfhf2@PLT
; AVX2-NEXT: vpextrw $0, %xmm0, (%rbx)
; AVX2-NEXT: addq $16, %rsp
; AVX2-NEXT: popq %rbx
; AVX2-NEXT: retq
;
; AVX512F-LABEL: v_test_canonicalize__half:
; AVX512F: # %bb.0: # %entry
; AVX512F-NEXT: movzwl (%rdi), %eax
; AVX512F-NEXT: movzwl {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
; AVX512F-NEXT: vmovd %ecx, %xmm0
; AVX512F-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512F-NEXT: vmovd %eax, %xmm1
; AVX512F-NEXT: vcvtph2ps %xmm1, %xmm1
; AVX512F-NEXT: vmulss %xmm1, %xmm0, %xmm0
; AVX512F-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX512F-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512F-NEXT: vmovd %xmm0, %eax
; AVX512F-NEXT: movw %ax, (%rdi)
; AVX512F-NEXT: retq
;
; AVX512BW-LABEL: v_test_canonicalize__half:
; AVX512BW: # %bb.0: # %entry
; AVX512BW-NEXT: movzwl (%rdi), %eax
; AVX512BW-NEXT: movzwl {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
; AVX512BW-NEXT: vmovd %ecx, %xmm0
; AVX512BW-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512BW-NEXT: vmovd %eax, %xmm1
; AVX512BW-NEXT: vcvtph2ps %xmm1, %xmm1
; AVX512BW-NEXT: vmulss %xmm1, %xmm0, %xmm0
; AVX512BW-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX512BW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; AVX512BW-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512BW-NEXT: vmovd %xmm0, %eax
; AVX512BW-NEXT: movw %ax, (%rdi)
; AVX512BW-NEXT: retq
entry:
  %val = load half, half addrspace(1)* %out
  %canonicalized = call half @llvm.canonicalize.f16(half %val)
  store half %canonicalized, half addrspace(1)* %out
  ret void
}

define half @complex_canonicalize_fmul_half(half %a, half %b) nounwind {
; SSE-LABEL: complex_canonicalize_fmul_half:
; SSE: # %bb.0: # %entry
; SSE-NEXT: pushq %rax
; SSE-NEXT: movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: movss %xmm0, (%rsp) # 4-byte Spill
; SSE-NEXT: movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
; SSE-NEXT: # xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: movss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; SSE-NEXT: movss (%rsp), %xmm1 # 4-byte Reload
; SSE-NEXT: # xmm1 = mem[0],zero,zero,zero
; SSE-NEXT: subss %xmm0, %xmm1
; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: callq __truncsfhf2@PLT
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: movss %xmm0, (%rsp) # 4-byte Spill
; SSE-NEXT: addss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Folded Reload
; SSE-NEXT: callq __truncsfhf2@PLT
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: subss (%rsp), %xmm0 # 4-byte Folded Reload
; SSE-NEXT: callq __truncsfhf2@PLT
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: movss %xmm0, (%rsp) # 4-byte Spill
; SSE-NEXT: pinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: mulss (%rsp), %xmm0 # 4-byte Folded Reload
; SSE-NEXT: callq __truncsfhf2@PLT
; SSE-NEXT: callq __extendhfsf2@PLT
; SSE-NEXT: subss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Folded Reload
; SSE-NEXT: callq __truncsfhf2@PLT
; SSE-NEXT: popq %rax
; SSE-NEXT: retq
;
; AVX1-LABEL: complex_canonicalize_fmul_half:
; AVX1: # %bb.0: # %entry
; AVX1-NEXT: pushq %rax
; AVX1-NEXT: vmovss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vmovss %xmm0, (%rsp) # 4-byte Spill
; AVX1-NEXT: vmovss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
; AVX1-NEXT: # xmm0 = mem[0],zero,zero,zero
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vmovss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; AVX1-NEXT: vmovss (%rsp), %xmm1 # 4-byte Reload
; AVX1-NEXT: # xmm1 = mem[0],zero,zero,zero
; AVX1-NEXT: vsubss %xmm0, %xmm1, %xmm0
; AVX1-NEXT: callq __truncsfhf2@PLT
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vmovss %xmm0, (%rsp) # 4-byte Spill
; AVX1-NEXT: vaddss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX1-NEXT: callq __truncsfhf2@PLT
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vsubss (%rsp), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX1-NEXT: callq __truncsfhf2@PLT
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vmovss %xmm0, (%rsp) # 4-byte Spill
; AVX1-NEXT: vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vmulss (%rsp), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX1-NEXT: callq __truncsfhf2@PLT
; AVX1-NEXT: callq __extendhfsf2@PLT
; AVX1-NEXT: vsubss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX1-NEXT: callq __truncsfhf2@PLT
; AVX1-NEXT: popq %rax
; AVX1-NEXT: retq
;
; AVX2-LABEL: complex_canonicalize_fmul_half:
; AVX2: # %bb.0: # %entry
; AVX2-NEXT: pushq %rax
; AVX2-NEXT: vmovss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vmovss %xmm0, (%rsp) # 4-byte Spill
; AVX2-NEXT: vmovss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
; AVX2-NEXT: # xmm0 = mem[0],zero,zero,zero
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vmovss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; AVX2-NEXT: vmovss (%rsp), %xmm1 # 4-byte Reload
; AVX2-NEXT: # xmm1 = mem[0],zero,zero,zero
; AVX2-NEXT: vsubss %xmm0, %xmm1, %xmm0
; AVX2-NEXT: callq __truncsfhf2@PLT
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vmovss %xmm0, (%rsp) # 4-byte Spill
; AVX2-NEXT: vaddss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX2-NEXT: callq __truncsfhf2@PLT
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vsubss (%rsp), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX2-NEXT: callq __truncsfhf2@PLT
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vmovss %xmm0, (%rsp) # 4-byte Spill
; AVX2-NEXT: vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vmulss (%rsp), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX2-NEXT: callq __truncsfhf2@PLT
; AVX2-NEXT: callq __extendhfsf2@PLT
; AVX2-NEXT: vsubss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 4-byte Folded Reload
; AVX2-NEXT: callq __truncsfhf2@PLT
; AVX2-NEXT: popq %rax
; AVX2-NEXT: retq
;
; AVX512F-LABEL: complex_canonicalize_fmul_half:
; AVX512F: # %bb.0: # %entry
; AVX512F-NEXT: vpextrw $0, %xmm1, %eax
; AVX512F-NEXT: vpextrw $0, %xmm0, %ecx
; AVX512F-NEXT: vmovd %ecx, %xmm0
; AVX512F-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512F-NEXT: vmovd %eax, %xmm1
; AVX512F-NEXT: vcvtph2ps %xmm1, %xmm1
; AVX512F-NEXT: vsubss %xmm1, %xmm0, %xmm0
; AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512F-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512F-NEXT: vaddss %xmm1, %xmm0, %xmm2
; AVX512F-NEXT: vcvtps2ph $4, %xmm2, %xmm2
; AVX512F-NEXT: vcvtph2ps %xmm2, %xmm2
; AVX512F-NEXT: vsubss %xmm0, %xmm2, %xmm0
; AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512F-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; AVX512F-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512F-NEXT: movzwl {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
; AVX512F-NEXT: vmovd %eax, %xmm2
; AVX512F-NEXT: vcvtph2ps %xmm2, %xmm2
; AVX512F-NEXT: vmulss %xmm0, %xmm2, %xmm0
; AVX512F-NEXT: vxorps %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3]
; AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512F-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512F-NEXT: vsubss %xmm1, %xmm0, %xmm0
; AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512F-NEXT: vmovd %xmm0, %eax
; AVX512F-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
; AVX512F-NEXT: retq
;
; AVX512BW-LABEL: complex_canonicalize_fmul_half:
; AVX512BW: # %bb.0: # %entry
; AVX512BW-NEXT: vpextrw $0, %xmm1, %eax
; AVX512BW-NEXT: vpextrw $0, %xmm0, %ecx
; AVX512BW-NEXT: vmovd %ecx, %xmm0
; AVX512BW-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512BW-NEXT: vmovd %eax, %xmm1
; AVX512BW-NEXT: vcvtph2ps %xmm1, %xmm1
; AVX512BW-NEXT: vsubss %xmm1, %xmm0, %xmm0
; AVX512BW-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512BW-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512BW-NEXT: vaddss %xmm1, %xmm0, %xmm2
; AVX512BW-NEXT: vcvtps2ph $4, %xmm2, %xmm2
; AVX512BW-NEXT: vcvtph2ps %xmm2, %xmm2
; AVX512BW-NEXT: vsubss %xmm0, %xmm2, %xmm0
; AVX512BW-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512BW-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; AVX512BW-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512BW-NEXT: movzwl {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
; AVX512BW-NEXT: vmovd %eax, %xmm2
; AVX512BW-NEXT: vcvtph2ps %xmm2, %xmm2
; AVX512BW-NEXT: vmulss %xmm0, %xmm2, %xmm0
; AVX512BW-NEXT: vxorps %xmm2, %xmm2, %xmm2
; AVX512BW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3]
; AVX512BW-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512BW-NEXT: vcvtph2ps %xmm0, %xmm0
; AVX512BW-NEXT: vsubss %xmm1, %xmm0, %xmm0
; AVX512BW-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; AVX512BW-NEXT: vmovd %xmm0, %eax
; AVX512BW-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
; AVX512BW-NEXT: retq
entry:
  %mul1 = fsub half %a, %b
  %add = fadd half %mul1, %b
  %mul2 = fsub half %add, %mul1
  %canonicalized = call half @llvm.canonicalize.f16(half %mul2)
  %result = fsub half %canonicalized, %b
  ret half %result
}

declare half @llvm.canonicalize.f16(half)
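As a side note, IR like the llvm.canonicalize.f16 calls in these tests is typically produced through IRBuilder; a minimal sketch, where the emitCanonicalize wrapper is hypothetical and Val is assumed to be a half-typed Value* supplied by the caller:

#include "llvm/IR/IRBuilder.h"

// Sketch: CreateUnaryIntrinsic picks the .f16 overload from Val's type and
// emits `call half @llvm.canonicalize.f16(half %val)`.
llvm::Value *emitCanonicalize(llvm::IRBuilderBase &Builder, llvm::Value *Val) {
  return Builder.CreateUnaryIntrinsic(llvm::Intrinsic::canonicalize, Val);
}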