[LV][EVL] Incorrect behavior of fixed-order recurrence idiom with EVL tail folding #122461

@Mel-Chen

With EVL tail folding enabled, llvm.splice can produce incorrect results in the final iteration, because the EVL of the second-to-last iteration is not guaranteed to equal VF * UF.
When that EVL is smaller, the last lane of the previous vector is not its last active element, so the splice pulls in a poison lane. For example:

llvm.splice([A, B, C, poison], [D, E, poison, poison], -1) ==> [poison, D, E, poison]  
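
To make the difference concrete, here is a minimal Python sketch that models the two intrinsics on plain lists. It is an illustration only: the helper names `splice` and `vp_splice` and the `POISON` marker are hypothetical, the mask is assumed to be all-true, and the semantics follow my reading of the LangRef descriptions of llvm.vector.splice and llvm.experimental.vp.splice.

```python
POISON = "poison"  # hypothetical stand-in for an IR poison lane


def splice(v1, v2, imm):
    """Model of llvm.vector.splice: a VL-wide window over concat(v1, v2).

    A negative imm selects the trailing -imm lanes of v1, counted from
    the full vector length, regardless of how many lanes are active.
    """
    vl = len(v1)
    start = imm if imm >= 0 else vl + imm
    return (v1 + v2)[start:start + vl]


def vp_splice(v1, v2, imm, evl1, evl2):
    """Model of llvm.experimental.vp.splice with an all-true mask.

    Only the first evl1 lanes of v1 and the first evl2 lanes of v2 are
    considered; a negative imm counts back from evl1. Lanes at or
    beyond evl2 in the result are poison.
    """
    vl = len(v1)
    start = imm if imm >= 0 else evl1 + imm
    active = (v1[start:evl1] + v2[:evl2])[:evl2]
    return active + [POISON] * (vl - len(active))


prev = ["A", "B", "C", POISON]      # previous iteration, EVL = 3
curr = ["D", "E", POISON, POISON]   # final iteration, EVL = 2

# Plain splice picks lane 3 of prev, which is poison.
print(splice(prev, curr, -1))           # ['poison', 'D', 'E', 'poison']
# The VP form picks lane evl1 - 1 of prev, which is C.
print(vp_splice(prev, curr, -1, 3, 2))  # ['C', 'D', 'poison', 'poison']
```

In this model the only difference that matters is where the negative offset is anchored: at the full vector length for the plain splice, and at the previous iteration's EVL for the VP form.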

This issue was identified by the LLVM test-suite in SingleSource/UnitTests/Vectorizer/recurrences.test.

Checking first_order_recurrence
Checking second_order_recurrence
Checking third_order_recurrence
Miscompare

For now, this feature has been temporarily disabled by #122458. It will be re-enabled once the fix sketched below is implemented: record the EVL of the previous iteration and replace llvm.splice with llvm.experimental.vp.splice.

vector.ph:                                        ; preds = %for.body.preheader.i.i.i
  ...
  %max.vf.1 = tail call i32 @llvm.vscale.i32()
  %max.vf = shl nuw nsw i32 %max.vf.1, 2 
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %evl.based.iv = phi i64 [ 0, %vector.ph ], [ %index.evl.next, %vector.body ]

  ; Record the EVL of the previous iteration, initialized to VF.
  %prev.evl = phi i32 [ %max.vf, %vector.ph ], [ %17, %vector.body ]    

  %vector.recur = phi <vscale x 4 x i32> [ %vector.recur.init, %vector.ph ], [ %vp.op.load, %vector.body ]
  %vector.recur8 = phi <vscale x 4 x i32> [ %vector.recur.init7, %vector.ph ], [ %19, %vector.body ]
  %vector.recur10 = phi <vscale x 4 x i32> [ %vector.recur.init9, %vector.ph ], [ %20, %vector.body ]
  %avl = sub i64 %wide.trip.count.i.i.i, %evl.based.iv
  %17 = tail call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  %18 = getelementptr inbounds nuw i32, ptr %__args.val, i64 %evl.based.iv
  %vp.op.load = tail call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 %18, <vscale x 4 x i1> splat (i1 true), i32 %17), !tbaa !6

  ; Replace llvm.splice with llvm.experimental.vp.splice.
  %19 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur, <vscale x 4 x i32> %vp.op.load, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)
  %20 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur8, <vscale x 4 x i32> %19, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)
  %21 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur10, <vscale x 4 x i32> %20, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)

  %vp.op = tail call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %20, <vscale x 4 x i32> %19, <vscale x 4 x i1> splat (i1 true), i32 %17)
  %vp.op11 = tail call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %vp.op, <vscale x 4 x i32> %21, <vscale x 4 x i1> splat (i1 true), i32 %17)
  %22 = getelementptr inbounds nuw i32, ptr %__args1.val, i64 %evl.based.iv
  tail call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> %vp.op11, ptr align 4 %22, <vscale x 4 x i1> splat (i1 true), i32 %17), !tbaa !6
  %23 = zext i32 %17 to i64
  %index.evl.next = add nuw i64 %evl.based.iv, %23
  %index.next = add nuw i64 %index, %7
  %24 = icmp eq i64 %index.next, %n.vec
  br i1 %24, label %"_ZSt10__invoke_rIvRZ4mainE3$_1JPjS2_jEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES4_E4typeEOS5_DpOS6_.exit", label %vector.body, !llvm.loop !39
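
The effect of carrying %prev.evl through the loop can be checked with a small Python simulation of a first-order recurrence (out[i] = a[i-1]). This is a sketch under stated assumptions, not the vectorizer's actual behavior: `get_vector_length` is a hypothetical model in which the target splits the last two iterations evenly (one behavior a target may choose for the get.vector.length intrinsic), and `splice`, `vp_splice`, `simulate`, and `POISON` are illustrative names with an all-true mask assumed.

```python
POISON = "poison"  # hypothetical stand-in for an IR poison lane


def splice(v1, v2, imm):
    # llvm.vector.splice model: negative imm counts back from the full length.
    vl = len(v1)
    start = imm if imm >= 0 else vl + imm
    return (v1 + v2)[start:start + vl]


def vp_splice(v1, v2, imm, evl1, evl2):
    # vp.splice model (all-true mask): negative imm counts back from evl1.
    vl = len(v1)
    start = imm if imm >= 0 else evl1 + imm
    active = (v1[start:evl1] + v2[:evl2])[:evl2]
    return active + [POISON] * (vl - len(active))


def get_vector_length(avl, vf):
    # Assumed target behavior: when VF < AVL < 2*VF, return ceil(AVL / 2),
    # so the second-to-last iteration can run with EVL < VF.
    if avl >= 2 * vf:
        return vf
    if avl > vf:
        return (avl + 1) // 2
    return avl


def simulate(a, init, vf, use_prev_evl):
    """Run the tail-folded loop and return the recurrence values stored."""
    out = []
    recur = [POISON] * (vf - 1) + [init]  # models %vector.recur.init
    prev_evl = vf                         # models %prev.evl, initialized to VF
    iv = 0
    while iv < len(a):
        evl = get_vector_length(len(a) - iv, vf)
        load = a[iv:iv + evl] + [POISON] * (vf - evl)  # models vp.load
        if use_prev_evl:
            spliced = vp_splice(recur, load, -1, prev_evl, evl)
        else:
            spliced = splice(recur, load, -1)
        out.extend(spliced[:evl])  # models vp.store of the active lanes
        recur, prev_evl = load, evl
        iv += evl
    return out


a = [1, 2, 3, 4, 5, 6]  # trip count 6 with VF 4 gives EVLs 3 and 3
print(simulate(a, 0, 4, use_prev_evl=False))  # [0, 1, 2, 'poison', 4, 5]
print(simulate(a, 0, 4, use_prev_evl=True))   # [0, 1, 2, 3, 4, 5]
```

With a trip count that is a multiple of VF (for example 8 with VF 4) both variants agree in this model, which is consistent with the miscompare only appearing when the second-to-last EVL is smaller than VF.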
