Description
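When EVL tail folding is enabled, the llvm.vector.splice used to build fixed-order recurrences can produce wrong results in the final iteration. With offset -1, the splice always takes the last lane of the previous iteration's vector. However, the EVL of the second-to-last iteration is not necessarily VF * UF, so that lane may be poison.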
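For example, with VF = 4, suppose the second-to-last iteration processes three elements and the last iteration processes two:
llvm.splice([A, B, C, poison], [D, E, poison, poison], -1) ==> [poison, D, E, poison]
Lane 0 should hold C, which is the last element the previous iteration actually processed. Instead it holds poison.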
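The following small Python model (not LLVM code) makes the difference concrete. It models the two intrinsics with an all-true mask and adds a hypothetical `evl_sequence` helper. Poison lanes are modeled as `None`. The EVL policy is an assumption: it follows the RVV rule that lets vsetvli return as little as ceil(AVL / 2) when VLMAX < AVL < 2 * VLMAX. With a trip count of 5 and VF = 4, that policy yields EVLs of 3 and 2, which reproduces the example above.

```python
from typing import List

POISON = None  # models an LLVM poison lane


def evl_sequence(n: int, vlmax: int) -> List[int]:
    """EVL of each iteration for trip count n (hypothetical helper).

    Assumes the RVV policy that splits the last two iterations evenly:
    vl = ceil(AVL / 2) when VLMAX < AVL < 2 * VLMAX. The RVV spec permits
    this choice, but real hardware may pick a different legal value."""
    evls = []
    avl = n
    while avl > 0:
        if avl <= vlmax:
            evl = avl
        elif avl < 2 * vlmax:
            evl = (avl + 1) // 2
        else:
            evl = vlmax
        evls.append(evl)
        avl -= evl
    return evls


def vector_splice(v1: list, v2: list, imm: int) -> list:
    """Model of llvm.vector.splice: result[i] = concat(v1, v2)[start + i],
    with start = imm for imm >= 0 and len(v1) + imm for imm < 0.
    The lane count of v1 is used regardless of how many lanes are valid."""
    vf = len(v1)
    start = imm if imm >= 0 else vf + imm
    return (v1 + v2)[start:start + vf]


def vp_splice(v1: list, v2: list, imm: int, evl1: int, evl2: int) -> list:
    """Model of llvm.experimental.vp.splice with an all-true mask: splice the
    first evl1 lanes of v1 with the first evl2 lanes of v2. A negative imm
    counts back from evl1. Lanes at or beyond evl2 are poison."""
    vf = len(v1)
    start = imm if imm >= 0 else evl1 + imm
    active = (v1[:evl1] + v2[:evl2])[start:start + evl2]
    return active + [POISON] * (vf - len(active))


if __name__ == "__main__":
    prev, cur = ["A", "B", "C", POISON], ["D", "E", POISON, POISON]
    print(evl_sequence(5, 4))              # [3, 2]
    print(vector_splice(prev, cur, -1))    # [None, 'D', 'E', None]
    print(vp_splice(prev, cur, -1, 3, 2))  # ['C', 'D', None, None]
```

The VP form takes lane `evl1 - 1` of the previous vector instead of lane `VF - 1`. For this reason the caller has to carry the previous iteration's EVL forward.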
This issue was found by the LLVM test-suite test SingleSource/UnitTests/Vectorizer/recurrences.test, which reports:
```
Checking first_order_recurrence
Checking second_order_recurrence
Checking third_order_recurrence
Miscompare
```
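This feature is temporarily disabled by #122458. It will be re-enabled once the fix below is implemented. The fix records the EVL of the previous iteration in a phi and replaces llvm.vector.splice with llvm.experimental.vp.splice, which takes that EVL into account: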
```llvm
vector.ph:                                        ; preds = %for.body.preheader.i.i.i
  ...
  %max.vf.1 = tail call i32 @llvm.vscale.i32()
  %max.vf = shl nuw nsw i32 %max.vf.1, 2
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %evl.based.iv = phi i64 [ 0, %vector.ph ], [ %index.evl.next, %vector.body ]
  ; Record the EVL of the previous iteration; initialized to VF (vscale x 4).
  %prev.evl = phi i32 [ %max.vf, %vector.ph ], [ %17, %vector.body ]
  %vector.recur = phi <vscale x 4 x i32> [ %vector.recur.init, %vector.ph ], [ %vp.op.load, %vector.body ]
  %vector.recur8 = phi <vscale x 4 x i32> [ %vector.recur.init7, %vector.ph ], [ %19, %vector.body ]
  %vector.recur10 = phi <vscale x 4 x i32> [ %vector.recur.init9, %vector.ph ], [ %20, %vector.body ]
  %avl = sub i64 %wide.trip.count.i.i.i, %evl.based.iv
  %17 = tail call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  %18 = getelementptr inbounds nuw i32, ptr %__args.val, i64 %evl.based.iv
  %vp.op.load = tail call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 %18, <vscale x 4 x i1> splat (i1 true), i32 %17), !tbaa !6
  ; Replace llvm.vector.splice with llvm.experimental.vp.splice. With EVL1 = %prev.evl,
  ; offset -1 selects the last lane the previous iteration actually processed.
  %19 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur, <vscale x 4 x i32> %vp.op.load, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)
  %20 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur8, <vscale x 4 x i32> %19, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)
  %21 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur10, <vscale x 4 x i32> %20, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)
  %vp.op = tail call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %20, <vscale x 4 x i32> %19, <vscale x 4 x i1> splat (i1 true), i32 %17)
  %vp.op11 = tail call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %vp.op, <vscale x 4 x i32> %21, <vscale x 4 x i1> splat (i1 true), i32 %17)
  %22 = getelementptr inbounds nuw i32, ptr %__args1.val, i64 %evl.based.iv
  tail call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> %vp.op11, ptr align 4 %22, <vscale x 4 x i1> splat (i1 true), i32 %17), !tbaa !6
  %23 = zext i32 %17 to i64
  %index.evl.next = add nuw i64 %evl.based.iv, %23
  %index.next = add nuw i64 %index, %7
  %24 = icmp eq i64 %index.next, %n.vec
  br i1 %24, label %"_ZSt10__invoke_rIvRZ4mainE3$_1JPjS2_jEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES4_E4typeEOS5_DpOS6_.exit", label %vector.body, !llvm.loop !39
```
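The proposed loop can be checked against a scalar reference. The sketch below is a Python model, not the vectorizer's code. As far as we can read the IR above, `vector.body` computes out[i] = a[i-1] + a[i-2] + a[i-3]. The model mirrors that loop, threading `prev_evl` through the three splices the way `%prev.evl` does. Several parts are assumptions: the start values `init` (the real `%vector.recur.init*` values are elided), the input data, and the even-split EVL policy in `next_evl`. Setting `use_vp=False` switches to the `llvm.vector.splice` behavior, which shows where poison enters.

```python
from typing import List, Optional, Sequence, Tuple

Lane = Optional[int]
POISON: Lane = None  # models a poison lane


def next_evl(avl: int, vf: int) -> int:
    # Assumed RVV-style policy: the last two iterations are split evenly.
    if avl <= vf:
        return avl
    if avl < 2 * vf:
        return (avl + 1) // 2
    return vf


def splice_last(v1: list, v2: list, evl1: int, evl2: int, use_vp: bool) -> list:
    """Recurrence splice with offset -1.
    use_vp=False: llvm.vector.splice, always reads lane VF-1 of v1.
    use_vp=True: llvm.experimental.vp.splice, reads lane evl1-1 of v1."""
    vf = len(v1)
    if not use_vp:
        return v1[vf - 1:] + v2[:vf - 1]
    active = (v1[:evl1] + v2[:evl2])[evl1 - 1:evl1 - 1 + evl2]
    return active + [POISON] * (vf - len(active))


def padd(*xs: Lane) -> Lane:
    # Any poison operand makes the sum poison.
    return POISON if any(x is POISON for x in xs) else sum(xs)


def vectorized(a: Sequence[int], init: Tuple[int, int, int], vf: int,
               use_vp: bool = True) -> List[Lane]:
    """init = (a[-1], a[-2], a[-3]), placed in lane VF-1 like the recur.init values."""
    pad = [POISON] * (vf - 1)
    recur, recur8, recur10 = pad + [init[0]], pad + [init[1]], pad + [init[2]]
    prev_evl = vf  # %prev.evl starts at VF
    out: List[Lane] = []
    iv = 0  # %evl.based.iv
    while iv < len(a):
        evl = next_evl(len(a) - iv, vf)                        # get.vector.length
        load = list(a[iv:iv + evl]) + [POISON] * (vf - evl)    # vp.load
        s19 = splice_last(recur, load, prev_evl, evl, use_vp)  # a[i-1]
        s20 = splice_last(recur8, s19, prev_evl, evl, use_vp)  # a[i-2]
        s21 = splice_last(recur10, s20, prev_evl, evl, use_vp)  # a[i-3]
        out += [padd(s20[j], s19[j], s21[j]) for j in range(evl)]  # vp.add x2, vp.store
        recur, recur8, recur10, prev_evl = load, s19, s20, evl
        iv += evl
    return out


def scalar(a: Sequence[int], init: Tuple[int, int, int]) -> List[int]:
    t1, t2, t3 = init  # a[i-1], a[i-2], a[i-3]
    out = []
    for x in a:
        out.append(t1 + t2 + t3)
        t1, t2, t3 = x, t1, t2
    return out


if __name__ == "__main__":
    a, init = [1, 2, 3, 4, 5], (10, 20, 30)
    print(scalar(a, init))                       # [60, 31, 13, 6, 9]
    print(vectorized(a, init, 4))                # [60, 31, 13, 6, 9]
    print(vectorized(a, init, 4, use_vp=False))  # [60, 31, 13, None, None]
```

With five elements and VF = 4, the EVLs are 3 and 2. Only the VP version keeps the last iteration in sync with the scalar loop.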