Open
Description
After #127180, the vectorizer emits vp.merge + general sdiv/udiv instead of vp.sdiv/udiv for tail folding by EVL.
However, using vp.udiv/sdiv may yield better performance. The improvement could come from fewer vsetvli instructions and lower vector register pressure.
The current IR and assembly for sdiv: https://godbolt.org/z/YvPhGa8df
The vp intrinsic IR and assembly for sdiv: https://godbolt.org/z/1achsE3Wo
Not yet sure at which stage this optimization should be applied. We need more discussion.
Label it as RISCV backend issue for now.