-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LV][TTI] Calculate cost of extracting last index in a scalable vector #144086
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1443,7 +1443,7 @@ class TargetTransformInfo { | |
/// Index = -1 to indicate that there is no information about the index value. | ||
LLVM_ABI InstructionCost | ||
getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy, | ||
unsigned Index, TTI::TargetCostKind CostKind) const; | ||
int Index, TTI::TargetCostKind CostKind) const; | ||
|
||
/// \return The expected cost of control-flow related instructions such as | ||
/// Phi, Ret, Br, Switch. | ||
|
@@ -1465,14 +1465,21 @@ class TargetTransformInfo { | |
OperandValueInfo Op2Info = {OK_AnyValue, OP_None}, | ||
const Instruction *I = nullptr) const; | ||
|
||
enum : int { | ||
UnknownIndex = -1, | ||
LastIndex = -2, | ||
}; | ||
|
||
static inline bool isKnownVectorIndex(int Index) { return Index >= 0; } | ||
|
||
/// \return The expected cost of vector Insert and Extract. | ||
/// Use -1 to indicate that there is no information on the index value. | ||
/// This is used when the instruction is not available; a typical use | ||
/// case is to provision the cost of vectorization/scalarization in | ||
/// vectorizer passes. | ||
LLVM_ABI InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, | ||
TTI::TargetCostKind CostKind, | ||
unsigned Index = -1, | ||
int Index = UnknownIndex, | ||
const Value *Op0 = nullptr, | ||
const Value *Op1 = nullptr) const; | ||
|
||
|
@@ -1486,7 +1493,7 @@ class TargetTransformInfo { | |
/// of the extract(nullptr if user is not known before vectorization) and | ||
/// 'Idx' being the extract lane. | ||
LLVM_ABI InstructionCost getVectorInstrCost( | ||
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index, | ||
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to update the TTI hooks to check if LastIndex has been passed, to avoid unexpected results now that we are calling it with a potentially unexpected value -2? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have done that implicitly by changing all the TTI hooks to call There may be places where target hooks were never checking for -1 and always assuming a known index - I figured in those cases my patch doesn't make things any more broken than they were already. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah yes I see. For those places, could we assert that we either have |
||
Value *Scalar, | ||
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const; | ||
|
||
|
@@ -1498,7 +1505,7 @@ class TargetTransformInfo { | |
/// exists (e.g., from basic blocks during transformation). | ||
LLVM_ABI InstructionCost getVectorInstrCost(const Instruction &I, Type *Val, | ||
TTI::TargetCostKind CostKind, | ||
unsigned Index = -1) const; | ||
int Index = UnknownIndex) const; | ||
|
||
/// \return The expected cost of aggregate inserts and extracts. This is | ||
/// used when the instruction is not available; a typical use case is to | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3642,7 +3642,7 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, | |
|
||
InstructionCost | ||
AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode, Type *Dst, | ||
VectorType *VecTy, unsigned Index, | ||
VectorType *VecTy, int Index, | ||
TTI::TargetCostKind CostKind) const { | ||
|
||
// Make sure we were given a valid extend opcode. | ||
|
@@ -3711,12 +3711,24 @@ InstructionCost AArch64TTIImpl::getCFInstrCost(unsigned Opcode, | |
} | ||
|
||
InstructionCost AArch64TTIImpl::getVectorInstrCostHelper( | ||
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index, | ||
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index, | ||
bool HasRealUse, const Instruction *I, Value *Scalar, | ||
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const { | ||
assert(Val->isVectorTy() && "This must be a vector type"); | ||
|
||
if (Index != -1U) { | ||
if (Index == TargetTransformInfo::LastIndex) { | ||
if (isa<ScalableVectorType>(Val)) { | ||
// This typically requires both while and lastb instructions in order | ||
// to extract the last element. If this is in a loop the while | ||
// instruction can at least be hoisted out, although it will consume a | ||
// predicate register. The cost should be more expensive than the base | ||
// extract cost, which is 2 for most CPUs. | ||
return CostKind == TTI::TCK_CodeSize ? 2 : 3; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use the base vector instruction cost directly? |
||
} | ||
Index = cast<FixedVectorType>(Val)->getNumElements() - 1; | ||
} | ||
|
||
if (TargetTransformInfo::isKnownVectorIndex(Index)) { | ||
// Legalize the type. | ||
std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Val); | ||
|
||
|
@@ -3884,16 +3896,15 @@ InstructionCost AArch64TTIImpl::getVectorInstrCostHelper( | |
|
||
InstructionCost AArch64TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, | ||
TTI::TargetCostKind CostKind, | ||
unsigned Index, | ||
const Value *Op0, | ||
int Index, const Value *Op0, | ||
const Value *Op1) const { | ||
bool HasRealUse = | ||
Opcode == Instruction::InsertElement && Op0 && !isa<UndefValue>(Op0); | ||
return getVectorInstrCostHelper(Opcode, Val, CostKind, Index, HasRealUse); | ||
} | ||
|
||
InstructionCost AArch64TTIImpl::getVectorInstrCost( | ||
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index, | ||
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index, | ||
Value *Scalar, | ||
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const { | ||
return getVectorInstrCostHelper(Opcode, Val, CostKind, Index, false, nullptr, | ||
|
@@ -3903,7 +3914,7 @@ InstructionCost AArch64TTIImpl::getVectorInstrCost( | |
InstructionCost AArch64TTIImpl::getVectorInstrCost(const Instruction &I, | ||
Type *Val, | ||
TTI::TargetCostKind CostKind, | ||
unsigned Index) const { | ||
int Index) const { | ||
return getVectorInstrCostHelper(I.getOpcode(), Val, CostKind, Index, | ||
true /* HasRealUse */, &I); | ||
} | ||
|
@@ -4052,10 +4063,13 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost( | |
// loading the vector from constant pool or in some cases, may also result | ||
// in scalarization. For now, we are approximating this with the | ||
// scalarization cost. | ||
auto ExtractCost = 2 * getVectorInstrCost(Instruction::ExtractElement, Ty, | ||
CostKind, -1, nullptr, nullptr); | ||
auto InsertCost = getVectorInstrCost(Instruction::InsertElement, Ty, | ||
CostKind, -1, nullptr, nullptr); | ||
auto ExtractCost = | ||
2 * getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind, | ||
TargetTransformInfo::UnknownIndex, nullptr, | ||
nullptr); | ||
auto InsertCost = getVectorInstrCost( | ||
Instruction::InsertElement, Ty, CostKind, | ||
TargetTransformInfo::UnknownIndex, nullptr, nullptr); | ||
unsigned NElts = cast<FixedVectorType>(Ty)->getNumElements(); | ||
return ExtractCost + InsertCost + | ||
NElts * getArithmeticInstrCost(Opcode, Ty->getScalarType(), | ||
|
@@ -4153,9 +4167,11 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost( | |
// On AArch64, without SVE, vector divisions are expanded | ||
// into scalar divisions of each pair of elements. | ||
Cost += getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind, | ||
-1, nullptr, nullptr); | ||
Cost += getVectorInstrCost(Instruction::InsertElement, Ty, CostKind, -1, | ||
nullptr, nullptr); | ||
TargetTransformInfo::UnknownIndex, nullptr, | ||
nullptr); | ||
Cost += getVectorInstrCost(Instruction::InsertElement, Ty, CostKind, | ||
TargetTransformInfo::UnknownIndex, nullptr, | ||
nullptr); | ||
} | ||
|
||
// TODO: if one of the arguments is scalar, then it's not necessary to | ||
|
@@ -4186,11 +4202,13 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost( | |
return LT.first; | ||
return cast<VectorType>(Ty)->getElementCount().getKnownMinValue() * | ||
(getArithmeticInstrCost(Opcode, Ty->getScalarType(), CostKind) + | ||
getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind, -1, | ||
nullptr, nullptr) * | ||
getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind, | ||
TargetTransformInfo::UnknownIndex, nullptr, | ||
nullptr) * | ||
2 + | ||
getVectorInstrCost(Instruction::InsertElement, Ty, CostKind, -1, | ||
nullptr, nullptr)); | ||
getVectorInstrCost(Instruction::InsertElement, Ty, CostKind, | ||
TargetTransformInfo::UnknownIndex, nullptr, | ||
nullptr)); | ||
case ISD::ADD: | ||
case ISD::XOR: | ||
case ISD::OR: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a precondition that LastIndex always has to be with a scalable VecTy, similar to VPLane? E.g. it would become
ScalableLastIndex
andTargetTransformInfo::getVectorInstrCost
and friends would assert the vector is indeed scalable if its used.That way we wouldn't need to do the
if (isa<ScalableVectorType>(Val))
check +Index = cast<FixedVectorType>(Val)->getNumElements() - 1;
in each TTI hookThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I did wonder that myself. I just wasn't sure if
LastIndex
might also be convenient for callers to avoid calculating the last index for fixed-width vectors each time. I have no problem adding an assert if others agree?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like one of the callers does the calculation whilst the other doesn't:
TTI.getVectorInstrCost( Instruction::ExtractElement, VectorTy, CostKind, VF.isScalable() ? TargetTransformInfo::LastIndex : VF.getKnownMinValue() - 1));
I don't have a strong opinion on this btw, just seems like it might be easier to handle this in the callers since there's fewer of them than target hooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we have
isKnownVectorIndex
below that returns true for positive indices, it seems it would be more consistent to not use LastIndex for fixed-width vectors; in those cases the last index would be known.AFAICT we could keep the index unsigned, and still have -1/-2 as special values? If we keep it as
int
would be good to be clear what other negative values mean.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought about leaving it as unsigned, but that forces me to write slightly uglier code like this:
If we ever want to support other complex indices (such as extracting the penultimate index - something I noticed the vectoriser now does) this is only going to get worse. I thought it was more scalable (excuse the pun!) to use a signed integer since I can then assume that all non-negative values are known indices. (BTW I personally already think that comparing everywhere against (unsigned)-1 isn't very nice! Although forcing targets to use a static const in TargetTransformInfo would hide some of that.)