Skip to content

[LV][TTI] Calculate cost of extracting last index in a scalable vector #144086

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 11 additions & 4 deletions llvm/include/llvm/Analysis/TargetTransformInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -1443,7 +1443,7 @@ class TargetTransformInfo {
/// Index = -1 to indicate that there is no information about the index value.
LLVM_ABI InstructionCost
getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index, TTI::TargetCostKind CostKind) const;
int Index, TTI::TargetCostKind CostKind) const;

/// \return The expected cost of control-flow related instructions such as
/// Phi, Ret, Br, Switch.
Expand All @@ -1465,14 +1465,21 @@ class TargetTransformInfo {
OperandValueInfo Op2Info = {OK_AnyValue, OP_None},
const Instruction *I = nullptr) const;

enum : int {
UnknownIndex = -1,
LastIndex = -2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a precondition that LastIndex always has to be with a scalable VecTy, similar to VPLane? E.g. it would become ScalableLastIndex and TargetTransformInfo::getVectorInstrCost and friends would assert the vector is indeed scalable if its used.

That way we wouldn't need to do the if (isa<ScalableVectorType>(Val)) check + Index = cast<FixedVectorType>(Val)->getNumElements() - 1; in each TTI hook

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I did wonder that myself. I just wasn't sure if LastIndex might also be convenient for callers to avoid calculating the last index for fixed-width vectors each time. I have no problem adding an assert if others agree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like one of the callers does the calculation whilst the other doesn't:

TTI.getVectorInstrCost(
                  Instruction::ExtractElement, VectorTy, CostKind,
                  VF.isScalable() ? TargetTransformInfo::LastIndex
                                  : VF.getKnownMinValue() - 1));

I don't have a strong opinion on this btw, just seems like it might be easier to handle this in the callers since there's fewer of them than target hooks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have isKnownVectorIndex below that returns true for positive indices, it seems it would be more consistent to not use LastIndex for fixed-width vectors; in those cases the last index would be known.

AFAICT we could keep the index unsigned, and still have -1/-2 as special values? If we keep it as int would be good to be clear what other negative values mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought about leaving it as unsigned, but that forces me to write slightly uglier code like this:

static const unsigned UnknownIndex = (unsigned)-1;
static const unsigned ScalableLastIndex = (unsigned)-2;

static const bool isKnownVectorIndex(unsigned Idx) {
  return (Idx == UnknownIndex) && (Idx == ScalableLastIndex);
}

If we ever want to support other complex indices (such as extracting the penultimate index - something I noticed the vectoriser now does) this is only going to get worse. I thought it was more scalable (excuse the pun!) to use a signed integer since I can then assume that all non-negative values are known indices. (BTW I personally already think that comparing everywhere against (unsigned)-1 isn't very nice! Although forcing targets to use a static const in TargetTransformInfo would hide some of that.)

};

static inline bool isKnownVectorIndex(int Index) { return Index >= 0; }

/// \return The expected cost of vector Insert and Extract.
/// Use -1 to indicate that there is no information on the index value.
/// This is used when the instruction is not available; a typical use
/// case is to provision the cost of vectorization/scalarization in
/// vectorizer passes.
LLVM_ABI InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index = -1,
int Index = UnknownIndex,
const Value *Op0 = nullptr,
const Value *Op1 = nullptr) const;

Expand All @@ -1486,7 +1493,7 @@ class TargetTransformInfo {
/// of the extract(nullptr if user is not known before vectorization) and
/// 'Idx' being the extract lane.
LLVM_ABI InstructionCost getVectorInstrCost(
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the TTI hooks to check if LastIndex has been passed, to avoid unexpected results now that we are calling it with a potentially unexpected value -2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done that implicitly by changing all the TTI hooks to call isKnownVectorIndex when testing if the index is known or not. Any code that previously compared against -1 now calls isKnownVectorIndex, which returns false whether the index is -1 or -2 so as far as I can tell the change is NFC for all other targets.

There may be places where target hooks were never checking for -1 and always assuming a known index - I figured in those cases my patch doesn't make things any more broken than they were already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I see. For those places, could we assert that we either have UnknownIndex or a known index, i.e. no LastIndex, except for RISV which also supports scalable vectors and would be queried with LastIndex?

Value *Scalar,
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const;

Expand All @@ -1498,7 +1505,7 @@ class TargetTransformInfo {
/// exists (e.g., from basic blocks during transformation).
LLVM_ABI InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index = -1) const;
int Index = UnknownIndex) const;

/// \return The expected cost of aggregate inserts and extracts. This is
/// used when the instruction is not available; a typical use case is to
Expand Down
12 changes: 6 additions & 6 deletions llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
Original file line number Diff line number Diff line change
Expand Up @@ -758,7 +758,7 @@ class TargetTransformInfoImplBase {

virtual InstructionCost
getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index, TTI::TargetCostKind CostKind) const {
int Index, TTI::TargetCostKind CostKind) const {
return 1;
}

Expand All @@ -781,7 +781,7 @@ class TargetTransformInfoImplBase {

virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index, const Value *Op0,
int Index, const Value *Op0,
const Value *Op1) const {
return 1;
}
Expand All @@ -791,15 +791,15 @@ class TargetTransformInfoImplBase {
/// of the extract(nullptr if user is not known before vectorization) and
/// 'Idx' being the extract lane.
virtual InstructionCost getVectorInstrCost(
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index,
Value *Scalar,
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const {
return 1;
}

virtual InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index) const {
int Index) const {
return 1;
}

Expand Down Expand Up @@ -1522,7 +1522,7 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
auto *IE = dyn_cast<InsertElementInst>(U);
if (!IE)
return TTI::TCC_Basic; // FIXME
unsigned Idx = -1;
int Idx = TargetTransformInfo::UnknownIndex;
if (auto *CI = dyn_cast<ConstantInt>(Operands[2]))
if (CI->getValue().getActiveBits() <= 32)
Idx = CI->getZExtValue();
Expand Down Expand Up @@ -1641,7 +1641,7 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
auto *EEI = dyn_cast<ExtractElementInst>(U);
if (!EEI)
return TTI::TCC_Basic; // FIXME
unsigned Idx = -1;
int Idx = TargetTransformInfo::UnknownIndex;
if (auto *CI = dyn_cast<ConstantInt>(Operands[1]))
if (CI->getValue().getActiveBits() <= 32)
Idx = CI->getZExtValue();
Expand Down
12 changes: 6 additions & 6 deletions llvm/include/llvm/CodeGen/BasicTTIImpl.h
Original file line number Diff line number Diff line change
Expand Up @@ -1341,7 +1341,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {

InstructionCost
getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index,
int Index,
TTI::TargetCostKind CostKind) const override {
return thisT()->getVectorInstrCost(Instruction::ExtractElement, VecTy,
CostKind, Index, nullptr, nullptr) +
Expand Down Expand Up @@ -1409,8 +1409,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}

InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index, const Value *Op0,
TTI::TargetCostKind CostKind, int Index,
const Value *Op0,
const Value *Op1) const override {
return getRegUsageForType(Val->getScalarType());
}
Expand All @@ -1420,8 +1420,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
/// of the extract(nullptr if user is not known before vectorization) and
/// 'Idx' being the extract lane.
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index, Value *Scalar,
TTI::TargetCostKind CostKind, int Index,
Value *Scalar,
ArrayRef<std::tuple<Value *, User *, int>>
ScalarUserAndIdx) const override {
return thisT()->getVectorInstrCost(Opcode, Val, CostKind, Index, nullptr,
Expand All @@ -1430,7 +1430,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {

InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index) const override {
int Index) const override {
Value *Op0 = nullptr;
Value *Op1 = nullptr;
if (auto *IE = dyn_cast<InsertElementInst>(&I)) {
Expand Down
8 changes: 4 additions & 4 deletions llvm/lib/Analysis/TargetTransformInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1058,7 +1058,7 @@ InstructionCost TargetTransformInfo::getCastInstrCost(
}

InstructionCost TargetTransformInfo::getExtractWithExtendCost(
unsigned Opcode, Type *Dst, VectorType *VecTy, unsigned Index,
unsigned Opcode, Type *Dst, VectorType *VecTy, int Index,
TTI::TargetCostKind CostKind) const {
InstructionCost Cost =
TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index, CostKind);
Expand Down Expand Up @@ -1088,7 +1088,7 @@ InstructionCost TargetTransformInfo::getCmpSelInstrCost(
}

InstructionCost TargetTransformInfo::getVectorInstrCost(
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index,
const Value *Op0, const Value *Op1) const {
assert((Opcode == Instruction::InsertElement ||
Opcode == Instruction::ExtractElement) &&
Expand All @@ -1100,7 +1100,7 @@ InstructionCost TargetTransformInfo::getVectorInstrCost(
}

InstructionCost TargetTransformInfo::getVectorInstrCost(
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index,
Value *Scalar,
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const {
assert((Opcode == Instruction::InsertElement ||
Expand All @@ -1115,7 +1115,7 @@ InstructionCost TargetTransformInfo::getVectorInstrCost(
InstructionCost
TargetTransformInfo::getVectorInstrCost(const Instruction &I, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index) const {
int Index) const {
// FIXME: Assert that Opcode is either InsertElement or ExtractElement.
// This is mentioned in the interface description and respected by all
// callers, but never asserted upon.
Expand Down
54 changes: 36 additions & 18 deletions llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3642,7 +3642,7 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,

InstructionCost
AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy, unsigned Index,
VectorType *VecTy, int Index,
TTI::TargetCostKind CostKind) const {

// Make sure we were given a valid extend opcode.
Expand Down Expand Up @@ -3711,12 +3711,24 @@ InstructionCost AArch64TTIImpl::getCFInstrCost(unsigned Opcode,
}

InstructionCost AArch64TTIImpl::getVectorInstrCostHelper(
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index,
bool HasRealUse, const Instruction *I, Value *Scalar,
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const {
assert(Val->isVectorTy() && "This must be a vector type");

if (Index != -1U) {
if (Index == TargetTransformInfo::LastIndex) {
if (isa<ScalableVectorType>(Val)) {
// This typically requires both while and lastb instructions in order
// to extract the last element. If this is in a loop the while
// instruction can at least be hoisted out, although it will consume a
// predicate register. The cost should be more expensive than the base
// extract cost, which is 2 for most CPUs.
return CostKind == TTI::TCK_CodeSize ? 2 : 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the base vector instruction cost directly? ST->getVectorInsertExtractBaseCost() + 1

}
Index = cast<FixedVectorType>(Val)->getNumElements() - 1;
}

if (TargetTransformInfo::isKnownVectorIndex(Index)) {
// Legalize the type.
std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Val);

Expand Down Expand Up @@ -3884,16 +3896,15 @@ InstructionCost AArch64TTIImpl::getVectorInstrCostHelper(

InstructionCost AArch64TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index,
const Value *Op0,
int Index, const Value *Op0,
const Value *Op1) const {
bool HasRealUse =
Opcode == Instruction::InsertElement && Op0 && !isa<UndefValue>(Op0);
return getVectorInstrCostHelper(Opcode, Val, CostKind, Index, HasRealUse);
}

InstructionCost AArch64TTIImpl::getVectorInstrCost(
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index,
Value *Scalar,
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const {
return getVectorInstrCostHelper(Opcode, Val, CostKind, Index, false, nullptr,
Expand All @@ -3903,7 +3914,7 @@ InstructionCost AArch64TTIImpl::getVectorInstrCost(
InstructionCost AArch64TTIImpl::getVectorInstrCost(const Instruction &I,
Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index) const {
int Index) const {
return getVectorInstrCostHelper(I.getOpcode(), Val, CostKind, Index,
true /* HasRealUse */, &I);
}
Expand Down Expand Up @@ -4052,10 +4063,13 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
// loading the vector from constant pool or in some cases, may also result
// in scalarization. For now, we are approximating this with the
// scalarization cost.
auto ExtractCost = 2 * getVectorInstrCost(Instruction::ExtractElement, Ty,
CostKind, -1, nullptr, nullptr);
auto InsertCost = getVectorInstrCost(Instruction::InsertElement, Ty,
CostKind, -1, nullptr, nullptr);
auto ExtractCost =
2 * getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind,
TargetTransformInfo::UnknownIndex, nullptr,
nullptr);
auto InsertCost = getVectorInstrCost(
Instruction::InsertElement, Ty, CostKind,
TargetTransformInfo::UnknownIndex, nullptr, nullptr);
unsigned NElts = cast<FixedVectorType>(Ty)->getNumElements();
return ExtractCost + InsertCost +
NElts * getArithmeticInstrCost(Opcode, Ty->getScalarType(),
Expand Down Expand Up @@ -4153,9 +4167,11 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
// On AArch64, without SVE, vector divisions are expanded
// into scalar divisions of each pair of elements.
Cost += getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind,
-1, nullptr, nullptr);
Cost += getVectorInstrCost(Instruction::InsertElement, Ty, CostKind, -1,
nullptr, nullptr);
TargetTransformInfo::UnknownIndex, nullptr,
nullptr);
Cost += getVectorInstrCost(Instruction::InsertElement, Ty, CostKind,
TargetTransformInfo::UnknownIndex, nullptr,
nullptr);
}

// TODO: if one of the arguments is scalar, then it's not necessary to
Expand Down Expand Up @@ -4186,11 +4202,13 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
return LT.first;
return cast<VectorType>(Ty)->getElementCount().getKnownMinValue() *
(getArithmeticInstrCost(Opcode, Ty->getScalarType(), CostKind) +
getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind, -1,
nullptr, nullptr) *
getVectorInstrCost(Instruction::ExtractElement, Ty, CostKind,
TargetTransformInfo::UnknownIndex, nullptr,
nullptr) *
2 +
getVectorInstrCost(Instruction::InsertElement, Ty, CostKind, -1,
nullptr, nullptr));
getVectorInstrCost(Instruction::InsertElement, Ty, CostKind,
TargetTransformInfo::UnknownIndex, nullptr,
nullptr));
case ISD::ADD:
case ISD::XOR:
case ISD::OR:
Expand Down
14 changes: 7 additions & 7 deletions llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
/// of the extract(nullptr if user is not known before vectorization) and
/// 'Idx' being the extract lane.
InstructionCost getVectorInstrCostHelper(
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, int Index,
bool HasRealUse, const Instruction *I = nullptr, Value *Scalar = nullptr,
ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx = {}) const;

Expand Down Expand Up @@ -197,30 +197,30 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {

InstructionCost
getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index,
int Index,
TTI::TargetCostKind CostKind) const override;

InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr) const override;

InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index, const Value *Op0,
TTI::TargetCostKind CostKind, int Index,
const Value *Op0,
const Value *Op1) const override;

/// \param ScalarUserAndIdx encodes the information about extracts from a
/// vector with 'Scalar' being the value being extracted,'User' being the user
/// of the extract(nullptr if user is not known before vectorization) and
/// 'Idx' being the extract lane.
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index, Value *Scalar,
TTI::TargetCostKind CostKind, int Index,
Value *Scalar,
ArrayRef<std::tuple<Value *, User *, int>>
ScalarUserAndIdx) const override;

InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
TTI::TargetCostKind CostKind,
unsigned Index) const override;
int Index) const override;

InstructionCost
getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty, FastMathFlags FMF,
Expand Down
4 changes: 2 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -834,7 +834,7 @@ GCNTTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,

InstructionCost GCNTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
TTI::TargetCostKind CostKind,
unsigned Index, const Value *Op0,
int Index, const Value *Op0,
const Value *Op1) const {
switch (Opcode) {
case Instruction::ExtractElement:
Expand All @@ -853,7 +853,7 @@ InstructionCost GCNTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
// operations, and we don't have to copy into a different register class.

// Dynamic indexing isn't free and is best avoided.
return Index == ~0u ? 2 : 0;
return TargetTransformInfo::isKnownVectorIndex(Index) ? 0 : 2;
}
default:
return BaseT::getVectorInstrCost(Opcode, ValTy, CostKind, Index, Op0, Op1);
Expand Down
4 changes: 2 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -169,8 +169,8 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {

using BaseT::getVectorInstrCost;
InstructionCost getVectorInstrCost(unsigned Opcode, Type *ValTy,
TTI::TargetCostKind CostKind,
unsigned Index, const Value *Op0,
TTI::TargetCostKind CostKind, int Index,
const Value *Op0,
const Value *Op1) const override;

bool isReadRegisterSourceOfDivergence(const IntrinsicInst *ReadReg) const;
Expand Down
Loading
Loading