Skip to content

[analyzer] Suppress out of bounds reports after weak loop assumptions #109804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 31 additions & 11 deletions clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,25 @@ struct EvalCallOptions {
EvalCallOptions() {}
};

/// Simple control flow statements like `if` only produce a single state split,
/// so the fact that they are included in the source code implies that both
/// branches are possible (at least under some conditions) and the analyzer can
/// freely assume either of them. (This is not entirely true, because there may
/// be unmarked logical correlations between `if` statements, but is a good
/// enough heuristic and the analyzer strongly relies on it.)
/// On the other hand, in a loop the state may be split repeatedly at each
/// evaluation of the loop condition, and this can lead to following "weak"
/// assumptions even though the code does not imply that they're valid and the
/// programmer intended to cover them.
/// This function is called to mark the `State` when the engine makes an
/// assumption which is weak. Checkers may use this heuristical mark to discard
/// result and reduce the amount of false positives.
ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);

/// Returns true if `recordWeakLoopAssumption()` was called on the execution
/// path which produced `State`.
bool seenWeakLoopAssumption(ProgramStateRef State);
Comment on lines +146 to +150
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While browsing through the codebase, I've seen that these methods for the other traits are defined inside class ExprEngine as static methods with setters and deleters being private and getters being public.

IIUC, they are in the global scope so that they can be accessed from ArrayBoundCheckerV2. Tbh, I would make recordWeakLoopAssumption() a private static method of the ExprEngine class, as only this class is able to assume anything about the condition, so it should be the only one, that can set any state traits related to those assumptions, while seenWeakLoopAssumption() could be a public static member function of ExprEngine to follow the pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer plain functions instead of static methods that are only vaguely connected to a class, but I can move them to ExprEngine to follow the precedents.

Making recordWeakLoopAssumption() private is also a good point -- I'll either do so or just inline its short one-line definition at the few locations where it's called. (I think the transparent State->set<> would be clearer than just calling some random method. If we need more complex logic later, we can reintroduce a method like it.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

methods that are only vaguely connected to a class

I think these methods are strongly connected to this class, as they only make sense in the context of ExprEngine, which is also indicated by them being in ExprEngine.h. Also ExprEngine is the only class that can set this trait, as no one else has access to the required information to do so.


class ExprEngine {
void anchor();

Expand Down Expand Up @@ -323,12 +342,13 @@ class ExprEngine {

/// ProcessBranch - Called by CoreEngine. Used to generate successor
/// nodes by processing the 'effects' of a branch condition.
void processBranch(const Stmt *Condition,
NodeBuilderContext& BuilderCtx,
ExplodedNode *Pred,
ExplodedNodeSet &Dst,
const CFGBlock *DstT,
const CFGBlock *DstF);
/// If the branch condition is a loop condition, IterationsFinishedInLoop is
/// the number of already finished iterations (0, 1, 2...); otherwise it's
/// std::nullopt.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the number of already finished iterations (0, 1, 2...); otherwise it's
/// std::nullopt.
/// the number of already finished iterations (0, 1, 2, ...); otherwise it's
/// std::nullopt.

Can't IterationsFinishedInLoop be a state trait? We already have state traits for element index inside arrays when elements are constructed and destructed, I suspect this would work similarly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't IterationsFinishedInLoop be a state trait?

We could introduce it, but we'd need to maintain a nested stack of loops (1 iteration in that while, then 2 iterations in the for, then 1 iteration in this other for in the function called there...) to correctly monitor the number of iterations.

This would make the heuristic introduced in this commit slightly more accurate, but I think that for practical purposes the current code is already good enough, so I didn't invest time into implementing this loop iteration count stack. (However it would be a nice follow-up commit if I have time.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could introduce it, but we'd need to maintain a nested stack of loops (1 iteration in that while, then 2 iterations in the for, then 1 iteration in this other for in the function called there...) to correctly monitor the number of iterations.

You can bind the current iteration number to the AST node of the loop, as it happens with nested ArrayInitLoopExpr IIRC. I'm sure we already have something similar implemented.

I'd prefer using state traits here instead of modifying the engine API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not say that processBranch() is part of the "engine API": it's a method which is called from one single location which is another closely related component of the engine, and I cannot imagine a valid use case for calling it from any other context. It's unfortunate that I'm extending this already long parameter list, but this kind of complexity cannot be entirely avoided in this sort of internal worker method.

I looked at the ArrayInitLoopExpr logic and it is not analogous to what I need. The main difference is that

  • IndexOfElementToConstruct is used to drive the iterations of the initialization loop, i.e. count the number of iterations and stop the loop when this reaches the Size of the array that's being constructed;
  • my code will a small part of the body of larger loops which are driven by unrelated, external checks.

This difference means that in an array init loop the code needs to manage an iteration count, while here I think it's much better to access the canonical BlockCount (which is, roughly speaking, the iteration count).

However, unfortunately the BlockCount is not the exactly right sort of iteration count for the heuristic that I want, because it's increasing monotonically with each evaluation of the loop condition, while I want to restart counting the number of iterations each time when we "reach the loop again" (due to a new iteration in an outer loop, or a new function call, or a goto that jumps to a place before the loop etc.).

Notice that ArrayInitLoopExpr does something similar when it adds a LocationContext to the key to distinguish initloops that happen within different function calls -- however, in my case something like this not be sufficient to restart iteration counting when we enter a new iteration of an outer loop.

This is the reason why I spoke about a "stack" in my previous comment -- to properly reset the iteration count each time we "reach the loop again", we'd need:

  • a stack of the loops that we're inside of;
  • tagging these loops with LocationContexts to distinguish executing the same code from different function calls;
  • preferably some additional logic to handle goto, break, return and similar stuff.

In theory it's perfectly possible to implement this exact iteration counting, but I feel that in practice it's not worth the effort, because the BlockCount is a good enough approximation for this heuristic and it will make this (already complex) code significantly easier to maintain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't binding the iteration count to the loop terminator sufficient?

Let's say we pass the terminator to processBranch() instead of the iteration count.
If there is no trait set for the terminator, we set it on the true branch, when the loop is entered.
If the terminator is already set, we know that we finished one iteration already and increment it.
If the condition is false and the loop is not entered again the trait is removed.

I'm not sure how well this would work though, since I didn't test it.

void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
ExplodedNode *Pred, ExplodedNodeSet &Dst,
const CFGBlock *DstT, const CFGBlock *DstF,
std::optional<unsigned> IterationsFinishedInLoop);

/// Called by CoreEngine.
/// Used to generate successor nodes for temporary destructors depending
Expand Down Expand Up @@ -583,11 +603,11 @@ class ExprEngine {
ExplodedNode *Pred,
ExplodedNodeSet &Dst);

/// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly assume symbolic
/// expressions of the form 'x != 0' and generate new nodes (stored in Dst)
/// with those assumptions.
void evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, ExplodedNodeSet &Src,
const Expr *Ex);
/// evalEagerlyAssumeOpBifurcation - Given the nodes in 'Src', eagerly assume
/// symbolic expressions of the form 'x != 0' or '!x' and generate new nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't rename this method. The functionality of the method doesn't change, it still does what it used to do, so the renaming just creates noise in the diff. If you think the name is wrong, you can open a separate NFC to address that.

Since more than one method would need to be renamed after this one, I suggest you leave this alone.

Copy link
Contributor Author

@NagyDonat NagyDonat Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this method is incorrect and misleading, which wasted my time while I was debugging some aspect of my patch, so I want to rename it. I'm not interested in renaming other methods, and I think it's a big waste of time to create a separate commit for four lines of completely NFC changes. (Also, creating a separate patch for every small change is polluting the diff log with small inconsequential commits.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that some other methods that this method calls also have the same misleading and incorrect name like geteagerlyAssumeBinOpBifurcationTags(). I think if we rename this method, every other similar method should also be renamed.

This is why I suggested a separate patch, or if you want to do it in this patch just keep them in 2 different commits so they are separated when the PR is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you've already seen, I created a separate NFC PR for this renaming and a few other things: #112209

Once that's merged, I'll rebase this patch onto it.

/// (stored in Dst) with those assumptions.
void evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
ExplodedNodeSet &Src, const Expr *Ex);

static std::pair<const ProgramPointTag *, const ProgramPointTag *>
geteagerlyAssumeBinOpBifurcationTags();
Expand Down
5 changes: 5 additions & 0 deletions clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -697,6 +697,11 @@ void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
ProgramStateRef ErrorState, Messages Msgs,
NonLoc Offset, std::optional<NonLoc> Extent,
bool IsTaintBug /*=false*/) const {
// Suppress results found through execution paths where in some loop the
// analyzer arbitrarily assumed either that the loop is skipped (0 iterations)
// or that 3 or more iterations are executed.
if (seenWeakLoopAssumption(ErrorState))
return;

ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
if (!ErrorNode)
Expand Down
25 changes: 24 additions & 1 deletion clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -441,10 +441,33 @@ void CoreEngine::HandleCallEnter(const CallEnter &CE, ExplodedNode *Pred) {
void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
const CFGBlock * B, ExplodedNode *Pred) {
assert(B->succ_size() == 2);

const LocationContext *LC = Pred->getLocationContext();
BlockCounter Counter = WList->getBlockCounter();
unsigned BlockCount =
Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
std::optional<unsigned> IterationsFinishedInLoop = std::nullopt;
if (isa<ForStmt, WhileStmt, CXXForRangeStmt>(Term)) {
// FIXME: This code approximates the number of finished iteration based on
// the block count, i.e. the number of evaluations of the terminator block
// on the current execution path (which includes the current evaluation, so
// is always at least 1). This is probably acceptable for the
// checker-specific false positive suppression that currently uses this
// value, but it would be better to calcuate an accurate count of
// iterations.
assert(BlockCount >= 1);
IterationsFinishedInLoop = BlockCount - 1;
} else if (isa<DoStmt>(Term)) {
// FIXME: The fixme note in the previous branch also applies here.
// In a do-while loop one iteration happens before the first evaluation of
// the loop condition, so we don't subtract one from the block count.
IterationsFinishedInLoop = BlockCount;
}

NodeBuilderContext Ctx(*this, B, Pred);
ExplodedNodeSet Dst;
ExprEng.processBranch(Cond, Ctx, Pred, Dst, *(B->succ_begin()),
*(B->succ_begin() + 1));
*(B->succ_begin() + 1), IterationsFinishedInLoop);
// Enqueue the new frontier onto the worklist.
enqueue(Dst);
}
Expand Down
88 changes: 73 additions & 15 deletions clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,25 @@ typedef llvm::ImmutableMap<const LocationContext *, unsigned>
REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction,
PendingArrayDestructionMap)

// This trait is used to heuristically filter out results produced from
// execution paths that took "weak" assumptions within a loop.
REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool)

ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) {
return State->set<SeenWeakLoopAssumption>(true);
}

bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
return State->get<SeenWeakLoopAssumption>();
}
Comment on lines +219 to +225
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every registered trait should be removed when they are no longer needed to stop polluting the states. Also, aren't the traits part of the exploded graph? If they are and they are not cleaned up, they pollute the egraph too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit doesn't have any "no longer needed" point -- if we made some shaky assumption for a loop condition, I want to suppress all the ArrayBoundV2 reports that are reached through a path which takes that particular branch.

(By the way, each ExplodedNode has an associated State, which is essentially a big heap of traits that are introduced by various parts of the engine and various checkers; so yes, traits are contained in the egraph as well.)

Copy link
Member

@isuckatcs isuckatcs Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit doesn't have any "no longer needed" point

Does it still worth to keep them once we finished processing the loop? The loop condition only gives us information while we are inside the loop the condition belongs to, right? Once the loop is exited, the trait could be safely removed I think.

I only see the pattern that every state trait that we have now comes with a method that sets it and another one that removes it, so I assume we remove them for a reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing to be removed here as this trait is a bool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it still worth to keep them once we finished processing the loop? The loop condition only gives us information while we are inside the loop the condition belongs to, right? Once the loop is exited, the trait could be safely removed I think.

No, there is a common false positive pattern where an unfounded assumption within the loop is used after the loop and produces an unwanted result at that point. See e.g. the testcase loop_suppress_after_zero_iterations where the assumption in the loop introduces len == 0 as a separate branch, while there is no justified reason to handle this separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but with this left turned on, the coverage drop is huge even in cases that are not affected by the assumption.

void foo(int x, int y) {
  for (unsigned i = 0; i < x; i++) ; // split the state and set SeenWeakLoopAssumption to 'true'
  if (x != 0) return;                // drop the 'true' branch

  // no warnings are reported from this point on

  int buf[1] = {0};
  for (int i = 0; i < y; i++)
    buf[i] = 1;                      // SeenWeakLoopAssumption is 'true', so the warning is suppressed
}

This goes on through multiple function calls too.

void a() {}

void b() { a(); }

void c() { b(); }

void d() { c(); }

void main() {
  for (unsigned i = 0; i < x; i++)  ;
  if (x != 0) return;

  // no warnings are reported from this point on

  d();
}

If a warning is found inside any of a(), b(), c() or d(), it is suppressed because the trait is set on the top of the execution path.

Since we generate a sink it is just 1 false negative though, while relying on an unfounded assumption might trigger a false positive in one of the nested functions, so I guess we can live with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example

void foo(int x, int y) {
  for (unsigned i = 0; i < x; i++) ; // split the state and set SeenWeakLoopAssumption to 'true'
  if (x != 0) return;                // drop the 'true' branch

  // no warnings are reported from this point on
}

is a very good point and I'll probably add it to the tests to highlight this limitation of the heuristic.

However, I've seen {{ArrayBoundV2}} reports where lots of stuff happens between the point where we assume that a loop can have 0 iterations (i.e. some length/size variable is equal to 0) and the point where this triggers an unwanted report; so I don't think that we can have a natural cutoff where the "SeenWeakLoopAssumption" bit may be safely cleared.

I don't see a way to avoid these kinds of false negatives without a completely different loop handling approach, so I think we should accept them in the foreseeable future. (There are already lots of precedents for losing coverage after loops.)

Comment on lines +219 to +225
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move these definitions to the place, where the other similar definitions for other state traits are found, so that they stay grouped together?


// This trait points to the last expression (logical operator) where an eager
// assumption introduced a state split (i.e. both cases were feasible). This is
// used by the WeakLoopAssumption heuristic to find situations where the an
// eager assumption introduces a state split within the evaluation of a loop
// condition.
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerAssumptionAt, const Expr *)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to emphasize that I'm only marking the EagerlyAssume checks that did assume something new (i.e. they introduce a state split). I'll probably rename this to LastEagerAssumptionStateSplitAt or something similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested a renaming only because I'm not sure that EagerlyAssumeAssumption is a correct english phrase.


//===----------------------------------------------------------------------===//
// Engine construction and deletion.
//===----------------------------------------------------------------------===//
Expand Down Expand Up @@ -2128,7 +2147,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
(B->isRelationalOp() || B->isEqualityOp())) {
ExplodedNodeSet Tmp;
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Tmp);
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, cast<Expr>(S));
evalEagerlyAssumeOpBifurcation(Dst, Tmp, cast<Expr>(S));
}
else
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Dst);
Expand Down Expand Up @@ -2401,7 +2420,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
if (AMgr.options.ShouldEagerlyAssume && (U->getOpcode() == UO_LNot)) {
ExplodedNodeSet Tmp;
VisitUnaryOperator(U, Pred, Tmp);
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, U);
evalEagerlyAssumeOpBifurcation(Dst, Tmp, U);
}
else
VisitUnaryOperator(U, Pred, Dst);
Expand Down Expand Up @@ -2761,12 +2780,10 @@ assumeCondition(const Stmt *Condition, ExplodedNode *N) {
return State->assume(V);
}

void ExprEngine::processBranch(const Stmt *Condition,
NodeBuilderContext& BldCtx,
ExplodedNode *Pred,
ExplodedNodeSet &Dst,
const CFGBlock *DstT,
const CFGBlock *DstF) {
void ExprEngine::processBranch(
const Stmt *Condition, NodeBuilderContext &BldCtx, ExplodedNode *Pred,
ExplodedNodeSet &Dst, const CFGBlock *DstT, const CFGBlock *DstF,
std::optional<unsigned> IterationsFinishedInLoop) {
assert((!Condition || !isa<CXXBindTemporaryExpr>(Condition)) &&
"CXXBindTemporaryExprs are handled by processBindTemporary.");
const LocationContext *LCtx = Pred->getLocationContext();
Expand Down Expand Up @@ -2808,27 +2825,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
else {
assert(!isa<ObjCForCollectionStmt>(Condition));
// TODO: instead of this shortcut perhaps it would be better to "rejoin"
// the common execution path with
// StTrue = StFalse = PrevState;
builder.generateNode(PrevState, true, PredN);
builder.generateNode(PrevState, false, PredN);
continue;
}
if (StTrue && StFalse)
assert(!isa<ObjCForCollectionStmt>(Condition));

const Expr *EagerlyAssumeExpr =
PrevState->get<LastEagerlyAssumeAssumptionAt>();
const Expr *ConditionExpr = dyn_cast<Expr>(Condition);
if (ConditionExpr)
ConditionExpr = ConditionExpr->IgnoreParenCasts();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skip casts? Also if the condition is an expression, removing the parenthesis has already happened by this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skipping casts to ensure consistent behavior between the eagerly-assume=true (default) and the eagerly-assume=false case.

Perhaps this question warrants more investigation, I'll think about it and maybe add a few new testcases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I summarized my thoughts in the inline comment #109804 (comment) (which is attached to the test which corresponds to this part of the code).

bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
builder.isFeasible(true) && builder.isFeasible(false);

// Process the true branch.
if (builder.isFeasible(true)) {
if (StTrue)
if (StTrue) {
if (BothFeasible && IterationsFinishedInLoop &&
*IterationsFinishedInLoop >= 2) {
// When programmers write a loop, they imply that at least two
// iterations are possible (otherwise they would just write an `if`),
// but the third iteration is not implied: there are situations where
// the programmer knows that there won't be a third iteration (e.g.
// they iterate over a structure that has <= 2 elements) but this is
// not marked in the source code.
// Checkers may use this heuristic mark to discard results found on
// branches that contain this "weak" assumption.
StTrue = recordWeakLoopAssumption(StTrue);
}
builder.generateNode(StTrue, true, PredN);
else
} else {
builder.markInfeasible(true);
}
}

// Process the false branch.
if (builder.isFeasible(false)) {
if (StFalse)
if (StFalse) {
if (BothFeasible && IterationsFinishedInLoop &&
*IterationsFinishedInLoop == 0) {
// There are many situations where the programmers know that there
// will be at least one iteration in a loop (e.g. a structure is not
// empty) but the analyzer cannot deduce this and reports false
// positives after skipping the loop.
// Checkers may use this heuristic mark to discard results found on
// branches that contain this "weak" assumption.
StFalse = recordWeakLoopAssumption(StFalse);
}
builder.generateNode(StFalse, false, PredN);
else
} else {
builder.markInfeasible(false);
}
}
}
currBldrCtx = nullptr;
Expand Down Expand Up @@ -3752,9 +3805,9 @@ ExprEngine::geteagerlyAssumeBinOpBifurcationTags() {
&eagerlyAssumeBinOpBifurcationFalse);
}

void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
ExplodedNodeSet &Src,
const Expr *Ex) {
void ExprEngine::evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
ExplodedNodeSet &Src,
const Expr *Ex) {
StmtNodeBuilder Bldr(Src, Dst, *currBldrCtx);

for (const auto Pred : Src) {
Expand All @@ -3776,6 +3829,11 @@ void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
ProgramStateRef StateTrue, StateFalse;
std::tie(StateTrue, StateFalse) = state->assume(*SEV);

if (StateTrue && StateFalse) {
StateTrue = StateTrue->set<LastEagerlyAssumeAssumptionAt>(Ex);
StateFalse = StateFalse->set<LastEagerlyAssumeAssumptionAt>(Ex);
}

// First assume that the condition is true.
if (StateTrue) {
SVal Val = svalBuilder.makeIntVal(1U, Ex->getType());
Expand Down
6 changes: 3 additions & 3 deletions clang/test/Analysis/loop-unrolling.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ int simple_unknown_bound_loop() {
#ifdef DFS
clang_analyzer_numTimesReached(); // expected-warning {{16}}
#else
clang_analyzer_numTimesReached(); // expected-warning {{8}}
clang_analyzer_numTimesReached(); // expected-warning {{10}}
#endif
}
return 0;
Expand All @@ -369,9 +369,9 @@ int nested_inlined_no_unroll1() {
int k;
for (int i = 0; i < 9; i++) {
#ifdef DFS
clang_analyzer_numTimesReached(); // expected-warning {{18}}
clang_analyzer_numTimesReached(); // expected-warning {{20}}
#else
clang_analyzer_numTimesReached(); // expected-warning {{14}}
clang_analyzer_numTimesReached(); // expected-warning {{18}}
#endif
k = simple_unknown_bound_loop(); // reevaluation without inlining, splits the state as well
}
Expand Down
101 changes: 101 additions & 0 deletions clang/test/Analysis/out-of-bounds.c
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-checker=core,alpha.security.ArrayBoundV2,debug.ExprInspection -verify %s
// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-checker=core,alpha.security.ArrayBoundV2,debug.ExprInspection -analyzer-config eagerly-assume=false -verify %s

// Note that eagerly-assume=false is tested separately because the
// WeakLoopAssumption suppression heuristic uses different code paths to
// achieve the same result with and without eagerly-assume.

void clang_analyzer_eval(int);

Expand Down Expand Up @@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete *p) {
return ((char *)p)[-1]; // no-warning
}

// WeakLoopAssumption suppression
///////////////////////////////////////////////////////////////////////

int GlobalArray[100];
int loop_suppress_after_zero_iterations(unsigned len) {
for (unsigned i = 0; i < len; i++)
if (GlobalArray[i] > 0)
return GlobalArray[i];
// Previously this would have produced an overflow warning because splitting
// the state on the loop condition introduced an execution path where the
// analyzer thinks that len == 0.
// There are very many situations where the programmer knows that an argument
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// There are very many situations where the programmer knows that an argument
// There are many situations where the programmer knows that an argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the "very" here -- that's the impression that I get when I browse through lots of ArrayBoundV2 reports (without this patch).

// is positive, but this is not indicated in the source code, so we must
// avoid reporting errors (especially out of bounds errors) on these
// branches, because otherwise we'd get prohibitively many false positives.
return GlobalArray[len - 1]; // no-warning
}

void loop_report_in_second_iteration(int len) {
int buf[1] = {0};
for (int i = 0; i < len; i++) {
// When a programmer writes a loop, we may assume that they intended at
// least two iterations.
buf[i] = 1; // expected-warning{{Out of bound access to memory}}
}
}

void loop_suppress_in_third_iteration(int len) {
int buf[2] = {0};
for (int i = 0; i < len; i++) {
// We should suppress array bounds errors on the third and later iterations
// of loops, because sometimes programmers write a loop in sitiuations
// where they know that there will be at most two iterations.
buf[i] = 1; // no-warning
}
}

void loop_suppress_in_third_iteration_cast(int len) {
int buf[2] = {0};
for (int i = 0; (unsigned)(i < len); i++) {
// Check that a (somewhat arbitrary) cast does not hinder the recognition
// of the condition expression.
buf[i] = 1; // no-warning
}
}

void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
int buf[2] = {0};
for (int i = 0; i < len && flag; i++) {
// FIXME: In this case the checker should suppress the warning the same way
// as it's suppressed in loop_suppress_in_third_iteration, but the
// suppression is not activated because the terminator statement associated
// with the loop is just the expression 'flag', while 'i < len' is a
// separate terminator statement that's associated with the
// short-circuiting operator '&&'.
// I have seen a real-world FP that looks like this, but it is much rarer
// than the basic setup.
buf[i] = 1; // expected-warning{{Out of bound access to memory}}
}
}

void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
int buf[2] = {0};
for (int i = 0; flag && i < len; i++) {
// If the two operands of '&&' are flipped, the suppression works.
buf[i] = 1; // no-warning
}
}

int coinflip(void);
int do_while_report_after_one_iteration(void) {
int i = 0;
do {
i++;
} while (coinflip());
// Unlike `loop_suppress_after_zero_iterations`, running just one iteration
// in a do-while is not a corner case that would produce too many false
// positives, so don't suppress bounds errors in these situations.
return GlobalArray[i-2]; // expected-warning{{Out of bound access to memory}}
}

void do_while_report_in_second_iteration(int len) {
int buf[1] = {0};
int i = 0;
do {
buf[i] = 1; // expected-warning{{Out of bound access to memory}}
} while (i++ < len);
}

void do_while_suppress_in_third_iteration(int len) {
int buf[2] = {0};
int i = 0;
do {
buf[i] = 1; // no-warning
} while (i++ < len);
}
Loading