@AndyAyersMS AndyAyersMS commented Oct 8, 2025

Replaced by #121728

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 8, 2025
@dotnet-policy-service

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS

The throughput (TP) cost of running this (when optimizing) only when there are improper headers is small, except on arm32. I wonder if the "failed to canonicalize" loops are getting mixed in here more often?

Of course, if we were to do this for real, we might need to run it in minopts too, and presumably the cost there would be somewhat higher, since we don't build loops there yet either.

@AndyAyersMS

Diffs with the transformation enabled.

About 0.25% TP cost, plus some sizeable code size increases. Those increases are somewhat unavoidable and probably smaller than what we'd see with perf-oriented schemes that duplicate existing code. Interestingly, there are also some code size improvements. One guess is that this transformation might at times make LSRA's life easier, perhaps through fewer critical edges or friendlier block ordering.

We can probably claw back a bit of the size by reusing the "transition" blocks (preds of the dispatch block); we'd need at most two per SCC entry (one for "outside" and one for "inside").

For something like Wasm, we'd likely run this transformation later than it runs here, so TP would probably be somewhat cheaper (the added blocks and IR wouldn't gum up the optimization phases).

We can't yet fully validate that all SCCs are removed (though we seem to get almost all of them). The current loop-finding code allows loops to include exceptional flow. For our use cases we don't need to consider that, so loop finding needs a tweak.

AndyAyersMS added a commit that referenced this pull request Nov 22, 2025
If the Wasm DFS detects improper loop headers, then we have irreducible
loops that cannot be expressed in Wasm control flow.

To fix this, run a pass to find the SCCs in the flow graph using
Kosaraju's algorithm. Then invoke this algorithm recursively on the
subgraph formed from the nodes in each SCC, minus the SCC entry nodes
(nodes in the SCC with preds not in the SCC). Repeat until all "nested"
SCCs are identified. This set includes all the irreducible loops we
need to transform. Note that no two SCCs share headers, but nested SCCs
will share interior blocks.
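A minimal C++ sketch of this SCC search follows. It assumes a hypothetical flow graph stored as successor lists (`Graph`), with block 0 as the method entry. `Scc`, `FindSccs`, `FindNestedSccs`, and `FindAllNestedSccs` are illustrative names, not the JIT's actual types or APIs. The sketch works in three steps:

- It runs Kosaraju's algorithm: one forward DFS to get a postorder, then a DFS over reversed edges in decreasing postorder.
- It computes each SCC's entry blocks from the full predecessor lists.
- It recurses on the SCC minus those entries.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical flow graph: g[b] lists the successors of block b; block 0 is the method entry.
using Graph = std::vector<std::vector<int>>;

struct Scc {
    std::vector<int> nodes;    // blocks in the SCC, sorted
    std::vector<int> entries;  // SCC blocks with a pred outside the SCC (plus the method entry)
};

Graph Preds(const Graph& g) {
    Graph preds(g.size());
    for (int u = 0; u < (int)g.size(); u++)
        for (int v : g[u]) preds[v].push_back(u);
    return preds;
}

// Kosaraju's algorithm on the subgraph induced by `live`. Returns only cyclic SCCs
// (more than one block, or a single block with a self-loop).
std::vector<std::vector<int>> FindSccs(const Graph& g, const std::set<int>& live) {
    Graph rev(g.size());
    for (int u : live)
        for (int v : g[u])
            if (live.count(v)) rev[v].push_back(u);

    // Pass 1: DFS on the forward graph, recording blocks in postorder.
    std::vector<bool> seen(g.size(), false);
    std::vector<int> post;
    std::function<void(int)> dfs1 = [&](int u) {
        seen[u] = true;
        for (int v : g[u])
            if (live.count(v) && !seen[v]) dfs1(v);
        post.push_back(u);
    };
    for (int u : live)
        if (!seen[u]) dfs1(u);

    // Pass 2: DFS on the reversed graph in decreasing postorder; each tree is one SCC.
    std::vector<bool> done(g.size(), false);
    std::function<void(int, std::vector<int>&)> dfs2 = [&](int u, std::vector<int>& comp) {
        done[u] = true;
        comp.push_back(u);
        for (int v : rev[u])
            if (!done[v]) dfs2(v, comp);
    };
    std::vector<std::vector<int>> sccs;
    for (auto it = post.rbegin(); it != post.rend(); ++it) {
        if (done[*it]) continue;
        std::vector<int> comp;
        dfs2(*it, comp);
        bool selfLoop = std::count(g[*it].begin(), g[*it].end(), *it) > 0;
        if (comp.size() > 1 || selfLoop) {
            std::sort(comp.begin(), comp.end());
            sccs.push_back(comp);
        }
    }
    return sccs;
}

// Find each SCC, then recurse on that SCC minus its entries to find nested SCCs.
void FindNestedSccs(const Graph& g, const Graph& preds, const std::set<int>& live,
                    std::vector<Scc>& out) {
    for (const std::vector<int>& nodes : FindSccs(g, live)) {
        std::set<int> members(nodes.begin(), nodes.end());
        Scc scc{nodes, {}};
        for (int u : nodes) {
            bool isEntry = (u == 0);  // treat the method entry as having an outside pred
            for (int p : preds[u])
                if (!members.count(p)) isEntry = true;
            if (isEntry) scc.entries.push_back(u);
        }
        out.push_back(scc);
        if (scc.entries.empty()) continue;  // unreachable cycle; recursing would never shrink it
        std::set<int> inner = members;
        for (int e : scc.entries) inner.erase(e);
        FindNestedSccs(g, preds, inner, out);
    }
}

std::vector<Scc> FindAllNestedSccs(const Graph& g) {
    std::set<int> all;
    for (int b = 0; b < (int)g.size(); b++) all.insert(b);
    std::vector<Scc> out;
    FindNestedSccs(g, Preds(g), all, out);
    return out;
}
```

Entries are computed against the whole graph's predecessors. An edge from an outer loop's header into a nested region therefore still counts as entering that region. Any SCC that reports more than one entry is an irreducible loop.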

Single-entry SCCs are reducible loops and don't require any special
processing, as they can be emitted as Wasm loops. But multi-entry SCCs are
irreducible loops and must be transformed.

So we transform each multi-entry SCC (working inner to outer) by
creating a per-SCC control var and dispatch block. Each SCC header is
assigned an index from 0 to N-1, where N is the number of headers in
that SCC. The dispatch block switches on the control var, jumping to
the header whose index matches. Each pre-existing edge to a header is
then logically split: the new block assigns that header's index to the
control var and jumps to the dispatch block. As an optimization, and to
handle some unsplittable edges, if an SCC header's pred has the header
as its only successor, we put the control var assignment into the pred
instead of splitting the edge.

This transforms each multi-entry SCC into a single-entry reducible loop.
In checked builds we verify this by rerunning the DFS and asserting
that there are no longer any improper headers.
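Below is a sketch of one way to perform such a check. It assumes that an "improper header" is a block targeted by a DFS retreating edge (the block is a DFS ancestor of the edge's source) that does not dominate that source. This is a textbook reducibility test, not necessarily how the JIT's Wasm DFS implements it. For brevity, dominance is tested by reachability from the entry with the candidate header deleted, rather than with a dominator tree. `CountImproperHeaders` and `Dominates` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical flow graph: g[b] lists the successors of block b.
using Graph = std::vector<std::vector<int>>;

// True if every path from `entry` to `u` passes through `h`, i.e. `u` is not
// reachable from `entry` once `h` is deleted. Assumes `u` is reachable from `entry`.
bool Dominates(const Graph& g, int entry, int h, int u) {
    if (h == entry || h == u) return true;
    std::vector<bool> seen(g.size(), false);
    std::vector<int> stack{entry};
    seen[entry] = true;
    while (!stack.empty()) {
        int b = stack.back();
        stack.pop_back();
        if (b == u) return false;  // reached u while avoiding h
        for (int s : g[b])
            if (s != h && !seen[s]) {
                seen[s] = true;
                stack.push_back(s);
            }
    }
    return true;
}

// Counts blocks that are the target of a DFS retreating edge (the target is still
// on the DFS stack, so it is an ancestor of the source) but do not dominate the source.
int CountImproperHeaders(const Graph& g, int entry = 0) {
    std::vector<int> state(g.size(), 0);  // 0 = unvisited, 1 = on DFS stack, 2 = finished
    std::vector<bool> improper(g.size(), false);
    std::function<void(int)> dfs = [&](int u) {
        state[u] = 1;
        for (int v : g[u]) {
            if (state[v] == 0)
                dfs(v);
            else if (state[v] == 1 && !Dominates(g, entry, v, u))
                improper[v] = true;
        }
        state[u] = 2;
    };
    dfs(entry);
    int count = 0;
    for (bool b : improper) count += b;
    return count;
}
```

On the irreducible graph `{{1, 2}, {2}, {1, 3}, {}}` this reports one improper header. It flags block 1, because block 2 can be reached from the entry without passing through it. On the graph `{{5, 6}, {4}, {7, 3}, {}, {1, 2}, {4}, {4}, {4}}`, blocks 1 and 2 are entered only through dispatch block 4, and the check reports none.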

Note there are other strategies for resolving SCCs into reducible loops
that might offer better performance; we are intentionally picking
something simple.

Defer handling cases where the original DFS found non-funclet blocks
that could only be reached via EH, as we do not yet have a way of
describing how Wasm control can reach such blocks. We will revisit this
once we have the Wasm EH model design in place. Such cases are fairly
rare (e.g., a try/catch that ends with a goto or return).

We currently run the SCC transform before lower, both to give lower a
chance to optimize the switch and because we introduce new IR. There is
a risk that a sufficiently clever later phase (say one that could do
block cloning or jump threading) might undo the dispatch structure and
recreate an irreducible loop, but that doesn't seem to happen. The
subsequent Wasm control flow phase will also assert that its run of Wasm
DFS does not have any improper headers.

Continuation of #120534.

Contributes to #121178.

---------

Co-authored-by: Copilot
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2025