Skip to content

Conversation

@AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Nov 18, 2025

If the Wasm DFS detects improper loop headers, then we have irreducible loops that cannot be expressed in Wasm control flow.

To fix this, run a pass to find the SCCs in the flow graph using Kosaraju's algorithm. Then invoke this algorithm recursively on the subgraph formed from the nodes in each SCC, minus the SCC entry nodes (nodes in the SCC with preds not in the SCC). Repeat until all "nested" SCCs are identified. This represents the full set of irreducible loops we need to transform. Note no SCCs share headers but nested SCCs will share interior blocks.

Single-entry SCCs are reducible loops and don't require any special processing as they can be emitted as Wasm lops. But multi-entry SCCs are irreducible loops and must be transformed.

So we transform each multi-emtry SCC (working inner to outer) by creating a per-SCC control var and dispatch block. Each SCC header is assigned an index from 0...N-1, where N is the number of headers in that SCC. The dispatch block switches to each the headers based on their index and the control var. Each pre-existing edge to the header is then logically split and the index var is assigned the index for that header and retargeted to the dispatch node. As an optimization and to handle some unsplittable edges, if an SCC header's pred has the header as its only successor, we put the control var assignment into the pred instead of splitting the edge.

This transforms each multi-entry SCC into a single-entry reducible loop. In checked builds we verify by rerunning the DFS and assert that there are no longer any improper headers.

Note there are other strategies for resolving SCCs into reducible loops that might offer better performance; we are intentionally picking something simple.

Defer handling cases where the original DFS found non-funclet blocks that could only be reached via EH, as we do not yet have a way of describing how Wasm control can reach such blocks. We will revisit this once we have the Wasm EH model design in place. Such cases are fairly rare (eg a try/catch that ends with a goto or return).

We currently run the SCC transform before lower to allow lower the chance to optimize the switch and because we introduce new IR. There is a risk that a sufficiently clever later phase (say one that could do block cloning or jump threading) might undo the dispatch structure and recreate an irreducible loop, but that doesn't seem to happen. The subsequent Wasm control flow phase will also assert that its run of Wasm DFS does not have any improper headers.

Continuation of #120534.

Contributes to #121178.

AndyAyersMS and others added 24 commits November 6, 2025 11:34
Determine how to emit Wasm control flow from the JIT's control flow graph.

Relies on loop-aware RPO to determine the block order. Currently only
handles the main method. Assumes irreducible loops have been fixed
upstream (which is not yet guaranteed; bails out if not so).

Doesn't actually do any emission, just prints a textual description in
the JIT dump (along with a dot markup version).

Uses only LOOP and BLOCK. Tries to limit the extent of BLOCK.

Run for now as an optional phase even if not targeting Wasm, to
do some stress testing.

Contributes to dotnet#121178
Co-authored-by: Copilot <[email protected]>
Loops for Wasm control flow codegen don't involve EH or runtime mediated
control flow transfers.

Implement a custom block successor enumerator for Wasm, and adjust `fgRunDFS`
to allow using this and also to generalize how the DFS is initiated. Use
this to build a "Wasm" DFS. In that DFS handle both the main method and
all funclets (by specifying funclet entries as additional DFS starting points).

Update the loop finding code to make suitable changes when it is driven from
a "Wasm" DFS instead of the typical all successor / all predecessor DFS.

Remove the restriction in the Wasm control flow codegen that only handles
the main method; now it works for the main method and all funclets.

Contributes to dotnet#121178.
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 18, 2025
@am11 am11 added the arch-wasm WebAssembly architecture label Nov 18, 2025
@AndyAyersMS AndyAyersMS marked this pull request as ready for review November 18, 2025 16:32
Copilot AI review requested due to automatic review settings November 18, 2025 16:32
@kg
Copy link
Member

kg commented Nov 19, 2025

The parts I understand LGTM

@AndyAyersMS AndyAyersMS mentioned this pull request Nov 21, 2025
@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib any other comments?

If not, I need one of you to approve this.

@AndyAyersMS AndyAyersMS merged commit b1d5443 into dotnet:main Nov 22, 2025
110 of 112 checks passed
Comment on lines +4943 to +4952
#ifdef DEBUG
// If we are going to simulate generating wasm control flow,
// transform any strongly connected components into reducible flow.
//
if (JitConfig.JitWasmControlFlow() > 0)
{
DoPhase(this, PHASE_DFS_BLOCKS_WASM, &Compiler::fgDfsBlocksAndRemove);
DoPhase(this, PHASE_WASM_TRANSFORM_SCCS, &Compiler::fgWasmTransformSccs);
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional this was placed before PHASE_ASYNC, even though it is eventually going to have to be after?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not intentional. It should be moved to just after the async transformation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #121973

Comment on lines +273 to +277
if (BitVecOps::IsMember(m_traits, m_blocks, pred->bbPostorderNum))
{
// Pred is in the scc, so not an entry edge
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that we see predecessors here that won't be in the DFS tree such that pred->bbPostorderNum is something undefined? Should this be guarded on that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run a DFS+Remove pass just before, but being defensive here can't hurt. Let me look into this.

Comment on lines +463 to +483
// Dump subgraph as dot
{
JITDUMP("digraph scc_%u_nested_subgraph%u {\n", m_num, nestedCount);
BitVecOps::Iter iterator(m_traits, nestedBlocks);
unsigned int poNum;
bool first = true;
while (iterator.NextElem(&poNum))
{
BasicBlock* const block = m_dfsTree->GetPostOrder(poNum);

JITDUMP(FMT_BB ";\n", block->bbNum);

WasmSuccessorEnumerator successors(m_comp, block, /* useProfile */ true);
for (BasicBlock* const succ : successors)
{
JITDUMP(FMT_BB " -> " FMT_BB ";\n", block->bbNum, succ->bbNum);
}
}

JITDUMP("}\n");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this under #ifdef DEBUG and if (verbose)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #121973.

Comment on lines +750 to +752
// TODO: if we had a BV iter that worked from highest set
// bit to lowest, we could iterate the subset directly
// and avoid searching here.
Copy link
Member

@jakobbotsch jakobbotsch Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use BitVecOps::VisitBitsReverse for this.

In fact I would suggest switching away from the BV iter in most places here and unify the interface of Scc with FlowGraphNaturalLoop.

Comment on lines +767 to +774
if (sccs.Height() > 0)
{
for (int i = 0; i < sccs.Height(); i++)
{
Scc* const scc = sccs.Bottom(i);
scc->Finalize();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The emptiness check looks unnecessary.

//
void FgWasm::WasmFindSccsCore(BitVec& subset, ArrayStack<Scc*>& sccs, BasicBlock** postorder, unsigned postorderCount)
{
SccMap map(Comp()->getAllocator(CMK_WasmSccTransform));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this map be sparse, or could it just be a flat map indexed by the postorder indices, since that mapping is dense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially this will need to cover the entire method, so I think we can create a flat array for that and then reuse it for subsequent subset cases.

Comment on lines +363 to +366
for (BasicBlock* const pred : block->PredBlocks())
{
advance();
hasPred = true;
if (!BitVecOps::IsMember(&m_traits, subgraph, pred->bbPostorderNum))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, does this need to be guarded on m_dfsTree->Contains(pred)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-wasm WebAssembly architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants