Wasm irreducible loop transformation #121728

AndyAyersMS · 2025-11-18T02:40:12Z

If the Wasm DFS detects improper loop headers, then we have irreducible loops that cannot be expressed in Wasm control flow.

To fix this, run a pass to find the SCCs in the flow graph using Kosaraju's algorithm. Then invoke this algorithm recursively on the subgraph formed from the nodes in each SCC, minus the SCC entry nodes (nodes in the SCC with preds not in the SCC). Repeat until all "nested" SCCs are identified. This represents the full set of irreducible loops we need to transform. Note no SCCs share headers but nested SCCs will share interior blocks.
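For readers who want to see the shape of the search, here is a minimal sketch on a toy graph. It is not the code in `fgwasm.cpp`: the `Graph`, `Scc`, `reverseOf`, and `findSccs` names are hypothetical, blocks are plain integers, and it assumes unreachable blocks have already been removed so every SCC has at least one entry.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy flow graph: blocks are numbered 0..n-1 and g[b] lists b's successors.
using Graph = std::vector<std::vector<int>>;

struct Scc
{
    std::vector<int> blocks;  // members, sorted by block number
    std::vector<int> entries; // members with at least one pred outside the SCC
};

// Build the predecessor lists for g.
Graph reverseOf(const Graph& g)
{
    Graph rev(g.size());
    for (int b = 0; b < (int)g.size(); b++)
        for (int s : g[b])
            rev[s].push_back(b);
    return rev;
}

// Kosaraju pass 1: postorder of the forward graph, restricted to subset.
static void forwardVisit(const Graph& g, const std::vector<bool>& subset, int b,
                         std::vector<bool>& seen, std::vector<int>& order)
{
    seen[b] = true;
    for (int s : g[b])
        if (subset[s] && !seen[s])
            forwardVisit(g, subset, s, seen, order);
    order.push_back(b);
}

// Kosaraju pass 2: gather everything that reaches b in the reverse graph.
static void reverseVisit(const Graph& rev, const std::vector<bool>& subset, int b,
                         std::vector<bool>& assigned, std::vector<int>& members)
{
    assigned[b] = true;
    members.push_back(b);
    for (int p : rev[b])
        if (subset[p] && !assigned[p])
            reverseVisit(rev, subset, p, assigned, members);
}

// Append the nontrivial SCCs of subset to result, then recurse into each
// SCC with its entry blocks removed to expose nested SCCs.
void findSccs(const Graph& g, const Graph& rev, const std::vector<bool>& subset, std::vector<Scc>& result)
{
    assert(g.size() == rev.size() && g.size() == subset.size());
    const int         n = (int)g.size();
    std::vector<bool> seen(n, false);
    std::vector<int>  order;
    for (int b = 0; b < n; b++)
        if (subset[b] && !seen[b])
            forwardVisit(g, subset, b, seen, order);

    std::vector<bool> assigned(n, false);
    for (auto it = order.rbegin(); it != order.rend(); ++it)
    {
        const int root = *it;
        if (assigned[root])
            continue;

        Scc scc;
        reverseVisit(rev, subset, root, assigned, scc.blocks);
        const bool selfLoop = std::find(g[root].begin(), g[root].end(), root) != g[root].end();
        if ((scc.blocks.size() == 1) && !selfLoop)
            continue; // a lone block with no self edge is not a loop

        std::sort(scc.blocks.begin(), scc.blocks.end());
        std::vector<bool> inScc(n, false);
        for (int b : scc.blocks)
            inScc[b] = true;

        // Entries are judged against the whole graph, so preds that were
        // removed from subset (outer entries) still count as outside preds.
        std::vector<bool> nested = inScc;
        for (int b : scc.blocks)
        {
            if (std::any_of(rev[b].begin(), rev[b].end(), [&](int p) { return !inScc[p]; }))
            {
                scc.entries.push_back(b);
                nested[b] = false;
            }
        }

        const bool canShrink = !scc.entries.empty();
        result.push_back(scc);
        if (canShrink) // an SCC with no entry would never shrink; assumed not to occur
            findSccs(g, rev, nested, result);
    }
}
```

In this sketch an SCC with more than one entry in `entries` corresponds to an irreducible loop; outer SCCs are appended before the SCCs nested inside them.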

Single-entry SCCs are reducible loops and don't require any special processing, as they can be emitted as Wasm loops. But multi-entry SCCs are irreducible loops and must be transformed.

So we transform each multi-entry SCC (working inner to outer) by creating a per-SCC control var and dispatch block. Each SCC header is assigned an index from 0 to N-1, where N is the number of headers in that SCC. The dispatch block switches to each of the headers based on the header's index and the control var. Each pre-existing edge to a header is then logically split: the control var is assigned the index for that header, and the edge is retargeted to the dispatch block. As an optimization, and to handle some unsplittable edges, if an SCC header's pred has the header as its only successor, we put the control var assignment into the pred instead of splitting the edge.
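To make the edge rewriting concrete, here is a rough sketch on a toy adjacency-list graph. Everything in it (`Graph`, `ControlVarStore`, `DispatchResult`, `addDispatch`) is a hypothetical name for illustration. It records where each control var assignment would go rather than materializing split-edge blocks or IR, so it only models the flow graph shape, not what the JIT emits.

```cpp
#include <cassert>
#include <vector>

// Toy flow graph: blocks are numbered 0..n-1 and g[b] lists b's successors.
using Graph = std::vector<std::vector<int>>;

// One "controlVar = headerIndex" assignment created by the transform.
struct ControlVarStore
{
    int  pred;         // source block of the edge that used to target a header
    int  headerIndex;  // value stored to the control var on that path
    bool placedInPred; // true: stored in pred itself; false: stored on a split edge
};

struct DispatchResult
{
    int                          dispatch; // block number of the new dispatch block
    std::vector<ControlVarStore> stores;
};

// Retarget every edge into a header of one multi-entry SCC to a new dispatch
// block whose successors are the headers, in index order.
DispatchResult addDispatch(Graph& g, const std::vector<int>& headers)
{
    assert(headers.size() > 1); // single-entry SCCs need no transform
    const int        n = (int)g.size();
    std::vector<int> headerIndex(n, -1);
    for (int i = 0; i < (int)headers.size(); i++)
        headerIndex[headers[i]] = i;

    DispatchResult result;
    result.dispatch = n;
    for (int pred = 0; pred < n; pred++)
    {
        // If the header is pred's only successor, the store can live in pred.
        const bool onlySucc = (g[pred].size() == 1);
        for (int& succ : g[pred])
        {
            const int index = headerIndex[succ];
            if (index < 0)
                continue; // not an edge into a header
            result.stores.push_back({pred, index, onlySucc});
            succ = result.dispatch;
        }
    }

    // Models: switch (controlVar) { case i: goto headers[i]; }
    g.push_back(headers);
    return result;
}
```

After the rewrite the only edges into the headers come from the dispatch block, so the dispatch block is the single entry of the loop.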

This transforms each multi-entry SCC into a single-entry reducible loop. In checked builds we verify by rerunning the DFS and assert that there are no longer any improper headers.

Note there are other strategies for resolving SCCs into reducible loops that might offer better performance; we are intentionally picking something simple.

Defer handling cases where the original DFS found non-funclet blocks that could only be reached via EH, as we do not yet have a way of describing how Wasm control can reach such blocks. We will revisit this once we have the Wasm EH model design in place. Such cases are fairly rare (e.g., a try/catch that ends with a goto or return).

We currently run the SCC transform before lower, both to give lower the chance to optimize the switch and because we introduce new IR. There is a risk that a sufficiently clever later phase (say one that could do block cloning or jump threading) might undo the dispatch structure and recreate an irreducible loop, but that doesn't seem to happen. The subsequent Wasm control flow phase will also assert that its run of the Wasm DFS does not find any improper headers.

Continuation of #120534.

Contributes to #121178.

Determine how to emit Wasm control flow from the JIT's control flow graph. Relies on loop-aware RPO to determine the block order. Currently only handles the main method. Assumes irreducible loops have been fixed upstream (which is not yet guaranteed; bails out if not so). Doesn't actually do any emission, just prints a textual description in the JIT dump (along with a dot markup version). Uses only LOOP and BLOCK. Tries to limit the extent of BLOCK. Run for now as an optional phase even if not targeting Wasm, to do some stress testing. Contributes to dotnet#121178

Co-authored-by: Copilot <[email protected]>

Loops for Wasm control flow codegen don't involve EH or runtime mediated control flow transfers. Implement a custom block successor enumerator for Wasm, and adjust `fgRunDFS` to allow using this and also to generalize how the DFS is initiated. Use this to build a "Wasm" DFS. In that DFS handle both the main method and all funclets (by specifying funclet entries as additional DFS starting points). Update the loop finding code to make suitable changes when it is driven from a "Wasm" DFS instead of the typical all successor / all predecessor DFS. Remove the restriction in the Wasm control flow codegen that only handles the main method; now it works for the main method and all funclets. Contributes to dotnet#121178.


src/coreclr/jit/fgwasm.cpp

src/coreclr/jit/fgwasm.h

src/coreclr/jit/jiteh.cpp

kg · 2025-11-19T02:34:43Z

The parts I understand LGTM

src/coreclr/jit/fgwasm.cpp

AndyAyersMS · 2025-11-21T17:14:40Z

@dotnet/jit-contrib any other comments?

If not, I need one of you to approve this.

jakobbotsch · 2025-11-25T11:39:43Z

src/coreclr/jit/compiler.cpp

+#ifdef DEBUG
+    // If we are going to simulate generating wasm control flow,
+    // transform any strongly connected components into reducible flow.
+    //
+    if (JitConfig.JitWasmControlFlow() > 0)
+    {
+        DoPhase(this, PHASE_DFS_BLOCKS_WASM, &Compiler::fgDfsBlocksAndRemove);
+        DoPhase(this, PHASE_WASM_TRANSFORM_SCCS, &Compiler::fgWasmTransformSccs);
+    }
+#endif


Is it intentional this was placed before PHASE_ASYNC, even though it is eventually going to have to be after?

Not intentional. It should be moved to just after the async transformation.

Fixed in #121973

jakobbotsch · 2025-11-25T11:48:27Z

src/coreclr/jit/fgwasm.cpp

+                if (BitVecOps::IsMember(m_traits, m_blocks, pred->bbPostorderNum))
+                {
+                    // Pred is in the scc, so not an entry edge
+                    continue;
+                }


Is it possible that we see predecessors here that won't be in the DFS tree such that pred->bbPostorderNum is something undefined? Should this be guarded on that?

We run a DFS+Remove pass just before, but being defensive here can't hurt. Let me look into this.

jakobbotsch · 2025-11-25T11:52:03Z

src/coreclr/jit/fgwasm.cpp

+        // Dump subgraph as dot
+        {
+            JITDUMP("digraph scc_%u_nested_subgraph%u {\n", m_num, nestedCount);
+            BitVecOps::Iter iterator(m_traits, nestedBlocks);
+            unsigned int    poNum;
+            bool            first = true;
+            while (iterator.NextElem(&poNum))
+            {
+                BasicBlock* const block = m_dfsTree->GetPostOrder(poNum);
+
+                JITDUMP(FMT_BB ";\n", block->bbNum);
+
+                WasmSuccessorEnumerator successors(m_comp, block, /* useProfile */ true);
+                for (BasicBlock* const succ : successors)
+                {
+                    JITDUMP(FMT_BB " -> " FMT_BB ";\n", block->bbNum, succ->bbNum);
+                }
+            }
+
+            JITDUMP("}\n");
+        }


Put this under #ifdef DEBUG and if (verbose)?

Fixed in #121973.

jakobbotsch · 2025-11-25T11:58:54Z

src/coreclr/jit/fgwasm.cpp

+    // TODO: if we had a BV iter that worked from highest set
+    // bit to lowest, we could iterate the subset directly
+    // and avoid searching here.


You can use BitVecOps::VisitBitsReverse for this.

In fact I would suggest switching away from the BV iter in most places here and unifying the interface of Scc with FlowGraphNaturalLoop.
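As an illustration of what high-to-low visiting buys, here is a toy sketch over a plain word array. It does not reproduce the signature of `BitVecOps::VisitBitsReverse`; `visitBitsHighToLow` and `highestSetBit` are hypothetical names, and the early-exit convention (the visitor returns false to stop) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Visit set bits from the highest index to the lowest. The visitor returns
// true to keep going and false to stop early.
template <typename Visitor>
void visitBitsHighToLow(const std::vector<uint64_t>& words, Visitor visit)
{
    for (size_t w = words.size(); w-- > 0;)
    {
        for (int b = 63; b >= 0; b--)
        {
            if (((words[w] >> b) & 1) == 0)
                continue;
            if (!visit((unsigned)(w * 64 + b)))
                return;
        }
    }
}

// Highest set bit, found by stopping at the first bit visited.
unsigned highestSetBit(const std::vector<uint64_t>& words)
{
    bool     found  = false;
    unsigned result = 0;
    visitBitsHighToLow(words, [&](unsigned bit) {
        result = bit;
        found  = true;
        return false; // stop after the first (highest) bit
    });
    assert(found); // caller must pass a non-empty set
    return result;
}
```

With postorder numbers as bit indices, visiting from high to low walks the subset in reverse postorder directly, which is the search the TODO wants to avoid.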

jakobbotsch · 2025-11-25T12:01:19Z

src/coreclr/jit/fgwasm.cpp

+    if (sccs.Height() > 0)
+    {
+        for (int i = 0; i < sccs.Height(); i++)
+        {
+            Scc* const scc = sccs.Bottom(i);
+            scc->Finalize();
+        }
+    }


The emptiness check looks unnecessary.

jakobbotsch · 2025-11-25T12:06:16Z

src/coreclr/jit/fgwasm.cpp

+//
+void FgWasm::WasmFindSccsCore(BitVec& subset, ArrayStack<Scc*>& sccs, BasicBlock** postorder, unsigned postorderCount)
+{
+    SccMap              map(Comp()->getAllocator(CMK_WasmSccTransform));


Will this map be sparse, or could it just be a flat map indexed by the postorder indices, since that mapping is dense?

Initially this will need to cover the entire method, so I think we can create a flat array for that and then reuse it for subsequent subset cases.
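A minimal sketch of that idea, assuming the map only needs to go from a postorder number to an SCC index. `FlatSccMap` and `kUnassigned` are hypothetical names, and the real `SccMap` value type may differ.

```cpp
#include <cassert>
#include <vector>

constexpr int kUnassigned = -1;

// Flat map from postorder number to SCC index, sized once for the whole
// method and reused for each nested subset search.
class FlatSccMap
{
public:
    explicit FlatSccMap(unsigned postorderCount)
        : m_sccOf(postorderCount, kUnassigned)
    {
    }

    void set(unsigned poNum, int sccIndex)
    {
        assert(poNum < m_sccOf.size());
        m_sccOf[poNum] = sccIndex;
    }

    int get(unsigned poNum) const
    {
        assert(poNum < m_sccOf.size());
        return m_sccOf[poNum];
    }

    // Clear only the slots a subset uses, so a nested search starts clean
    // without paying to reinitialize the whole array.
    void reset(const std::vector<unsigned>& subset)
    {
        for (unsigned poNum : subset)
            set(poNum, kUnassigned);
    }

private:
    std::vector<int> m_sccOf;
};
```

Lookups become a single indexed load, at the cost of an array as large as the postorder count even when the subset being searched is small.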

jakobbotsch · 2025-11-25T12:08:00Z

src/coreclr/jit/fgwasm.h

+        for (BasicBlock* const pred : block->PredBlocks())
        {
-            advance();
+            hasPred = true;
+            if (!BitVecOps::IsMember(&m_traits, subgraph, pred->bbPostorderNum))


Similarly, does this need to be guarded on m_dfsTree->Contains(pred)?
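A toy sketch of the guard being asked about, under the assumption that DFS membership can be decided by checking that the block's postorder slot points back at the block. The `Block`, `DfsTree`, and `isPredInScc` types and names here are stand-ins, not the JIT's `BasicBlock` or `FlowGraphDfsTree`.

```cpp
#include <cassert>
#include <vector>

struct Block
{
    unsigned postorderNum = 0; // may be stale if the block was not visited by the DFS
};

struct DfsTree
{
    std::vector<Block*> postOrder; // postOrder[i] is the block with postorder number i

    // A block is in the tree only if its slot in the postorder array is itself.
    bool contains(const Block* block) const
    {
        return (block->postorderNum < postOrder.size()) && (postOrder[block->postorderNum] == block);
    }
};

// Membership test for a pred that is safe even when pred's postorder number
// is stale: check the DFS first, then consult the bit for that number.
bool isPredInScc(const DfsTree& dfs, const std::vector<bool>& sccBlocks, const Block* pred)
{
    assert(sccBlocks.size() == dfs.postOrder.size());
    return dfs.contains(pred) && sccBlocks[pred->postorderNum];
}
```

Without the `contains` check, a pred outside the DFS whose stale number happens to match a member of the SCC would be misread as being in the SCC.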

AndyAyersMS and others added 24 commits November 6, 2025 11:34

fix release build

ca2c9b4

fix llvm build

39e0bd3

Fix typos

e8882ab

Co-authored-by: Copilot <[email protected]>

Fix more typos

890548e

fixes

4c97a19

first cut at integrating SCCs

3b3a13d

transform sccs pre-lower

be764e3

handle blocks only reachable by EH

39f4793

format

f1e5dcc

remove unreachable first

bc53647

fix dump

ff1d0ac

merge main

587f012

fixup after merge

9ec1465

add fgwasm.h to wasm headers

ab5c6f1

merge main

80c3311

bail out if there are blocks only reachable by EH. Adjust the scc tra…

b2b865f

…nsform to handle catchret better

introduce FgWasm class

3846d86

Puzzled

e413382

give FgWasm some state

488e148

no more recursive lambda

0c382da

start to encapsulate dfs

cf4e92d

encapsulate dfs and traits

b25a32a

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 18, 2025

dotnet-policy-service bot assigned AndyAyersMS Nov 18, 2025

AndyAyersMS mentioned this pull request Nov 18, 2025

[Wasm RyuJIT] Implement Wasm DFS and Loop Finding #121457

Merged

am11 added the arch-wasm WebAssembly architecture label Nov 18, 2025

AndyAyersMS marked this pull request as ready for review November 18, 2025 16:32

Copilot AI review requested due to automatic review settings November 18, 2025 16:32