Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Nov 15, 2022

Fixes: #78300

public static void Test(int i)
{
    if (i == 0) throw Create();
}

static FormatException Create() => new FormatException(SR.Test);

class SR
{
    public static string Test => "";
}

Codegen for Test, was:

; Method P:Test(int)
G_M000_IG01:                
       56                   push     rsi
       4883EC20             sub      rsp, 32
G_M000_IG02:                
       85C9                 test     ecx, ecx
       7406                 je       SHORT G_M000_IG04
G_M000_IG03:                
       4883C420             add      rsp, 32
       5E                   pop      rsi
       C3                   ret      
G_M000_IG04:                
       48B9F8A9FF0DFA7F0000 mov      rcx, 0x7FFA0DFFA9F8
       E8A244AE5F           call     CORINFO_HELP_NEWSFAST
       488BF0               mov      rsi, rax
       B901000000           mov      ecx, 1
       48BA18B2EF0DFA7F0000 mov      rdx, 0x7FFA0DEFB218
       E84B74AB5F           call     CORINFO_HELP_STRCNS
       488BD0               mov      rdx, rax
       488BCE               mov      rcx, rsi
       FF15070B2300         call     [System.FormatException:.ctor(System.String):this]
       488BCE               mov      rcx, rsi
       E88750A15F           call     CORINFO_HELP_THROW
       CC                   int3     
; Total bytes of code: 74

Now:

; Method P:Test(int)
G_M11252_IG01:            
       4883EC28             sub      rsp, 40
G_M11252_IG02:              
       85C9                 test     ecx, ecx
       7405                 je       SHORT G_M11252_IG04
G_M11252_IG03:              ;; offset=0008H
       4883C428             add      rsp, 40
       C3                   ret      
G_M11252_IG04:            
       FF153D1C4500         call     [P:Create():System.FormatException]
       488BC8               mov      rcx, rax
       E835E84E5F           call     CORINFO_HELP_THROW
       CC                   int3     
; Total bytes of code: 28

Currently, we almost never inline anything in throw blocks above 16 bytes of IL, but it seems that it's a good idea to not inline small methods too:

Pros:

  1. inlining, even for small methods, is expensive (may lead to type load events for cold code we never execute) - I learned that from my "allow inlining in tier0" work.
  2. ^ snippet is a great example of code bloated because we inlined a 11 bytes of IL method inside throw
  3. Andy planned to not to compile BBJ_THROW at all via partial compilation

Cons:

  1. Might regress performance for exceptions by adding call overhead, likely not noticeably. We can handle this via Dynamic PGO (if a throw blocks is not cold according to profile)
  2. Sometimes inlining even in cold sections might improve overall type/memory/escape-analysis picture. I personally don't think it's the case here with Exception handling machinery.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 15, 2022
@ghost ghost assigned EgorBo Nov 15, 2022
@ghost
Copy link

ghost commented Nov 15, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes: #78300

public static void Test(int i)
{
    if (i == 0) throw Create();
}

static FormatException Create() => new FormatException(SR.Test);

class SR
{
    public static string Test => "";
}

Codegen for Test, was:

; Method P:Test(int)
G_M000_IG01:                
       56                   push     rsi
       4883EC20             sub      rsp, 32
G_M000_IG02:                
       85C9                 test     ecx, ecx
       7406                 je       SHORT G_M000_IG04
G_M000_IG03:                
       4883C420             add      rsp, 32
       5E                   pop      rsi
       C3                   ret      
G_M000_IG04:                
       48B9F8A9FF0DFA7F0000 mov      rcx, 0x7FFA0DFFA9F8
       E8A244AE5F           call     CORINFO_HELP_NEWSFAST
       488BF0               mov      rsi, rax
       B901000000           mov      ecx, 1
       48BA18B2EF0DFA7F0000 mov      rdx, 0x7FFA0DEFB218
       E84B74AB5F           call     CORINFO_HELP_STRCNS
       488BD0               mov      rdx, rax
       488BCE               mov      rcx, rsi
       FF15070B2300         call     [System.FormatException:.ctor(System.String):this]
       488BCE               mov      rcx, rsi
       E88750A15F           call     CORINFO_HELP_THROW
       CC                   int3     
; Total bytes of code: 74

Now:

; Method P:Test(int)
G_M11252_IG01:            
       4883EC28             sub      rsp, 40
G_M11252_IG02:              
       85C9                 test     ecx, ecx
       7405                 je       SHORT G_M11252_IG04
G_M11252_IG03:              ;; offset=0008H
       4883C428             add      rsp, 40
       C3                   ret      
G_M11252_IG04:            
       FF153D1C4500         call     [P:Create():System.FormatException]
       488BC8               mov      rcx, rax
       E835E84E5F           call     CORINFO_HELP_THROW
       CC                   int3     
; Total bytes of code: 28

Currently, we almost never inline anything in throw blocks above 16 bytes of IL, but it seems that it's a good idea to not inline small methods too:

Pros:

  1. inlining, even for small methods, is expensive (may lead to type load events for cold code we never execute)
  2. ^ snippet is a great example of code bloated because we inlined a 11 bytes of IL method inside throw
  3. Andy planned to not to compile BBJ_THROW at all via partial compilation

Cons:

  1. Might regress performance for exceptions by adding call overhead, likely not noticeably. We can handle this via Dynamic PGO (if a throw blocks is not cold according to profile)
  2. Sometimes inlining even in cold sections might improve overall type/memory/escape-analysis picture. I personally don't think it's the case here with Exception handling machinery.

Alternative fix is to reduce ALWAYS_INLINE il threshold from 16 to 8 for THROW sections, but we might regress jit's throughput.
Another fix is to only handle throw call pattern and leave everything as is.

Let's see the diffs.

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@stephentoub
Copy link
Member

stephentoub commented Nov 15, 2022

Just for reference, I noticed it while looking at the codegen for the not-yet-merged https://github.com/dotnet/runtime/pull/71590/files#diff-f7c1029305b8ff70a95e36e675c08e17c8f816c5a1be8a204dc42d0595129e18R1419.
(I made that helper NoInlining to work around it.)

@stephentoub
Copy link
Member

(And I suspect the diffs may be influenced by such NoInlining in use elsewhere to work around this, e.g. https://github.com/dotnet/runtime/blob/main/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/ThrowHelper.cs)

@EgorBo EgorBo marked this pull request as ready for review November 16, 2022 01:22
@EgorBo
Copy link
Member Author

EgorBo commented Nov 16, 2022

@dotnet/jit-contrib @jakobbotsch PTAL, Diffs: https://dev.azure.com/dnceng-public/public/_build/results?buildId=85004&view=ms.vss-build-web.run-extensions-tab

I made the PR less aggressive so now the number of size regressions is minimal. The current logic is to inline everything below 8 bytes of IL (e.g. properties) and immediately give up on everything larger where previously we had IL prescan. Number of regressions seem reasonable to me now.

@EgorBo EgorBo changed the title Don't inline anything in BBJ_THROW Don't inline >8 bytes of IL calls in BBJ_THROW Nov 16, 2022
Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in principle.

Any examples of the regressions? I can imagine not exposing some facts, even in the throwing BBs, can hurt codegen in the rest of the function. For example if we no longer inline a function with an argument in S someStruct then that could result in new address exposure.

@jakobbotsch
Copy link
Member

Maybe one common case would be instance functions on structs? e.g.:

void M(S s)
{
  if (s.Something())
  {
    throw new ArgumentException("s.Something is " + s.Something());
  }
}

where S is a struct.

@EgorBo
Copy link
Member Author

EgorBo commented Nov 16, 2022

@jakobbotsch I was not able to find any case like that in the diffs. What I did find is the cases like these:

image

I tried to adjust the heuristic to not do that for methods returning structs by value but it hid many previous improvements. Overall the number of regressions is "big" only in PMI collections:

image

From the same methods with different generic types (pmi)

@jakobbotsch
Copy link
Member

How come there's no diffs in the crossgen2 collections?

@EgorBo
Copy link
Member Author

EgorBo commented Nov 16, 2022

How come there's no diffs in the crossgen2 collections?

crossgen2 uses more conservative inliner while I did my changes in ExtendedDefaultPolicy.
I can move them into the baseclass so crossgen will also pick it up.

@EgorBo
Copy link
Member Author

EgorBo commented Nov 16, 2022

image

new diffs with crossgen2 included, thanks for the suggestion

@EgorBo EgorBo merged commit 7d186d6 into dotnet:main Nov 16, 2022
@EgorBo EgorBo deleted the dont-inline-in-throw-blocks branch November 16, 2022 21:40
@BruceForstall
Copy link
Contributor

BruceForstall commented Nov 17, 2022

Seems like the JIT shouldn't inline any function that is straight-line code leading to a throw (or more generally, where the IL entrypoint is post-dominated by a throw). Maybe we don't have that information, or it would be expensive to compute.

[Edit] I slightly misread the issue, as the throw is in the caller, not callee. So why do we inline any functions at all in this case? the throw is presumably going to dominate any performance.

@EgorBo
Copy link
Member Author

EgorBo commented Nov 18, 2022

[Edit] I slightly misread the issue, as the throw is in the caller, not callee. So why do we inline any functions at all in this case? the throw is presumably going to dominate any performance.

The initial change in this PR did exactly that but that produced a lot of size regressions and many examples actually affected non-cold path too (e.g. more registers were saved) so I changed it to "Ok, inline only under 8 bytes of IL", mostly for various properties (getters/setters), I tried to play with that threshold and 8 (actually even 9) was the best value.

@BruceForstall
Copy link
Contributor

I guess the size regression was because the call overhead was larger than the generated code for very small functions.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Throw helper methods that just construct exceptions get inlined

4 participants