@@ -16,7 +16,7 @@ class AsmOffsets
// Debug build offsets
#if TARGET_AMD64
#if TARGET_UNIX
public const int SIZEOF__REGDISPLAY = 0x1b90;
public const int SIZEOF__REGDISPLAY = 0x1c10;
public const int OFFSETOF__REGDISPLAY__SP = 0x1b78;
public const int OFFSETOF__REGDISPLAY__ControlPC = 0x1b80;
#else // TARGET_UNIX
@@ -82,7 +82,7 @@ class AsmOffsets
// Release build offsets
#if TARGET_AMD64
#if TARGET_UNIX
public const int SIZEOF__REGDISPLAY = 0x1b80;
public const int SIZEOF__REGDISPLAY = 0x1c00;
public const int OFFSETOF__REGDISPLAY__SP = 0x1b70;
public const int OFFSETOF__REGDISPLAY__ControlPC = 0x1b78;
#else // TARGET_UNIX
51 changes: 49 additions & 2 deletions src/coreclr/gcinfo/gcinfodumper.cpp
@@ -131,6 +131,28 @@ BOOL GcInfoDumper::ReportPointerRecord (
REG(r13, R13),
REG(r14, R14),
REG(r15, R15),
#if defined(TARGET_UNIX)
#undef REG
#define REG(reg, field) { offsetof(Amd64VolatileContextPointer, field) }
REG(r16, R16),
REG(r17, R17),
REG(r18, R18),
REG(r19, R19),
REG(r20, R20),
REG(r21, R21),
REG(r22, R22),
REG(r23, R23),
REG(r24, R24),
REG(r25, R25),
REG(r26, R26),
REG(r27, R27),
REG(r28, R28),
REG(r29, R29),
REG(r30, R30),
REG(r31, R31),
REG(r16, R16),
REG(r16, R16),
#endif // TARGET_UNIX
#elif defined(TARGET_ARM)
#undef REG
#define REG(reg, field) { offsetof(ArmVolatileContextPointer, field) }
@@ -294,7 +316,7 @@ PORTABILITY_ASSERT("GcInfoDumper::ReportPointerRecord is not implemented on this

#if defined(TARGET_ARM) || defined(TARGET_ARM64) || defined(TARGET_RISCV64) || defined(TARGET_LOONGARCH64)
BYTE* pContext = (BYTE*)&(pRD->volatileCurrContextPointers);
#else
#else // TARGET_ARM || TARGET_ARM64 || TARGET_RISCV64 || TARGET_LOONGARCH64
BYTE* pContext = (BYTE*)pRD->pCurrentContext;
#endif

@@ -390,7 +412,12 @@ PORTABILITY_ASSERT("GcInfoDumper::ReportPointerRecord is not implemented on this
{
continue;
}
#endif
#elif defined(TARGET_AMD64) && defined(TARGET_UNIX)
Member:

Why does this part need to be UNIX specific? Isn't this equally applicable to Windows and we just won't be using it until the OS feature detection is available?

Member Author:

This is one of the limitations right now. We discussed this briefly during our meeting. We have to wait for Windows to expose the APX EGPR context. Without it, we cannot enable the GC pathways for Windows, since the Windows CONTEXT would not have those registers exposed. Linux uses the CONTEXT defined in pal.h, whereas Windows uses the one defined in `winnt.h` in the SDK.

Member:

Not quite sure I'm understanding. The Win32 CONTEXT struct never has any extended registers as part of it (no YMM, no ZMM, no KMASK, etc).

There is "hidden" extra data you can query for by setting CONTEXT_XSTATE and then further setting specific XSTATE bits such as XSTATE_MASK_AVX, XSTATE_MASK_AVX512, etc. This extra data is laid out mirroring what XSAVE/XRSTR return and the XSTATE_MASK_* flags directly map to the requested-feature bitmap bits from XCR0 for performance reasons, but I don't believe is strictly guaranteed to be this way (even if unlikely to change) and so we cannot pre-code such support.

Because Windows hasn't defined XSTATE_MASK_APX yet and we cannot guarantee it will match XCR0 (specifically bit 19), we cannot enable this for Windows. However, we should be able to reasonably set everything up so that it is essentially a one-liner to enable. Namely, all areas of the runtime currently use the same minipal_getcpufeatures function and cache the results. This includes vxsort in the GC, NativeAOT, the general VM, etc.

So, my thought is most of these paths shouldn't be unconditionally doing APX handling; they should rather be doing something like if (IsApxSupported()) { /* handle extra 16 regs */ }. We can then have UNIX return true if APX has been enabled (there's no reason to expend the cycles doing save/restore for these areas otherwise) and we can have WINDOWS statically return false.

This ensures that all of the handling only requires us to touch the IsApxSupported() function (which is likely just reading the cached InstructionSet_* flag) when Windows does define the relevant bits.
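
A sketch of that gating pattern, assuming a feature bitmap cached at startup (the helper name and cache variable are placeholders; minipal_getcpufeatures and XArchIntrinsicConstants_Apx are referenced elsewhere in this thread):

```cpp
// Sketch: cache CPU features once, then gate all EGPR handling on APX.
#include <minipal/cpufeatures.h>

static int g_cachedCpuFeatures; // assumed cache, populated during startup

void InitCpuFeatureCache()
{
    g_cachedCpuFeatures = minipal_getcpufeatures();
}

inline bool IsApxSupported()
{
#if defined(TARGET_WINDOWS)
    return false; // statically false until XSTATE_MASK_APX is defined
#else
    return (g_cachedCpuFeatures & XArchIntrinsicConstants_Apx) != 0;
#endif
}

// Call sites then skip EGPR work entirely on unsupported machines:
//   if (IsApxSupported()) { /* handle the extra 16 regs (r16-r31) */ }
```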

Member Author:

Thanks for the response. I am working on making the above changes. To sum up:

We use the cpufeatures cached by the GC and only run EGPR handling when APX is enabled. Furthermore, we disable APX support for Windows in IsAPXSupported, making it a one-line change for Windows later.

if (ctx != 0 && iEncodedReg > 15)
{
break;
}
#endif // TARGET_AMD64 && TARGET_UNIX
{
_ASSERTE(iReg < nCONTEXTRegisters);
#ifdef TARGET_ARM
@@ -414,6 +441,19 @@ PORTABILITY_ASSERT("GcInfoDumper::ReportPointerRecord is not implemented on this
{
pReg = (SIZE_T*)((BYTE*)pRD->pCurrentContext + rgRegisters[iReg].cbContextOffset);
}
#elif defined(TARGET_AMD64) && defined(TARGET_UNIX)
if (ctx == 0 && iReg == 16)
{
pContext = (BYTE*)&(pRD->volatileCurrContextPointers);
}
if (ctx == 0 && iReg >= 16)
{
pReg = *(SIZE_T**)(pContext + rgRegisters[iReg].cbContextOffset);
}
else
{
pReg = (SIZE_T*)(pContext + rgRegisters[iReg].cbContextOffset);
}
#else
pReg = (SIZE_T*)(pContext + rgRegisters[iReg].cbContextOffset);
#endif
@@ -664,6 +704,13 @@ GcInfoDumper::EnumerateStateChangesResults GcInfoDumper::EnumerateStateChanges (
*(ppCurrentRax + iReg) = &regdisp.pCurrentContext->Rax + iReg;
*(ppCallerRax + iReg) = &regdisp.pCallerContext ->Rax + iReg;
}
#if defined(TARGET_UNIX)
ULONG64 **ppVolatileReg = &regdisp.volatileCurrContextPointers.R16;
for (iReg = 0; iReg < 16; iReg++)
{
*(ppVolatileReg+iReg) = &regdisp.pCurrentContext->R16 + iReg;
}
#endif // TARGET_UNIX
#elif defined(TARGET_ARM)
FILL_REGS(pCurrentContext->R0, 16);
FILL_REGS(pCallerContext->R0, 16);
47 changes: 46 additions & 1 deletion src/coreclr/inc/regdisp.h
@@ -197,6 +197,33 @@ typedef struct _Arm64VolatileContextPointer
} Arm64VolatileContextPointer;
#endif //TARGET_ARM64

#if defined(TARGET_AMD64) && defined(TARGET_UNIX)
typedef struct _Amd64VolatileContextPointer
{
union {
struct {
PDWORD64 R16;
PDWORD64 R17;
PDWORD64 R18;
PDWORD64 R19;
PDWORD64 R20;
PDWORD64 R21;
PDWORD64 R22;
PDWORD64 R23;
PDWORD64 R24;
PDWORD64 R25;
PDWORD64 R26;
PDWORD64 R27;
PDWORD64 R28;
PDWORD64 R29;
PDWORD64 R30;
PDWORD64 R31;
};
PDWORD64 R[16];
};
} Amd64VolatileContextPointer;
#endif //TARGET_AMD64 && TARGET_UNIX

#if defined(TARGET_LOONGARCH64)
typedef struct _LoongArch64VolatileContextPointer
{
@@ -253,6 +280,10 @@ struct REGDISPLAY : public REGDISPLAY_BASE {
LoongArch64VolatileContextPointer volatileCurrContextPointers;
#endif

#if defined(TARGET_AMD64) && defined(TARGET_UNIX)
Amd64VolatileContextPointer volatileCurrContextPointers;
#endif

#ifdef TARGET_RISCV64
RiscV64VolatileContextPointer volatileCurrContextPointers;
#endif
@@ -563,7 +594,11 @@ inline void FillRegDisplay(const PREGDISPLAY pRD, PT_CONTEXT pctx, PT_CONTEXT pC
// Fill volatile context pointers. They can be used by GC in the case of the leaf frame
for (int i=0; i < 18; i++)
pRD->volatileCurrContextPointers.X[i] = &pctx->X[i];
#elif defined(TARGET_LOONGARCH64) // TARGET_ARM64
#elif defined(TARGET_AMD64) && defined(TARGET_UNIX) && defined(HOST_UNIX) // TARGET_ARM64
// Fill volatile context pointers. They can be used by GC in the case of the leaf frame
for (int i=0; i < 16; i++)
pRD->volatileCurrContextPointers.R[i] = &pctx->R[i];
#elif defined(TARGET_LOONGARCH64) // TARGET_AMD64 && TARGET_UNIX && HOST_UNIX
pRD->volatileCurrContextPointers.A0 = &pctx->A0;
pRD->volatileCurrContextPointers.A1 = &pctx->A1;
pRD->volatileCurrContextPointers.A2 = &pctx->A2;
@@ -663,6 +698,16 @@ inline size_t * getRegAddr (unsigned regNum, PTR_CONTEXT regs)
};

return (PTR_size_t)(PTR_BYTE(regs) + OFFSET_OF_REGISTERS[regNum]);
#elif defined(TARGET_AMD64) && defined(TARGET_UNIX)
_ASSERTE(regNum < 32);
if (regNum < 16)
{
return (size_t *)&regs->Rax + regNum;
}
else
{
return (size_t *)&regs->R16 + (regNum - 16);
}
#elif defined(TARGET_AMD64)
_ASSERTE(regNum < 16);
return (size_t *)&regs->Rax + regNum;
14 changes: 7 additions & 7 deletions src/coreclr/nativeaot/Runtime/amd64/AsmOffsetsCpu.h
@@ -73,28 +73,28 @@ PLAT_ASM_OFFSET(90, REGDISPLAY, Xmm)

#else // !UNIX_AMD64_ABI

PLAT_ASM_SIZEOF(190, ExInfo)
PLAT_ASM_SIZEOF(210, ExInfo)
PLAT_ASM_OFFSET(0, ExInfo, m_pPrevExInfo)
PLAT_ASM_OFFSET(8, ExInfo, m_pExContext)
PLAT_ASM_OFFSET(10, ExInfo, m_exception)
PLAT_ASM_OFFSET(18, ExInfo, m_kind)
PLAT_ASM_OFFSET(19, ExInfo, m_passNumber)
PLAT_ASM_OFFSET(1c, ExInfo, m_idxCurClause)
PLAT_ASM_OFFSET(20, ExInfo, m_frameIter)
PLAT_ASM_OFFSET(188, ExInfo, m_notifyDebuggerSP)
PLAT_ASM_OFFSET(208, ExInfo, m_notifyDebuggerSP)

PLAT_ASM_OFFSET(0, PInvokeTransitionFrame, m_RIP)
PLAT_ASM_OFFSET(8, PInvokeTransitionFrame, m_FramePointer)
PLAT_ASM_OFFSET(10, PInvokeTransitionFrame, m_pThread)
PLAT_ASM_OFFSET(18, PInvokeTransitionFrame, m_Flags)
PLAT_ASM_OFFSET(20, PInvokeTransitionFrame, m_PreservedRegs)

PLAT_ASM_SIZEOF(168, StackFrameIterator)
PLAT_ASM_SIZEOF(1e8, StackFrameIterator)
PLAT_ASM_OFFSET(10, StackFrameIterator, m_FramePointer)
PLAT_ASM_OFFSET(18, StackFrameIterator, m_ControlPC)
PLAT_ASM_OFFSET(20, StackFrameIterator, m_RegDisplay)
PLAT_ASM_OFFSET(158, StackFrameIterator, m_OriginalControlPC)
PLAT_ASM_OFFSET(160, StackFrameIterator, m_pPreviousTransitionFrame)
PLAT_ASM_OFFSET(1d8, StackFrameIterator, m_OriginalControlPC)
PLAT_ASM_OFFSET(1e0, StackFrameIterator, m_pPreviousTransitionFrame)

PLAT_ASM_SIZEOF(50, PAL_LIMITED_CONTEXT)
PLAT_ASM_OFFSET(0, PAL_LIMITED_CONTEXT, IP)
@@ -110,8 +110,8 @@ PLAT_ASM_OFFSET(38, PAL_LIMITED_CONTEXT, R13)
PLAT_ASM_OFFSET(40, PAL_LIMITED_CONTEXT, R14)
PLAT_ASM_OFFSET(48, PAL_LIMITED_CONTEXT, R15)

PLAT_ASM_SIZEOF(88, REGDISPLAY)
PLAT_ASM_OFFSET(78, REGDISPLAY, SP)
PLAT_ASM_SIZEOF(108, REGDISPLAY)
PLAT_ASM_OFFSET(f8, REGDISPLAY, SP)

PLAT_ASM_OFFSET(18, REGDISPLAY, pRbx)
PLAT_ASM_OFFSET(20, REGDISPLAY, pRbp)
18 changes: 18 additions & 0 deletions src/coreclr/nativeaot/Runtime/regdisplay.h
@@ -27,6 +27,24 @@ struct REGDISPLAY
PTR_uintptr_t pR13;
PTR_uintptr_t pR14;
PTR_uintptr_t pR15;
#if defined(TARGET_UNIX)
PTR_uintptr_t pR16;
PTR_uintptr_t pR17;
PTR_uintptr_t pR18;
PTR_uintptr_t pR19;
PTR_uintptr_t pR20;
PTR_uintptr_t pR21;
PTR_uintptr_t pR22;
PTR_uintptr_t pR23;
PTR_uintptr_t pR24;
PTR_uintptr_t pR25;
PTR_uintptr_t pR26;
PTR_uintptr_t pR27;
PTR_uintptr_t pR28;
PTR_uintptr_t pR29;
PTR_uintptr_t pR30;
PTR_uintptr_t pR31;
#endif //TARGET_UNIX
#endif // TARGET_AMD64

uintptr_t SP;
39 changes: 22 additions & 17 deletions src/coreclr/pal/inc/pal.h
@@ -1464,24 +1464,29 @@ typedef struct DECLSPEC_ALIGN(16) _CONTEXT {
M512 Zmm31;
};

struct
// XSTATE_APX
union
{
DWORD64 R16;
DWORD64 R17;
DWORD64 R18;
DWORD64 R19;
DWORD64 R20;
DWORD64 R21;
DWORD64 R22;
DWORD64 R23;
DWORD64 R24;
DWORD64 R25;
DWORD64 R26;
DWORD64 R27;
DWORD64 R28;
DWORD64 R29;
DWORD64 R30;
DWORD64 R31;
struct
Member:

@dotnet/dotnet-diag I believe the CONTEXT structure is used by the debugger interfaces. Is this modification going to break them?

Member Author:

I cannot say for sure at the moment. It depends on whether we want the debugger to track these new EGPRs. Once the context structure is finalized, we can say with certainty what might be impacted. Thoughts?

Member Author:

As of now, we have avoided making any debugger changes, since they would need the Windows OS to have XSTATE_APX support for the extended CONTEXT. Once we have complete EGPR support, we can extend debugger support as well.

Member:

Sorry, missed this comment earlier when @jkotas first pinged. Generally the managed debugger needs to be aware of all registers because at various points it saves them and later restores them. If the debugger isn't aware of all the registers it needs to save, then we can easily end up in situations where an app works without the debugger, but stepping through the code in the debugger behaves unpredictably. For example, here is a recent debugger bug where x86 floating-point register capture/restore got inadvertently broken: https://devdiv.visualstudio.com/DevDiv/_workitems/edit/2428652

If we wanted to make this work available prior to having the debugger portions implemented and tested I think it would need to be explicitly an opt-in feature with warnings that debugging is unsupported.

Member Author:

Thanks @noahfalk.
So what needs to be done to make this an opt-in feature? Do I need to open a new issue to make this opt-in only, with warnings, or is that handled by the debugger team after this PR?

Member:

A common way we've done it in the past is to define an environment variable and then only use the new registers if the env var is set. Later on once it is fully supported we can switch the default value so the env var becomes an opt-out rather than opt-in.

Here is an example for AVX registers: https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/clrconfigvalues.h#L673

I'd prefer if the opt-in mechanism was included in this PR so that we don't create any window of builds where debugging is broken.

Member:

@noahfalk, the opt-in for APX already exists on L681 (DOTNET_EnableAPX), and it is already defaulted to off (same with AVX10v2) since the hardware isn't available yet.

We automatically add a config switch per ISA (or ISA group in some cases) when the detection logic is added to the VM.
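
For reference, the AVX switch linked above follows the macro pattern below in clrconfigvalues.h; the APX entry shown here is a sketch of the same shape, not a verbatim copy of the source:

```cpp
// Sketch of the clrconfigvalues.h pattern (the macro shape is real; the
// exact wording of the EnableAPX entry below is an assumption).
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX, W("EnableAVX"), 1,
    "Allows AVX instructions to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAPX, W("EnableAPX"), 0,
    "Allows APX-dependent handling to be enabled; off by default")
```

Users opt in by setting the corresponding environment variable, e.g. DOTNET_EnableAPX=1.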

Member:

Thanks @tannergooding! Glad to see it's already in place. I'd also ask that anywhere we advertise the env var, we make it clear managed debugging isn't supported.

Member:

👍. We notably do not document these switches and rarely advertise them. They are considered advanced and primarily for people doing local testing, so they can validate downlevel hardware.

There will likely be a blog post that lets people know about the switch when hardware does become available (so not until after .NET 10 ships) and we can ensure any nuances, such as the GC or debugger experience not working end to end, are called out there.

{
DWORD64 R16;
DWORD64 R17;
DWORD64 R18;
DWORD64 R19;
DWORD64 R20;
DWORD64 R21;
DWORD64 R22;
DWORD64 R23;
DWORD64 R24;
DWORD64 R25;
DWORD64 R26;
DWORD64 R27;
DWORD64 R28;
DWORD64 R29;
DWORD64 R30;
DWORD64 R31;
};
DWORD64 R[16];
};

} CONTEXT, *PCONTEXT, *LPCONTEXT;
2 changes: 1 addition & 1 deletion src/coreclr/unwinder/amd64/unwinder.cpp
@@ -205,7 +205,7 @@ BOOL DacUnwindStackFrame(CONTEXT * pContext, KNONVOLATILE_CONTEXT_POINTERS* pCon

if (res && pContextPointers)
{
for (int i = 0; i < 16; i++)
for (int i = 0; i < 32; i++)
Member:

Should this be limited to only hardware/scenarios with APX enabled? This is going to double the loop iteration count otherwise.

Could this simply be a memcpy instead, taking in the count to copy? (It looks like a naive memcpy.)

Member Author:

Right. That is the question I had asked offline. I will post it again here for reference:

Is it necessary to disable the processing of the extended registers if APX is not present on the system; if so, is there a similar mechanism to check in the VM/GC code if the APX ISA is enabled?

Yes, it can simply be a memcpy. Right now I have extended what was already available. We can change it to a memcpy if required.

Member:

I don't think it's necessary, but I do think it's desirable. Otherwise we're spending cycles and "pessimizing" save/restore for machines that aren't using or cannot use the functionality.

The VM, GC, and NativeAOT all have ways of checking for ISA support via minipal_getcpufeatures and have some location they cache this already:

coreclr/gc/vxsort/isa_detection.cpp:18:    int cpuFeatures = minipal_getcpufeatures();
coreclr/nativeaot/Runtime/startup.cpp:179:    g_cpuFeatures = minipal_getcpufeatures();
coreclr/tools/aot/jitinterface/jitwrapper.cpp:59:    return minipal_getcpufeatures();
coreclr/vm/codeman.cpp:1181:    int cpuFeatures = minipal_getcpufeatures();
native/minipal/cpufeatures.c:219:int minipal_getcpufeatures(void)
native/minipal/cpufeatures.h:66:int minipal_getcpufeatures(void);

It might be necessary to move the ISA detection slightly for the GC, but we should be able to generally get this so that all paths just need to check if (IsApxSupported()) { ... } (with that basically just being `(_cachedCpuFeatures & XArchIntrinsicConstants_Apx) != 0`) prior to doing the work. Which should allow Windows to light up with a minimal, near one-line change.
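
Applied to the DacUnwindStackFrame loop above, the gated version might look like this (a sketch, assuming the IsApxSupported() helper from this discussion and that the PAL's KNONVOLATILE_CONTEXT_POINTERS grows the matching R16-R31 fields):

```cpp
// Sketch: only walk the extra 16 EGPR context pointers when APX is
// enabled, so non-APX machines keep the original 16-iteration loop.
if (res && pContextPointers)
{
    int regCount = IsApxSupported() ? 32 : 16;
    for (int i = 0; i < regCount; i++)
    {
        *(&pContextPointers->Rax + i) = &pContext->Rax + i;
    }
}
```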

Member Author:

Thanks for the response. So what you are saying is each of the GC, VM, and NativeAOT needs to have its own IsAPXSupported, right? Or do we want to have a common check in cpufeatures with cached values?

Member Author:

I will enable IsAPXSupported in the VM and enable all changes for Windows as well, then ping back here for review.

Member Author:

@tannergooding 943308e enables some of the GC changes for Windows and adds IsAPXSupported for the VM. Is this what you were suggesting earlier? Does this handling look right?

Member:

That looks roughly like what I was suggesting.

However, I'd likely expect the GC to cache the CPUID flags itself rather than repeatedly querying through the EEJitManager (it already caches them as part of vxsort, so I'd expect we just move that caching "up" to be part of GC initialization).

I'd also expect the logic in TGcInfoDecoder<GcInfoEncoding>::GetRegisterSlot to always happen and for us to assert that regNum >= 16 only occurs if IsApxSupported (otherwise it's going to do the wrong thing and return invalid memory). The caller of GetRegisterSlot shouldn't be passing in invalid registers for the expected state.

Member Author (@khushal1996, Jul 16, 2025):

@tannergooding can you elaborate a little more on what needs to be done and where things should be moved?

I had some questions, since I am having a hard time figuring out what you mean by moving the caching "up" from the GC vxsort cpufeatures.

  1. Did you mean having 3 different IsAPXSupported{...} for GC, VM, and NativeAOT, or having a single check somewhere in cpufeatures.c?

  2. How do we figure out what is the common code between all the engines?

  3. Is there a way to see the code flow here, or the sequence in which things happen, to understand this side of .NET in more detail?

Member:

I would expect we have a few separate IsApxSupported() functions, one for each area that requires it.

The logic for determining what features are supported is currently shared across all systems: this is the minipal_getcpufeatures function. Currently the JIT retrieves this from the VM via the JIT/EE interface, so that it can use InstructionSet_* checks, while the GC and NativeAOT each do their own call to minipal_getcpufeatures, as they have some much more basic checks they need to do.

The VM makes the call in codeman.cpp and converts it to the CPUCompileFlags for the JIT/EE interface: https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/codeman.cpp#L1181

NAOT makes the call in startup.cpp and simply does a check against the g_requiredCpuFeatures set for the image to ensure the image is runnable on the host machine: https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/Runtime/startup.cpp#L179

The GC makes the call in vxsort/isa_detection.cpp and converts it to its own SupportedIsa flags enum, which it uses to determine if a given instruction set is supported so it knows if it needs to use the legacy algorithm, if it can use the AVX2 capable algorithm, or the AVX512 capable algorithm: https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/vxsort/isa_detection.cpp

For the GC, this is only called if VXSORT is enabled, but is called as part of initialize_gc: https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/gc.cpp#L14700-L14702

By moving it "up", I mean that I would expect the GC to change to always do this ISA detection and cache it as part of the main GC info, rather than only doing it for vxsort. I would expect APX handling to be done behind this flag, where relevant, to ensure it isn't done unnecessarily (noting that some places just need to assert, as the caller shouldn't be passing down APX registers in the first place if APX isn't enabled).
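
A sketch of the shape this could take (initialize_gc and minipal_getcpufeatures are real, per the links above; the cache variable, the helper name, and the assert placement are assumptions):

```cpp
// Sketch: hoist ISA detection into GC initialization so APX checks don't
// depend on vxsort being enabled. Names below are illustrative only.
#include <minipal/cpufeatures.h>

static int g_gcCpuFeatures;

void gc_init_cpu_features() // would run as part of initialize_gc
{
    g_gcCpuFeatures = minipal_getcpufeatures();
    // ... existing initialization, including vxsort ISA selection, can
    // now read g_gcCpuFeatures instead of re-querying ...
}

inline bool GCIsApxSupported()
{
    return (g_gcCpuFeatures & XArchIntrinsicConstants_Apx) != 0;
}

// In TGcInfoDecoder<GcInfoEncoding>::GetRegisterSlot, EGPR numbers should
// then only ever arrive when APX is enabled:
//   _ASSERTE(regNum < 16 || GCIsApxSupported());
```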

Member Author (@khushal1996, Jul 17, 2025):

Thanks for the explanation @tannergooding. I have moved the caching of cpuFeatures for each engine in d4cffc7.
Hope this resolves the problem of doing unnecessary context processing when APX is unavailable. I am working on changes to call Windows APIs to get the offset of the APX XSAVE area, removing the need to reference R16 in other places.

{
*(&pContextPointers->Rax + i) = &pContext->Rax + i;
}
10 changes: 10 additions & 0 deletions src/coreclr/vm/amd64/cgenamd64.cpp
@@ -59,6 +59,11 @@ void ClearRegDisplayArgumentAndScratchRegisters(REGDISPLAY * pRD)
pContextPointers->R9 = NULL;
pContextPointers->R10 = NULL;
pContextPointers->R11 = NULL;

#if defined(TARGET_UNIX)
for (int i=0; i < 16; i++)
Member:

Similar question. Can/should we limit this to only scenarios with APX enabled?

Same question applies to all the other loops/handling for the extended registers here.

Member:

Notably this can also just be a memset call and avoid the loop, so it's easier for the compiler to optimize.
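
A sketch of the memset form (this assumes Amd64VolatileContextPointer holds exactly the 16 pointers, and that all-zero bits is a valid null-pointer representation, which holds on the platforms CoreCLR targets):

```cpp
// Sketch: zero the 16 volatile EGPR context pointers in one call instead
// of the NULL-assignment loop above; relies on NULL being all-zero bits.
#include <cstring>

static void ClearVolatileEgprPointers(REGDISPLAY* pRD)
{
    memset(&pRD->volatileCurrContextPointers, 0,
           sizeof(pRD->volatileCurrContextPointers));
}
```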

Member Author:

> Similar question. Can/should we limit this to only scenarios with APX enabled?
>
> Same question applies to all the other loops/handling for the extended registers here.

Same question as here: https://github.com/dotnet/runtime/pull/116806/files#r2208598605

> Notably this can also just be a memset call and avoid the loop, so it's easier for the compiler to optimize.

Yes, this can be a memset, but then it would differ from how things are done under ARM. The current changes only extend what was already present.

pRD->volatileCurrContextPointers.R[i] = NULL;
#endif // TARGET_UNIX
}

void TransitionFrame::UpdateRegDisplay_Impl(const PREGDISPLAY pRD, bool updateFloats)
@@ -227,6 +232,11 @@ void ResumableFrame::UpdateRegDisplay_Impl(const PREGDISPLAY pRD, bool updateFlo
pRD->pCurrentContextPointers->R14 = &m_Regs->R14;
pRD->pCurrentContextPointers->R15 = &m_Regs->R15;

#if defined(TARGET_UNIX)
for (int i = 0; i < 16; i++)
pRD->volatileCurrContextPointers.R[i] = &m_Regs->R[i];
#endif // TARGET_UNIX

pRD->IsCallerContextValid = FALSE;
pRD->IsCallerSPValid = FALSE; // Don't add usage of this field. This is only temporary.
