Skip to content

Conversation

@clamp03
Copy link
Member

@clamp03 clamp03 commented Oct 14, 2025

Enable interpreter for arm32 softfp.

  • Implement Assemblies for args and return values
  • Fix some minor bugs
  • Tested with simple test cases

@clamp03 clamp03 self-assigned this Oct 14, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 14, 2025
@am11 am11 requested a review from janvorli November 28, 2025 08:00
}
else if ((m_currentRoutineType == RoutineType::FPReg) && (type != RoutineType::FPReg))
#ifndef TARGET_ARM
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this ifdef is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be multiple pending ranges to handle. ex. A struct can pass some of them by registers and remains by stacks. So it needs to check all ranges in some cases like example in #120140.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But single routine handles only a stack or registers. For the case you have described, there would need to be two routines. In the current method, this added ifdef seems to have no effect. If the m_currentRoutineType was RoutineType::GPReg, then it would not enter the if below with or without the else. And if the m_currentRoutineType was RoutineType::FPReg, then again, the else would have no effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the case when a struct is split between registers and stack also exists on x64 Linux. The ArgIterator has the GetArgLocDescForStructInRegs() for this purpose. The code in the ComputeCallStub uses that to pretend the parts of the struct are "separate arguments" from the further processing point of view. The stuff around it is quite complex due to the complicated classification mechanism defined by the Unix amd64 ABI, but it seems we could handle the case of args split between regs and stack at that place too and I think we would not have to handle that here then.

#endif // TARGET_ARM64
else if ((m_currentRoutineType == RoutineType::Stack) && (type != RoutineType::Stack))
#ifndef TARGET_ARM
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here

// Use the NOINLINE to ensure that the InlinedCallFrame in this method is a lower stack address than any InterpMethodContextFrame values.
NOINLINE
void InvokeUnmanagedMethodWithTransition(MethodDesc *targetMethod, int8_t *stack, InterpMethodContextFrame *pFrame, int8_t *pArgs, int8_t *pRet, PCODE callTarget)
void InvokeUnmanagedMethodWithTransition(UnmanagedMethodWithTransitionParam *pParam)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a brief comment explaining why we are passing the args via this struct instead of directly so that people can understand the reason in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment. If it isn't good enough, I will update again. Thank you.

@clamp03
Copy link
Member Author

clamp03 commented Dec 10, 2025

Merged Upstream branch and fixed some errors.
However, I didn't update to support async. I will add async support after this PR is merged.
Thank you.

}
else if ((m_currentRoutineType == RoutineType::FPReg) && (type != RoutineType::FPReg))
#ifndef TARGET_ARM
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But single routine handles only a stack or registers. For the case you have described, there would need to be two routines. In the current method, this added ifdef seems to have no effect. If the m_currentRoutineType was RoutineType::GPReg, then it would not enter the if below with or without the else. And if the m_currentRoutineType was RoutineType::FPReg, then again, the else would have no effect.

#ifdef TARGET_ARM
if (argLocDesc.m_cGenReg == 2 || argLocDesc.m_byteStackSize >= 8)
{
/* do nothing */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please help me understand why arguments with 2 registers or 8 or more bytes passed on stack don't get any routine classification here?
The goal of this part of the code together with the TerminateCurrentRoutineIfNotOfNewType call is to identify when we cannot extend the range of registers / stack that we copy using a single routine. So with this change, any time we get an argument that is passed in two regs or an argument that is passed on stack with 8 or more bytes, we would not extend the previous range of registers / stack and I am not sure why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But single routine handles only a stack or registers. For the case you have described, there would need to be two routines. In the current method, this added ifdef seems to have no effect. If the m_currentRoutineType was RoutineType::GPReg, then it would not enter the if below with or without the else. And if the m_currentRoutineType was RoutineType::FPReg, then again, the else would have no effect.

I am so sorry. When I have implemented it in the first time, removing else statements was valid . After the TerminateCurrentRoutineIfNotOfNewType function is introduced, I didn't read codes carefully.I will check GetArgLocDescForStructInRegs which you mentioned in another comment.

Can you please help me understand why arguments with 2 registers or 8 or more bytes passed on stack don't get any routine classification here?
The goal of this part of the code together with the TerminateCurrentRoutineIfNotOfNewType call is to identify when we cannot extend the range of registers / stack that we copy using a single routine. So with this change, any time we get an argument that is passed in two regs or an argument that is passed on stack with 8 or more bytes, we would not extend the previous range of registers / stack and I am not sure why.

I intended to block to extend of the previous ranges for long bytes and to enable to extend 4-size single register / stack. Because for 4-byte arg and 8-byte arg are handled differently. For 4 byte, it load a single interp stack to a register or stack. For 8 bytes, it load a single interp stack to two registers or stacks. If 4-byte and 8-byte args are mixed in a range, I think it is hard to identify how to handle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the def in TerminateCurrentRoutineIfNotOfNewType. And for arg which has both reg and stack, execute ProcessArgument for each. Could you review again?

}
else if ((m_currentRoutineType == RoutineType::FPReg) && (type != RoutineType::FPReg))
#ifndef TARGET_ARM
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the case when a struct is split between registers and stack also exists on x64 Linux. The ArgIterator has the GetArgLocDescForStructInRegs() for this purpose. The code in the ComputeCallStub uses that to pretend the parts of the struct are "separate arguments" from the further processing point of view. The stuff around it is quite complex due to the complicated classification mechanism defined by the Unix amd64 ABI, but it seems we could handle the case of args split between regs and stack at that place too and I think we would not have to handle that here then.

#ifdef TARGET_ARM
if (argLocDesc.m_cGenReg == 2)
{
pRoutines[m_routineIndex++] = GetRegRoutine_4B(argLocDesc.m_idxGenReg, argLocDesc.m_idxGenReg + argLocDesc.m_cGenReg - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered / tried changing the interpreter stack so that on arm32, the slot size would be 32 bit? It seems it would then eliminate the need for these special _4B variants. However, I am not sure if there are any blockers / issues with changing the stack slot size. I've asked in a private discussion why Mono doesn't use 32 bit interpreter stack on arm32 and the answer was that there was no reason for trying to do that there.

Copy link
Member Author

@clamp03 clamp03 Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered / tried changing the interpreter stack so that on arm32, the slot size would be 32 bit? It seems it would then eliminate the need for these special _4B variants. However, I am not sure if there are any blockers / issues with changing the stack slot size. I've asked in a private discussion why Mono doesn't use 32 bit interpreter stack on arm32 and the answer was that there was no reason for trying to do that there.

At first, I implemented with 32 bit slot size and the tests passed because I thought 32 bit slot size is better in 32 bit arch. After #120688 (comment), I changed it to 64 bit. I am not sure which is better for .NET runtime interpreter. Could you please let me know your idea?
Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-Interpreter-coreclr area-VM-coreclr community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants