-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[clr-interp] Interpreter for ARM32 SOFTFP #120688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
However, it crashes right after HellWorld print.
Need to test the implementations
This reverts commit b3ba6ec.
src/coreclr/vm/callstubgenerator.cpp
Outdated
| } | ||
| else if ((m_currentRoutineType == RoutineType::FPReg) && (type != RoutineType::FPReg)) | ||
| #ifndef TARGET_ARM | ||
| else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this ifdef is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be multiple pending ranges to handle. ex. A struct can pass some of them by registers and remains by stacks. So it needs to check all ranges in some cases like example in #120140.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But single routine handles only a stack or registers. For the case you have described, there would need to be two routines. In the current method, this added ifdef seems to have no effect. If the m_currentRoutineType was RoutineType::GPReg, then it would not enter the if below with or without the else. And if the m_currentRoutineType was RoutineType::FPReg, then again, the else would have no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the case when a struct is split between registers and stack also exists on x64 Linux. The ArgIterator has the GetArgLocDescForStructInRegs() for this purpose. The code in the ComputeCallStub uses that to pretend the parts of the struct are "separate arguments" from the further processing point of view. The stuff around it is quite complex due to the complicated classification mechanism defined by the Unix amd64 ABI, but it seems we could handle the case of args split between regs and stack at that place too and I think we would not have to handle that here then.
src/coreclr/vm/callstubgenerator.cpp
Outdated
| #endif // TARGET_ARM64 | ||
| else if ((m_currentRoutineType == RoutineType::Stack) && (type != RoutineType::Stack)) | ||
| #ifndef TARGET_ARM | ||
| else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here
| // Use the NOINLINE to ensure that the InlinedCallFrame in this method is a lower stack address than any InterpMethodContextFrame values. | ||
| NOINLINE | ||
| void InvokeUnmanagedMethodWithTransition(MethodDesc *targetMethod, int8_t *stack, InterpMethodContextFrame *pFrame, int8_t *pArgs, int8_t *pRet, PCODE callTarget) | ||
| void InvokeUnmanagedMethodWithTransition(UnmanagedMethodWithTransitionParam *pParam) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a brief comment explaining why we are passing the args via this struct instead of directly so that people can understand the reason in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment. If it isn't good enough, I will update again. Thank you.
- Because of async support, some methods need arguments more than 4. I changed arguments passing. - Fixed some build errors.
|
Merged Upstream branch and fixed some errors. |
src/coreclr/vm/callstubgenerator.cpp
Outdated
| } | ||
| else if ((m_currentRoutineType == RoutineType::FPReg) && (type != RoutineType::FPReg)) | ||
| #ifndef TARGET_ARM | ||
| else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But single routine handles only a stack or registers. For the case you have described, there would need to be two routines. In the current method, this added ifdef seems to have no effect. If the m_currentRoutineType was RoutineType::GPReg, then it would not enter the if below with or without the else. And if the m_currentRoutineType was RoutineType::FPReg, then again, the else would have no effect.
| #ifdef TARGET_ARM | ||
| if (argLocDesc.m_cGenReg == 2 || argLocDesc.m_byteStackSize >= 8) | ||
| { | ||
| /* do nothing */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please help me understand why arguments with 2 registers or 8 or more bytes passed on stack don't get any routine classification here?
The goal of this part of the code together with the TerminateCurrentRoutineIfNotOfNewType call is to identify when we cannot extend the range of registers / stack that we copy using a single routine. So with this change, any time we get an argument that is passed in two regs or an argument that is passed on stack with 8 or more bytes, we would not extend the previous range of registers / stack and I am not sure why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But single routine handles only a stack or registers. For the case you have described, there would need to be two routines. In the current method, this added ifdef seems to have no effect. If the m_currentRoutineType was RoutineType::GPReg, then it would not enter the if below with or without the else. And if the m_currentRoutineType was RoutineType::FPReg, then again, the else would have no effect.
I am so sorry. When I have implemented it in the first time, removing else statements was valid . After the TerminateCurrentRoutineIfNotOfNewType function is introduced, I didn't read codes carefully.I will check GetArgLocDescForStructInRegs which you mentioned in another comment.
Can you please help me understand why arguments with 2 registers or 8 or more bytes passed on stack don't get any routine classification here?
The goal of this part of the code together with the TerminateCurrentRoutineIfNotOfNewType call is to identify when we cannot extend the range of registers / stack that we copy using a single routine. So with this change, any time we get an argument that is passed in two regs or an argument that is passed on stack with 8 or more bytes, we would not extend the previous range of registers / stack and I am not sure why.
I intended to block to extend of the previous ranges for long bytes and to enable to extend 4-size single register / stack. Because for 4-byte arg and 8-byte arg are handled differently. For 4 byte, it load a single interp stack to a register or stack. For 8 bytes, it load a single interp stack to two registers or stacks. If 4-byte and 8-byte args are mixed in a range, I think it is hard to identify how to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the def in TerminateCurrentRoutineIfNotOfNewType. And for arg which has both reg and stack, execute ProcessArgument for each. Could you review again?
src/coreclr/vm/callstubgenerator.cpp
Outdated
| } | ||
| else if ((m_currentRoutineType == RoutineType::FPReg) && (type != RoutineType::FPReg)) | ||
| #ifndef TARGET_ARM | ||
| else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the case when a struct is split between registers and stack also exists on x64 Linux. The ArgIterator has the GetArgLocDescForStructInRegs() for this purpose. The code in the ComputeCallStub uses that to pretend the parts of the struct are "separate arguments" from the further processing point of view. The stuff around it is quite complex due to the complicated classification mechanism defined by the Unix amd64 ABI, but it seems we could handle the case of args split between regs and stack at that place too and I think we would not have to handle that here then.
| #ifdef TARGET_ARM | ||
| if (argLocDesc.m_cGenReg == 2) | ||
| { | ||
| pRoutines[m_routineIndex++] = GetRegRoutine_4B(argLocDesc.m_idxGenReg, argLocDesc.m_idxGenReg + argLocDesc.m_cGenReg - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered / tried changing the interpreter stack so that on arm32, the slot size would be 32 bit? It seems it would then eliminate the need for these special _4B variants. However, I am not sure if there are any blockers / issues with changing the stack slot size. I've asked in a private discussion why Mono doesn't use 32 bit interpreter stack on arm32 and the answer was that there was no reason for trying to do that there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered / tried changing the interpreter stack so that on arm32, the slot size would be 32 bit? It seems it would then eliminate the need for these special _4B variants. However, I am not sure if there are any blockers / issues with changing the stack slot size. I've asked in a private discussion why Mono doesn't use 32 bit interpreter stack on arm32 and the answer was that there was no reason for trying to do that there.
At first, I implemented with 32 bit slot size and the tests passed because I thought 32 bit slot size is better in 32 bit arch. After #120688 (comment), I changed it to 64 bit. I am not sure which is better for .NET runtime interpreter. Could you please let me know your idea?
Thank you.
- To support structure arg which has reg and stack, split it for each
Enable interpreter for arm32 softfp.