-
Notifications
You must be signed in to change notification settings - Fork 19.9k
AP_Common: add Cortex-M-specific working setjmp implementation #31740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
f5c98f0 to
babe58c
Compare
|
@tpwrules very nice! |
|
That’s exactly what I did, see the second to last paragraph. I also custom wrote my own which was just |
|
I am unable to reproduce the "scripting dies" failure from the first PR. The first PR correctly identified the issue, but messed up the fix, corrupting the stack along with the FP registers on a Lua error, causing a high chance of a hard fault. The second PR then fixed the fix, but left other setjmps broken (which is what this PR aims to fix for good). However, with some instrumentation it is possible to reproduce the original failure after reverting the two original PRs but before applying this one. Then apply this diff and flash your board (I used CubeOrange, no guarantees others will optimize the same): diff --git a/libraries/AP_Scripting/AP_Scripting.cpp b/libraries/AP_Scripting/AP_Scripting.cpp
index 4a3ff370be..5d9c365505 100644
--- a/libraries/AP_Scripting/AP_Scripting.cpp
+++ b/libraries/AP_Scripting/AP_Scripting.cpp
@@ -295,6 +295,10 @@ MAV_RESULT AP_Scripting::handle_command_int_packet(const mavlink_command_int_t &
#endif
void AP_Scripting::thread(void) {
+ // s16 is callee-saved, gcc expects other functions to save it if it is used
+ register float canary asm("s17");
+ asm("vmov.f32 s17, #0.5"); // load here so gcc doesn't know the value
+
while (true) {
// reset flags
_stop = false;
@@ -319,7 +323,7 @@ void AP_Scripting::thread(void) {
lua->run();
// only reachable if the lua backend has died for any reason
- GCS_SEND_TEXT(MAV_SEVERITY_CRITICAL, "Scripting: %s", "stopped");
+ GCS_SEND_TEXT(MAV_SEVERITY_CRITICAL, "Scripting: %s canary: %f", "stopped", canary);
}
delete lua;
lua = nullptr;Then load (only) this script: print("faulting in 10 seconds")
local goes = 0
function update()
goes = goes + 1
if goes < 10 then return update, 1000 end
math.random(1, 0)
end
return update, 1000A sample mavproxy transcript follows: DetailsIt shows that the first two restarts happen before the fault, so they show a valid canary value of 0.5. Then the fault happens and Once this PR is applied the canary is always 0.5 even after a fault, showing that the FP registers are correctly preserved and addressing the root cause. |
|
Amazing work @tpwrules ! |
Necessary to work around a bug in newlib. This applies to all hardfloat targets, including Cortex-M4. The jmp_buf is now 12 bytes larger, totaling 104 bytes. Info from newlib commit 15ad816dddf836def06cd0330ec0efa9ce50e5bf.
It properly saves and restores the floating point registers.
No longer necessary; verified with a debugger. These were also never applied to Cortex-M4.
Should instead use ap_setjmp.
babe58c to
e56f088
Compare
This is the ultimate fix for all the hacks we had to do in #24231 and #27056 to ensure floating point registers are properly saved and restored when handling Lua errors. Unlike those, this fixes the problem at the source, which is newlib's setjmp, used in ChibiOS on STM32. The problem could theoretically happen on Cortex-M4 too, this fixes it there as well. The problem could also happen at any setjmp, of which we had one other that was not protected.
I've included a fixed setjmp under a new name based on newlib's patch. We can't use linker wrapping to replace the existing implementation as we also need to grow jmp_buf by 10% or so. I did add a linker wrap to ensure setjmp can't be used unaware in some other part of the code. At such time as we banish older compilers with the broken newlib we can remove ap_setjmp.
Tested the fault handler script created in a previous PR on both Cube Orange and Cube Black (non-trivial!). I tested to 100k iterations and three scripting restarts. I also manually set up the FP registers in the main thread function and validated with a debugger and with prints that they were correctly preserved (and did not get preserved when the various patches were reverted).
This is a prerequisite to fix another hardfault-on-restart in scripting in the panic handling code. To be clear, this particular change doesn’t fix any crash I discovered or believe is easily possible, it just makes the code cleaner and safer.