Skip to content

Conversation

@peterbarker
Copy link
Contributor

@peterbarker peterbarker commented Jul 28, 2025

Closes #30722

I am not satisfied with this at the moment as I was unable to replicate the problem as described in the issue. @tpwrules helped make this reproduce the problem; the snprintf test infinitely loops before the fix and doesn't afterwards.

The issue states that we can cross this codepath in ftoa_engine via send_text, but my test for print_vprintf in here does not replicate the problem before or after the clamping patch!

This is a belts-and-bracers patch, with a bad/minimal patch inside ftoa_engine to avoid the infinite loop and a better clamping patch inside the sole caller to ftoa_engine.

@tpwrules
Copy link
Contributor

tpwrules commented Jul 28, 2025

This fixes the problem in Lua on CubeOrange (which is funneled to this same routine):

Before:

> ("%.99f"):format(1e-43)
0.00000000000000000000000000000000000000000004974610
> ("%.99f"):format(1e-44)
< no response and freeze >

After:

> ("%.99f"):format(1e-43)                                                       
0.000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000                                                           
> ("%.99f"):format(1e-44)                                                       
0.000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000  

Interesting that ftoa_engine does not correctly add enough trailing zeros in some cases apparently.

The problematic value's hex representation is 0x7, so it's only a few ticks above the minimum positive float.

@tpwrules
Copy link
Contributor

Do your troubles with replication mean this is a ChibiOS-specific problem right now?

@tpwrules
Copy link
Contributor

tpwrules commented Jul 28, 2025

Yeah, I cannot replicate this in SITL and am in fact getting different results for the non-crashing cases. Concerning...

EDIT: possibly because SITL does not have the linker wrap args that catch Lua's use of bare snprintf. Those wraps are super evil!!!

I can verify Peter's test now though.

@tpwrules
Copy link
Contributor

tpwrules commented Jul 28, 2025

Some more notes from investigating:

  • The bug happens when the decimal expansion of the value has six digits or less (assuming precision is >= 7, higher values are clamped), then the loop cannot complete. So there are precisely 26 float values out of 4.2 billion that could hang. (uint32_t v = *(uint32_t*)&f & 0x7FFFFFFF; v >= 0 && v <= 0xe)
  • There is some other bug (seemingly in the assumptions of prod >>= (15-(exp & 7));) where subnormal values are printed as half their true value. So flushing them to zero isn't the worst idea, they were broken anyway. This covers 16.7 million values.

Understanding a little more I think this is a slightly over-complicated fix. I'd be happy with just the flush inside ftoa_engine.

@peterbarker
Copy link
Contributor Author

Understanding a little more I think this is a slightly over-complicated fix. I'd be happy with just the flush inside ftoa_engine.

The output from the ftoa_engine is actually empty with that patch alone! I really didn't try to understand it, just avoid the infinite loop. If you think you can do better, please do feel free to force-push :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Infinite loop in AP_HAL/utility/ftoa_engine when formatting floats with very small exponents

3 participants