Skip to content

Conversation

@dagar
Copy link
Member

@dagar dagar commented Sep 8, 2025

No description provided.

Copy link
Contributor

@dakejahl dakejahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely an improvmement. The handleInjectDataTopic function can still block for a while though. If it loops the max (8) times and injects 300 bytes each time at 115200 baud that results in ~200ms blocking. At 921600 it's only 26ms and is probably goes unnoticed.

@jeremyzff
Copy link
Contributor

Not sure this fixes the situation as much as necessary- As jake pointed out, it would definitely make sure that on entry, IF GPS bytes are available it will service them first, but I think the parsing happens after this call, so we'd still get blocked if the inject takes a long time. I think the fundamental problem is that in the duration of time it takes to read in 8 rtcm messages, the 600 byte hardware buffer (v5x) could have overflowed several times, so next loop even with data available in the buffer, its not the continuation of the last time we read it because the receive buffer overflowed.

@jeremyzff
Copy link
Contributor

I think the correct fix to this problem is to fully separate the read and write threads so that the read task will NEVER block, and the write task will write as much data as possible.

@dakejahl
Copy link
Contributor

dakejahl commented Sep 9, 2025

Maybe we can break off the RTCM injection into a SubscriptionCallback work item. Is the driver doing any other writes aside from RTCM injection during normal run time?

@dakejahl dakejahl requested a review from alexcekay September 9, 2025 19:27
@alexcekay
Copy link
Member

alexcekay commented Sep 10, 2025

@dakejahl,
the driver is only writing during config or when being told to reset the device.

If we really do not want to be blocked by the write() anymore, we either have to change the write in SerialImpl to be truly non-blocking, so no fsync (and handle the EAGAIN and non complete write correctly), or spin up two different tasks, which is kind of heavy (regarding performance overhead and additional flash) for just having a non blocking write.

@dakejahl
Copy link
Contributor

Another thread is probably the easiest way to accomplish the job but not strictly necessary. The other option is to interleave reads/writes and only write small chunks at a time. We also would need to buffer the RTCM uORB data so that we can chunk it out at our leisure.

@jeremyzff
Copy link
Contributor

I'm not 100% sure how the EKF handles the GPS timing, but I believe there is a parameter that adjusts the expected delay of the GPS data vs the EKF time horizon. If we are varying the timing baesd on injection and blocking reads while we do, the GPS data may come in at variable timing which is probably subpar for the EKF.

@jeremyzff
Copy link
Contributor

It seems silly that we have a full duplex serial port but can't actually use it that way, So I'd support something that properly parallelizes the process. Perhaps the GPS reading thread can operate as normal, and the writing can happen on a work queue (so it isn't a full extra thread?)

@dakejahl
Copy link
Contributor

The EKF time horizon is ~100ms so GPS read jitter shouldn't have an impact as long as we stay well below that. A work queue is still a full thread but we might be able to get away with putting it on the hp_default, but I would probably be safe and just give it it's own thread. @dagar what do you think? Also, who wants to fix this? I'm pretty busy at the moment but am interested in seeing this solved quickly. @jeremyzff want to take a stab at it?

@jeremyzff
Copy link
Contributor

the work queue is a full thread that is more or less already "paid for" right? so adding it to a work queue adds minimal overhead was my thought. But a separate thread might be ok too. I'm sensitive to the RAM usage because our drones are starting to get pretty full...

@jeremyzff
Copy link
Contributor

regarding read jitter- a burst of 8 messages of full 180 bytes at 115200 baud GPS rate is 125ms, which means that is how much variation I might have in when I read the GPS data between when there is no injection or max injection... Am I thinking about this right? so if it already takes 100ms, it could sometimes take 100ms, and other times take 225ms?

@dakejahl
Copy link
Contributor

A work queue is a thread that can be shared by multiple work items. But these work items run sequentially in the thread context. So all items in the queue are blocked by the slowest running item they share it with.

If we don't use another thread and instead interleave non-blocking read/writes there won't really be any jitter since the reads/write are non-blocking. We just need to make sure we're checking the return value of read/write to ensure we don't quietly drop data. The output data (RTCM) needs to be buffered so that if we try to write 200 bytes but instead only write 50, we still have those 150 waiting to be written.

@jeremyzff
Copy link
Contributor

we've tested this, and its part of the overall solution for us. By itself, this didn't fix our problem because the injection still took same amount of time and the RX buffer would overflow. But with the reversion of the fsync->tcdrain patch also which made writes nonblocking again, this still makes sure that we process a lot of the GPS buffer if its got backed up while we were away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants