Skip to content

Conversation

@dakejahl
Copy link
Contributor

@dakejahl dakejahl commented Sep 9, 2025

Reverts #23686

Due to the bug described here
#23686 (comment)

FYI this is a breaking change for a bunch of things, including bursts of RTCM traffic. We had a problem where a burst of RTCM data was taking down the GPS. This change made the GPS serial port effectively half-duplex, and was blocking the READ part of the code. If I reverted this change, I get a /dev/ttyS0 write error -1 on the write thread and the read loop continues to process correctly.

Solutions
GPS driver RTCM injection logic could be better
#25535

We need a blocking and non-blocking write method
#25537

But this should be reverted since the problem it solves is the minority case. The fix can be handled by instead adding a block write method OR by first polling (POLLOUT) to check if data can be written before calling write.

@dakejahl dakejahl requested review from dagar and katzfey September 9, 2025 06:30
@alexcekay
Copy link
Member

Hi @dakejahl,

I allowed myself to add a small update to avoid the console getting spammed with errors no in case of fast uart writes (which can now cause EAGAIN to be returned, as there is no wait to drain anymore). This leads to a lot of spam.

@dakejahl
Copy link
Contributor Author

dakejahl commented Sep 9, 2025

@alexcekay I'm okay with removing the warning but we should also fix the caller that is causing the warning in the first place. If 0 bytes are written it means the caller needs to back off and wait until the TX buffer has partially drained.

@dakejahl dakejahl requested a review from alexcekay September 9, 2025 19:17
@alexcekay alexcekay force-pushed the pr-serial_drain_revert branch from e60fe5a to 799ef42 Compare September 10, 2025 16:47
@dakejahl dakejahl merged commit 1aad8b6 into main Sep 11, 2025
68 of 71 checks passed
@dakejahl dakejahl deleted the pr-serial_drain_revert branch September 11, 2025 20:32
@alexcekay
Copy link
Member

alexcekay commented Sep 12, 2025

Just FYI @dakejahl,

I had a look at all the drivers that actually use the write and may now have problems as it is more likely now that the number of returned bytes does not match the requested number of bytes (will of course only likely occur when there is high load on the UART). Note that it already was possible before this change that the requested amount of bytes does not match the returned amount of bytes (when requesting more than the TX buffer, or interrupted).

@JoelJ18: The following function is a bit optimistic (in MicroStrain.cpp):

bool mipInterfaceUserSendToDevice(mip_interface *device, const uint8_t *data, size_t length)
{
	int res = device_uart.write(const_cast<uint8_t *>(data), length);

	if (res >= 0) {
		return true;
	}

	PX4_ERR("MicroStrain driver failed to write(%d): %s", errno, strerror(errno));
	return false;
}

The interface is happy as long as any data is written, but it may be less than the requested amount of bytes. The rest of the code seems to assume that if this returns true the send worked correctly.

@katzfey: The voxl2_io.cpp init is very optimistic:

int Voxl2IO::get_version_info()
{
	[...]
	while ((got_response == false) && (retries_left > 0)) {
		retries_left--;

		//send the version request command to the board
		if (_uart_port.write(cmd.buf, cmd.len) != cmd.len) { // <- optimistic here
			PX4_ERR("VOXL2_IO: Could not write version request packet to UART port");
			return -1;
		}

In case this one write returns less than the requested amount of bytes (which could happen, although unlikely) the function returns -1 which will cause the whole init() to fail. This causes the driver to not start at all. I think it was intended to actually have multiple tries here (retries_left), but in case the write does not directly write the complete package it instantly returns -1 and does not do any retry.

@dakejahl
Copy link
Contributor Author

We probably need to backport this into 1.16

dakejahl added a commit that referenced this pull request Sep 27, 2025
* serial: nuttx: revert tcdrain back to fsync

* serial: do not print error on EAGAIN

---------

Co-authored-by: Alexander Lerach <[email protected]>
katzfey pushed a commit that referenced this pull request Oct 1, 2025
* serial: nuttx: revert tcdrain back to fsync

* serial: do not print error on EAGAIN

---------

Co-authored-by: Alexander Lerach <[email protected]>
dagar pushed a commit that referenced this pull request Oct 29, 2025
* serial: nuttx: revert tcdrain back to fsync

* serial: do not print error on EAGAIN

---------

Co-authored-by: Alexander Lerach <[email protected]>
mrpollo pushed a commit that referenced this pull request Nov 24, 2025
* serial: nuttx: revert tcdrain back to fsync

* serial: do not print error on EAGAIN

---------

Co-authored-by: Alexander Lerach <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants