lx: Distinguish between unexpected EOF and EOF in ignored zones (broken by #509) #510

silentbicycle · 2025-08-26T19:08:40Z

PR #509 introduced a bug: it didn't distinguish between an unexpected end of input and an end of input in a zone that matches but ignores its input. This caused several lxpos tests to fail, because they got TOK_UNKNOWN rather than TOK_EOF when the input had trailing whitespace. I didn't notice until after merging because the normal build doesn't regenerate the code for src/lx/lexer.lx or src/libfsm/lexer.lx. (I had ensured all the libre dialect lexers and parsers were regenerated, but missed those.)

Only src/lx/print/c.c has code changes; the other files are all generated code updates.

Instead of always generating TOK_UNKNOWN, this change inspects the zone mappings to determine whether the current end ID represents a dead end for the zone. If it does not, the generated code should produce TOK_EOF instead.
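
As a rough illustration of the decision described above (not the actual code in src/lx/print/c.c), the check can be modelled as a linear scan over the zones. Everything here is a simplifying assumption: `struct zone`, its `dead_end_ids` array, and `token_at_end_of_input` are hypothetical names, and how lx really marks an end ID as a dead end is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds named in the PR; the enum itself is hypothetical. */
enum tok { TOK_EOF, TOK_UNKNOWN };

/* Hypothetical, simplified zone: only the end IDs that are dead ends
 * for this zone, plus a link to the next zone in the list. */
struct zone {
	const unsigned *dead_end_ids;
	size_t dead_end_count;
	const struct zone *next;
};

/* Linear search within one zone. */
static bool
zone_has_dead_end(const struct zone *z, unsigned end_id)
{
	for (size_t i = 0; i < z->dead_end_count; i++) {
		if (z->dead_end_ids[i] == end_id) {
			return true;
		}
	}
	return false;
}

/* Decide which token to produce when input ends at `end_id`:
 * TOK_UNKNOWN if any zone treats it as a dead end (unexpected EOF),
 * otherwise TOK_EOF (for example, EOF after ignored whitespace). */
static enum tok
token_at_end_of_input(const struct zone *zones, unsigned end_id)
{
	for (const struct zone *z = zones; z != NULL; z = z->next) {
		assert(z->dead_end_count == 0 || z->dead_end_ids != NULL);
		if (zone_has_dead_end(z, end_id)) {
			return TOK_UNKNOWN;
		}
	}
	return TOK_EOF;
}
```

In this model, an end ID that no zone lists as a dead end yields TOK_EOF, which corresponds to the trailing-whitespace case that the lxpos tests exercise.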

silentbicycle · 2025-08-26T19:14:52Z

src/lx/print/c.c

+	 * should stay small enough that linear search is fine. If this becomes
+	 * prohibitively expensive, then build a bitset of dead-end IDs upfront
+	 * in one pass. */
+	for (struct ast_zone *z = ast->zl; z != NULL; z = z->next) {


I set up a variety of scenarios and they all behaved consistently with this being how lx identifies dead ends internally, but please let me know if I'm misunderstanding something or if there's a more direct way to check this. I didn't see a way to tell which zone the accept_c callback is running inside of, but the number of zones should be small in practice, so the linear scan across all of them should be cheap.
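
If the linear scan ever did become too expensive, the bitset alternative mentioned in the code comment might look roughly like the sketch below. This is not code from the PR: `struct id_bitset` and its helpers are hypothetical names, and the sketch assumes end IDs are small non-negative integers with a known maximum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bitset keyed by end ID: bit N is set if ID N is a dead end. */
struct id_bitset {
	uint64_t *words;
	size_t word_count;
};

/* Allocate enough zeroed words to hold IDs 0..max_id. Returns false on
 * allocation failure. */
static bool
id_bitset_init(struct id_bitset *bs, size_t max_id)
{
	bs->word_count = max_id / 64 + 1;
	bs->words = calloc(bs->word_count, sizeof bs->words[0]);
	return bs->words != NULL;
}

/* Mark an ID as a dead end; IDs out of range are ignored. This would be
 * called once per dead-end ID during a single pass over all zones. */
static void
id_bitset_set(struct id_bitset *bs, size_t id)
{
	if (id / 64 < bs->word_count) {
		bs->words[id / 64] |= (uint64_t)1 << (id % 64);
	}
}

/* Constant-time membership test; IDs out of range are never dead ends. */
static bool
id_bitset_test(const struct id_bitset *bs, size_t id)
{
	if (id / 64 >= bs->word_count) {
		return false;
	}
	return ((bs->words[id / 64] >> (id % 64)) & 1) != 0;
}

static void
id_bitset_free(struct id_bitset *bs)
{
	free(bs->words);
	bs->words = NULL;
	bs->word_count = 0;
}
```

The trade-off is one up-front pass and a small allocation in exchange for a constant-time lookup per end ID, which only pays off if the number of zones or end IDs grows well beyond what is expected here.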

silentbicycle added 2 commits August 26, 2025 14:54

Generated code. Re-generate lexers and parsers with lx bug fixed.

9be4aec

silentbicycle requested a review from katef August 26, 2025 19:08

katef approved these changes Aug 29, 2025

katef merged commit 6c66234 into main Aug 29, 2025
346 checks passed

katef deleted the sv/fix-lx-handling-for-EOF-broken-by-509 branch August 29, 2025 00:32
