Skip to content

Conversation

@sergv
Copy link

@sergv sergv commented Dec 28, 2025

As discussed in #17.

The #type, #peek, #poke, #ptr, #offset, #size, and #alignment directives are implemented more or less completely as far as I can tell. The #enum is omitted since it's used very rarely - I failed to find any uses. It could be added later relatively easily judging from its grammar description.

The only thing that could is implemented partially and requires some discussion is the #const (and its identical #counst_str cousin). The problem is that the #const directive takes any valid C expression.

I supported only C expressions that refer to a name (e.g. (#const TCP_CONNECTIONTIMEOUT) ), this already handles most of the uses of #const but there are cases in the wild that cannot be parsed currently, e.g.:

(#const sizeof(struct linger))
(#const offsetof(struct {char x__; struct timeval (y__); }, y__))
#const sizeof(struct addrinfo)
#const sizeof(((struct sockaddr_un *)NULL)->sun_path)
#{const __builtin_offsetof (KEY_EVENT_RECORD, wRepeatCount)}
(#{const GENERIC_WRITE | GENERIC_READ})
(#const CONTEXT_FULL|CONTEXT_DEBUG_REGISTERS|CONTEXT_FLOATING_POINT)
(#const SDL_WINDOWPOS_CENTERED_DISPLAY(15))

It would be good to support these, but it's not clear what's the best way to do that. One way would be to re-implement C grammar, but it's pretty large and relatively complex for feature o f this magnitude so likely that's not a good solution.

Maybe there's a way to re-use existing C grammar - add it as a submodule and re-use some of its rules?

Another possibility would be to support arbitrary combinations of identifiers combined with binary operators, function calls like f(...), and nested brackets? We don't really care about the syntax tree of the C expression, just the fact that the hsc grammar accepts anything that looks like #{const ...} provided nesting of parens, brackets and braces within ... is OK. Do you think this is feasible?

Copy link
Owner

@tek tek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I don't have any requirements for the grammar, so aside from the pragma stuff in the scanner, all my comments are just recommendations.

Maybe there's a way to re-use existing C grammar - add it as a submodule and re-use some of its rules?

The usual approach for this is to use injection queries, but for this case it probably won't work – injections don't allow you to specify which rule to use as the root, so they would only succeed if the directive was followed by a top level declaration.

It's definitely possible to import the C grammar as a node package though! We're already doing that with the Haskell grammar, but I have no clue how difficult that would be to integrate into non-npm package ecosystems, like nvim-treesitter or nixpkgs. You're welcome to experiment with that, would be interesting!

One more high-level suggestion: Since HSC appears to reserve the # operator exclusively, I would recommend changing the scanner and grammar so that it doesn't return LHash in HSC mode. I'd expect that to reduce friction quite a bit.


#define PEEK env->lexer->lookahead

#define ARR_SIZE(x) ((sizeof(x) / sizeof(0[x])) / ((size_t)(!(sizeof(x) % sizeof(0[x])))))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this? Are you ensuring that the array is nonempty?

What's the deal with writing 0[x] instead of x[0]?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the best version I found on the internet, taken from https://stackoverflow.com/questions/4415524/common-array-length-macro-for-c.

I'd usually define macro as sizeof(x) / sizeof(x[0]) but advantages of writing it as in PR are:

It improves on the array[0] or *array version by using 0[array], which is equivalent to array[0] on plain arrays, but will fail to compile if array happens to be a C++ type that overloads operator[]().

The division causes a divide-by-zero operation (that should be caught at compile time since it's a compile-time constant expression) for many (but not all) situations where a pointer is passed as the array parameter.

precedences: ($, previous) => previous,
// Hook up hsc directives into Haskell grammar depending
// on their return type.
_stringly: ($, previous) => choice(
Copy link
Owner

@tek tek Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general guideline for TS grammars is that we don't wan't to validate or typecheck the language, but rather recognize it leniently.

Here it seems like you're providing specialized nodes based on what type they have.

I understand that the arguments of the directive differ depending on the keyword, so that may be a justification for this approach.
If you care about that, I have no objection, but if you only want to avoid parse errors in HSC files, it may be easier to just parse (in the scanner) the basic structure mentioned in the docs 🙂:

They extend up to the nearest unmatched ), ] or }, or to the end of line if it occurs outside any () [] {} '' "" /**/ and is not preceded by a backslash.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to at least know name of a directive, but that's not strictly necessary for me. I'll try to check how it could be done in the scanner.

Going via scanner looks like a pretty good idea because it will allow supporting #{const} nodes than contain arbitrary C expressions without having to parse them.

It would be very useful if you can point me to the place in the scanner where it should go.

rules: {

supertypes: ($, previous) => previous,
// Collect all possible hsc directives under this node but don’t use it since
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that different directives will alias under the same hsc node for easier patternmatching by users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants