Add grammar for hsc2hs dialect #20

sergv · 2025-12-28T13:37:13Z

As discussed in #17.

The #type, #peek, #poke, #ptr, #offset, #size, and #alignment directives are implemented more or less completely as far as I can tell. The #enum is omitted since it's used very rarely - I failed to find any uses. It could be added later relatively easily judging from its grammar description.

The only thing that could is implemented partially and requires some discussion is the #const (and its identical #counst_str cousin). The problem is that the #const directive takes any valid C expression.

I supported only C expressions that refer to a name (e.g. (#const TCP_CONNECTIONTIMEOUT) ), this already handles most of the uses of #const but there are cases in the wild that cannot be parsed currently, e.g.:

(#const sizeof(struct linger))
(#const offsetof(struct {char x__; struct timeval (y__); }, y__))
#const sizeof(struct addrinfo)
#const sizeof(((struct sockaddr_un *)NULL)->sun_path)
#{const __builtin_offsetof (KEY_EVENT_RECORD, wRepeatCount)}
(#{const GENERIC_WRITE | GENERIC_READ})
(#const CONTEXT_FULL|CONTEXT_DEBUG_REGISTERS|CONTEXT_FLOATING_POINT)
(#const SDL_WINDOWPOS_CENTERED_DISPLAY(15))

It would be good to support these, but it's not clear what's the best way to do that. One way would be to re-implement C grammar, but it's pretty large and relatively complex for feature o f this magnitude so likely that's not a good solution.

Maybe there's a way to re-use existing C grammar - add it as a submodule and re-use some of its rules?

Another possibility would be to support arbitrary combinations of identifiers combined with binary operators, function calls like f(...), and nested brackets? We don't really care about the syntax tree of the C expression, just the fact that the hsc grammar accepts anything that looks like #{const ...} provided nesting of parens, brackets and braces within ... is OK. Do you think this is feasible?

tek

Awesome! I don't have any requirements for the grammar, so aside from the pragma stuff in the scanner, all my comments are just recommendations.

Maybe there's a way to re-use existing C grammar - add it as a submodule and re-use some of its rules?

The usual approach for this is to use injection queries, but for this case it probably won't work – injections don't allow you to specify which rule to use as the root, so they would only succeed if the directive was followed by a top level declaration.

It's definitely possible to import the C grammar as a node package though! We're already doing that with the Haskell grammar, but I have no clue how difficult that would be to integrate into non-npm package ecosystems, like nvim-treesitter or nixpkgs. You're welcome to experiment with that, would be interesting!

One more high-level suggestion: Since HSC appears to reserve the # operator exclusively, I would recommend changing the scanner and grammar so that it doesn't return LHash in HSC mode. I'd expect that to reduce friction quite a bit.

src/scanner.c

tek · 2025-12-29T11:28:19Z

src/scanner.c


 #define PEEK env->lexer->lookahead

+#define ARR_SIZE(x) ((sizeof(x) / sizeof(0[x])) / ((size_t)(!(sizeof(x) % sizeof(0[x])))))


What's the purpose of this? Are you ensuring that the array is nonempty?

What's the deal with writing 0[x] instead of x[0]?

This is the best version I found on the internet, taken from https://stackoverflow.com/questions/4415524/common-array-length-macro-for-c.

I'd usually define macro as sizeof(x) / sizeof(x[0]) but advantages of writing it as in PR are:

It improves on the array[0] or *array version by using 0[array], which is equivalent to array[0] on plain arrays, but will fail to compile if array happens to be a C++ type that overloads operator[](). The division causes a divide-by-zero operation (that should be caught at compile time since it's a compile-time constant expression) for many (but not all) situations where a pointer is passed as the array parameter.

tek · 2025-12-29T11:58:12Z

hsc/grammar.js

-  precedences: ($, previous) => previous,
+    // Hook up hsc directives into Haskell grammar depending
+    // on their return type.
+    _stringly: ($, previous) => choice(


A general guideline for TS grammars is that we don't wan't to validate or typecheck the language, but rather recognize it leniently.

Here it seems like you're providing specialized nodes based on what type they have.

I understand that the arguments of the directive differ depending on the keyword, so that may be a justification for this approach.
If you care about that, I have no objection, but if you only want to avoid parse errors in HSC files, it may be easier to just parse (in the scanner) the basic structure mentioned in the docs 🙂:

They extend up to the nearest unmatched ), ] or }, or to the end of line if it occurs outside any () [] {} '' "" /**/ and is not preceded by a backslash.

It would be good to at least know name of a directive, but that's not strictly necessary for me. I'll try to check how it could be done in the scanner.

Going via scanner looks like a pretty good idea because it will allow supporting #{const} nodes than contain arbitrary C expressions without having to parse them.

It would be very useful if you can point me to the place in the scanner where it should go.

tek · 2025-12-29T12:14:59Z

hsc/grammar.js

+  rules: {

-  supertypes: ($, previous) => previous,
+    // Collect all possible hsc directives under this node but don’t use it since


What's the purpose of this?

This is so that different directives will alias under the same hsc node for easier patternmatching by users.

tek and others added 3 commits December 22, 2025 17:53

Implement the hsc grammar

9036e0b

Working hsc2hs grammar that handles most constructs

2e09a77

Regenerate hsc grammar.json and node-types.json

c9b4511

tek reviewed Dec 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add grammar for hsc2hs dialect #20

Add grammar for hsc2hs dialect #20

Uh oh!

sergv commented Dec 28, 2025

Uh oh!

tek left a comment

Uh oh!

Uh oh!

tek Dec 29, 2025

Uh oh!

sergv Dec 30, 2025

Uh oh!

tek Dec 29, 2025 •

edited

Loading

Uh oh!

sergv Dec 30, 2025

Uh oh!

tek Dec 29, 2025

Uh oh!

sergv Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		#define PEEK env->lexer->lookahead

		#define ARR_SIZE(x) ((sizeof(x) / sizeof(0[x])) / ((size_t)(!(sizeof(x) % sizeof(0[x])))))

Add grammar for hsc2hs dialect #20

Are you sure you want to change the base?

Add grammar for hsc2hs dialect #20

Uh oh!

Conversation

sergv commented Dec 28, 2025

Uh oh!

tek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tek Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

sergv Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

tek Dec 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sergv Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

tek Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

sergv Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tek Dec 29, 2025 •

edited

Loading