Description
I'd like to suggest incorporating at least some knowledge of hsc2hs pragmas into the tree-sitter grammar for Haskell.
While complete support may require some work, I would be glad to see the grammar not erroring out on the simpler pragmas that constitute 80%+ of typical hsc2hs use, for example:
// With C header having something like
typedef struct foo {
...
chilt_t child;
...
} bar_t;

test1 ptr foo =
#{poke bar_t, child} ptr foo
test2 ptr foo =
(#poke bar_t, child) ptr foo
test3 =
#poke struct foo, child

All of the above are equivalent from the hsc2hs tool's perspective. Each pragma produces a function of two arguments, a pointer to the structure and a new value, that writes the value into the field named child of the C struct bar_t (which can also be referred to by its tag as struct foo). In the wild, any of the three forms may occur.
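To make the equivalence concrete, here is a rough C sketch of the computation behind a poke pragma. As far as I understand, hsc2hs resolves the field offset with the C compiler and emits Haskell text of roughly the shape shown in the format string; treat that exact shape as my assumption rather than a quote from the tool. The `child_t` typedef, the `id` field and the function name `render_poke_bar_child` are hypothetical stand-ins for the parts elided in the header above.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* snprintf */

/* Hypothetical stand-ins for the elided parts of the header. */
typedef long child_t;

typedef struct foo {
    int     id;      /* placeholder for the elided leading fields */
    child_t child;
} bar_t;

/* Render the Haskell text that a poke pragma on (bar_t, child) is
 * assumed to turn into. Returns what snprintf returns: the number of
 * characters needed, excluding the terminating NUL. */
int render_poke_bar_child(char *out, size_t cap) {
    assert(out != NULL);
    /* offsetof(struct foo, child) gives the same number, which is why
     * the typedef name and the struct tag are interchangeable here. */
    return snprintf(out, cap, "(\\hsc_ptr -> pokeByteOff hsc_ptr %ld)",
                    (long) offsetof(bar_t, child));
}
```

The rendered text depends only on the struct layout, not on which of the three pragma spellings was used, so a grammar can treat all three as the same kind of node.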
The reason to support these constructs in the grammar is that forms 1 and 3 mess up the parsing state, so any file containing either of them is not parsed properly. If not for this, it would have been OK to do nothing and parse pragmas into some soup of tokens, since very few users are likely to want to query for pragma AST nodes specifically (none seem to have asked so far).
It would be very useful to support at least all three variants of the following pragmas; hopefully their grammar will not prove too bad:
| Pragma | Example |
|---|---|
| #type ⟨C_type⟩ | #type int32_t |
| #peek ⟨struct_type⟩, ⟨field⟩ | (#peek struct_t, child) |
| #poke ⟨struct_type⟩, ⟨field⟩ | #{poke struct foo, bar} |
| #ptr ⟨struct_type⟩, ⟨field⟩ | #ptr struct bar, baz |
| #offset ⟨struct_type⟩, ⟨field⟩ | #offset bar_t, baz |
| #size ⟨struct_type⟩ | #{size foo} |
| #alignment ⟨struct_type⟩ | (#alignment bar) |
| #const ⟨C_expression⟩ | #const FOO + 1 |
| #const_str ⟨C_expression⟩ | #{const_str global_constants[FOO]} |
NB: Both C_type and struct_type can be any type that is valid from the C compiler's perspective, so either a single-word type identifier like struct_t or a multi-word type that refers to a struct via its tag, like struct foo, may be used.
Arbitrary C expressions are likely to be tricky, but fully parsing them may not be necessary. If the grammar could simply skip everything up to the closing delimiter, for example by supporting only forms 1 and 2 above, that would already be very useful.
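The "skip to the closing delimiter" idea can be sketched as a small standalone C function. This is only an illustration of the matching logic, not code wired into tree-sitter's scanner interface; the name `skip_delimited_pragma` is hypothetical, and the sketch deliberately ignores delimiters that appear inside C string or character literals.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, NULL */

/* Return the length in bytes of the delimited pragma that starts at
 * src, i.e. form 1 "#{...}" or form 2 "(#...)". Return 0 if src does
 * not start with either opener or the delimiters never balance. */
size_t skip_delimited_pragma(const char *src) {
    assert(src != NULL);

    char open, close;
    if (src[0] == '#' && src[1] == '{') {
        open = '{'; close = '}';
    } else if (src[0] == '(' && src[1] == '#') {
        open = '('; close = ')';
    } else {
        return 0;  /* form 3 (bare #pragma) is not handled here */
    }

    /* Track nesting so that "(#const (1 + 2) * 3)" ends at the outer
     * parenthesis rather than at the first closing one. */
    int depth = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == open) {
            depth++;
        } else if (src[i] == close && --depth == 0) {
            return i + 1;  /* include the closing delimiter */
        }
    }
    return 0;  /* unbalanced: ran off the end of the input */
}
```

For instance, calling it on `#{poke bar_t, child} ptr foo` yields 20, the length of the pragma up to and including the closing brace, leaving ` ptr foo` for the ordinary Haskell rules. Form 3 has no closing delimiter, so it would need a separate rule.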
For reference, the full syntax of pragmas is documented here: https://github.com/haskell/hsc2hs/tree/2059c961fc28bbfd0cafdbef96d5d21f1d911b53?tab=readme-ov-file#input-syntax