-
Notifications
You must be signed in to change notification settings - Fork 1
Add grammar for hsc2hs dialect #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
tek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I don't have any requirements for the grammar, so aside from the pragma stuff in the scanner, all my comments are just recommendations.
Maybe there's a way to re-use existing C grammar - add it as a submodule and re-use some of its rules?
The usual approach for this is to use injection queries, but for this case it probably won't work – injections don't allow you to specify which rule to use as the root, so they would only succeed if the directive was followed by a top level declaration.
It's definitely possible to import the C grammar as a node package though! We're already doing that with the Haskell grammar, but I have no clue how difficult that would be to integrate into non-npm package ecosystems, like nvim-treesitter or nixpkgs. You're welcome to experiment with that, would be interesting!
One more high-level suggestion: Since HSC appears to reserve the # operator exclusively, I would recommend changing the scanner and grammar so that it doesn't return LHash in HSC mode. I'd expect that to reduce friction quite a bit.
|
|
||
| #define PEEK env->lexer->lookahead | ||
|
|
||
| #define ARR_SIZE(x) ((sizeof(x) / sizeof(0[x])) / ((size_t)(!(sizeof(x) % sizeof(0[x]))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this? Are you ensuring that the array is nonempty?
What's the deal with writing 0[x] instead of x[0]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the best version I found on the internet, taken from https://stackoverflow.com/questions/4415524/common-array-length-macro-for-c.
I'd usually define macro as sizeof(x) / sizeof(x[0]) but advantages of writing it as in PR are:
It improves on the array[0] or *array version by using 0[array], which is equivalent to array[0] on plain arrays, but will fail to compile if array happens to be a C++ type that overloads operator[]().
The division causes a divide-by-zero operation (that should be caught at compile time since it's a compile-time constant expression) for many (but not all) situations where a pointer is passed as the array parameter.
| precedences: ($, previous) => previous, | ||
| // Hook up hsc directives into Haskell grammar depending | ||
| // on their return type. | ||
| _stringly: ($, previous) => choice( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general guideline for TS grammars is that we don't wan't to validate or typecheck the language, but rather recognize it leniently.
Here it seems like you're providing specialized nodes based on what type they have.
I understand that the arguments of the directive differ depending on the keyword, so that may be a justification for this approach.
If you care about that, I have no objection, but if you only want to avoid parse errors in HSC files, it may be easier to just parse (in the scanner) the basic structure mentioned in the docs 🙂:
They extend up to the nearest unmatched ), ] or }, or to the end of line if it occurs outside any () [] {} '' "" /**/ and is not preceded by a backslash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to at least know name of a directive, but that's not strictly necessary for me. I'll try to check how it could be done in the scanner.
Going via scanner looks like a pretty good idea because it will allow supporting #{const} nodes than contain arbitrary C expressions without having to parse them.
It would be very useful if you can point me to the place in the scanner where it should go.
| rules: { | ||
|
|
||
| supertypes: ($, previous) => previous, | ||
| // Collect all possible hsc directives under this node but don’t use it since |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so that different directives will alias under the same hsc node for easier patternmatching by users.
As discussed in #17.
The
#type,#peek,#poke,#ptr,#offset,#size, and#alignmentdirectives are implemented more or less completely as far as I can tell. The#enumis omitted since it's used very rarely - I failed to find any uses. It could be added later relatively easily judging from its grammar description.The only thing that could is implemented partially and requires some discussion is the
#const(and its identical#counst_strcousin). The problem is that the#constdirective takes any valid C expression.I supported only C expressions that refer to a name (e.g.
(#const TCP_CONNECTIONTIMEOUT)), this already handles most of the uses of#constbut there are cases in the wild that cannot be parsed currently, e.g.:It would be good to support these, but it's not clear what's the best way to do that. One way would be to re-implement C grammar, but it's pretty large and relatively complex for feature o f this magnitude so likely that's not a good solution.
Maybe there's a way to re-use existing C grammar - add it as a submodule and re-use some of its rules?
Another possibility would be to support arbitrary combinations of identifiers combined with binary operators, function calls like
f(...), and nested brackets? We don't really care about the syntax tree of the C expression, just the fact that the hsc grammar accepts anything that looks like#{const ...}provided nesting of parens, brackets and braces within...is OK. Do you think this is feasible?