-
Notifications
You must be signed in to change notification settings - Fork 657
Set parent pointers as nodes are ready in parser, rather than all at once as parse is complete #1279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…once as parse is complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR changes parent-pointer assignment to occur immediately as nodes are parsed or reparsed, improving context lookup for JSDoc transforms (especially @satisfies
). Key updates include:
- Introducing
finishReparsedNode
and acurrentParent
mechanism to incrementally set.Parent
viasetParent
. - Extensive calls to
finishReparsedNode
acrossreparser.go
and updates tofinishNodeWithEnd
inparser.go
to reset parent pointers on the fly. - Updated test baselines for JSDoc
@satisfies
error output to reflect the new transformer behavior.
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
internal/parser/reparser.go | Adds finishReparsedNode and calls it after reparsing each JSDoc node to set parent pointers immediately. |
internal/parser/parser.go | Introduces currentParent , updates finishNodeWithEnd , and adds unparseExpressionWithTypeArguments to reset parent pointers post-parse. |
internal/checker/checker.go | Expands export-assignment check to include KindJSExportAssignment . |
testdata/baselines/reference/... | Updated expected error messages for various checkJsdocSatisfiesTag tests to match new behavior. |
This is an improvement I was hoping we'd "eventually" send, so I'm happy to see that "eventually" could be "right now" 😄 The JSX case was the one case I knew we did something funky with. Hopefully there aren't any more? I know @andrewbranch mentioned that we may not know the right parents for the reparsed nodes early enough. Is there a way we could in testing verify that the parents are set properly, like potentially in the harness's |
Probably. I think we used to have tests that did this in Strada to check that |
- ~ | ||
-!!! error TS2353: Object literal may only specify known properties, and 'b' does not exist in type 'T1'. | ||
- const t3 = /** @satisfies {T1} */ ({}); | ||
+/a.js(22,38): error TS2741: Property 'a' is missing in type '{}' but required in type 'T1'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this is actually a desired change IIRC, some new error stack collapsing
-!!! error TS2353: Object literal may only specify known properties, and 'x' does not exist in type 'Partial<Record<Keys, unknown>>'. | ||
+!!! error TS2353: Object literal may only specify known properties, and 'x' does not exist in type 'Partial<Record<"a" | "b" | "c" | "d", unknown>>'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, where did a b c d
come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expansion of Keys
- presumably a difference in type node printback behavior to do with not currently having most input node reuse logic in place right now.
FWIW I could have sworn the intention was that any reparsed nodes were copies and their parents set in a "normal" way, so I'd be interested in fixing that stuff too at some point. The lest special cases the better... |
Yeah.... the |
I think we could assert non- |
If this is still happening, I'm pretty sure it's a bug. I know I had found one place in the reparser that was missing a clone, but perhaps there are more. |
When I test main vs this branch+main, the perf is a lot worse:
This seems to be due to the closure thing again: |
sigh always the method closure overhead. I'm willing to bet if I write an actual small inline closure instead of a method reference it'll actually get inlined. In other news, though, the jsdoc reparser reparses /** @type {string} */
module.exports = 0; into a JS export declaration with a The jsdoc reparser only gets away with the mutations it does normally because the |
internal/parser/reparser.go
Outdated
@@ -6,8 +6,10 @@ import ( | |||
) | |||
|
|||
func (p *Parser) finishReparsedNode(node *ast.Node) { | |||
p.currentParent = node | |||
node.ForEachChild(p.setParent) | |||
node.ForEachChild(func(n *ast.Node) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ForEachChild is an indirect call, so is not inlinable and escapes its argument; this is why I had to do the goofy parent setting thing in SetParentsInChildren
. I would suspect that this wouldn't work.
You sure someone didn't walk this back? The text "Clone" doesn't even appear inside the reparser. |
I found it, but it was never committed as we just undid the parent code in the binder instead. #1187 (comment) is actually one such case. |
That’s what I said too, but apparently that’s the design that @ahejlsberg and @sandersn discussed. The invariant that was intended to hold is weaker: that real nodes have real parents. So a real node is allowed to appear inside a reparsed node, but its parent should point into the real AST. In other words, if you visit the children of a reparsed subtree, you may not be able to get back up to the same place by parent pointer traversal. I don’t particularly like this, but that was the intended design, and many things broke when I tried adding consistent cloning. |
That's true, but reparsed nodes are, for all intents, real nodes. Like an
Same here where swapping parents between multiple "correct" options causes changes! And looking through those breaks, they're places where "alternate but correct" parent types don't seem to be handled, like |
To me, it sounds like we need to revisit the no-clone real-parent design again in a realtime discussion. I can't make the design meeting today but maybe on Friday. The current design really doesn't turn the tree into a graph, though. It adds out-of-band secondary parents that are only checked in a few places. (Which are then treated inconsistently on commonjs exports to avoid adding a secondary parent field to BinaryExpression, as you noticed.) What algorithms are broken by it, if they skip nodes flagged as reparsed? |
It does that, too, but it also, in a roundabout way, sets the |
…er invariant maufactured by the js reparser
…ng cause these diffs - they seem like improvements, at least
I think we're likely to merge #1241 soon, so hopefully that isn't too hard to merge into this PR. |
@jakebailey I had to except the cases where the reparser reuses nodes, but otherwise the tests show the assertion holding now. There's some baseline diffs as in some of those cases, a different parent is selected now; but the baselines show mostly improvements (or at least reduced diffs) from the current post-hoc parent selection method. I'm pretty disappointed on the manual indirections needed to iterate node children and set parent pointers efficiently though. :( |
Here's the current perf diff. Memory is unchanged, but runtime gets slower:
The profile is a little unclear, but it does imply that 10% of the time is spent in OverrideParentInImmediateChildren, whereas the old recursive setting was only 5% of the parse time. Half of that is just messing with the pool; my guess is that while the old code would only get something out of the pool once per AST, now we're doing it for every node on finish. |
Probably this just means we should be sticking this on the parser as a cached function pointer instead of a global? |
With the suggestion above, versus main:
|
Main without parent setting in the parser, vs main vs this PR:
Not getting back as much as I thought... |
However, I think the numbers do correspond to the time removed from the binder in #1251, so that may all be expected. |
Yeah, as I mentioned offline, I think the only way we could get more efficient than this would be if we built setting parent pointers into the node factory functions to avoid the virtual dispatch on In any case, this gets us back to where we were-but-racing but without-a-race, so seems like a good place to stop for now. |
+!!! related TS2728 /a.js:3:23: 'a' is declared here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but seems like we're not setting the range of the satisfies expression properly? That or we need to do some special handling to choose the JSDoc node to report the satisfies error on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The location on the synthetic satisfies
expression is set to the satisfies
tag, but binder.GetErrorRangeForNode
isn't set up to handle reparsed nodes for this node kind, so this error span is accidentally close, at best. It's "the token range at the end of the trivia after the expression position", which for the @satisfies
tag is the semicolon trivia we see underlined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Baseline improvements all seem good too.
+ ~ | ||
+!!! error TS2304: Cannot find name 'T'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to fix this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entirely accidental upside to going through the reparser and adding calls to manually set parents. Node just gets a .Parent
set to a node in the real expression tree now, where it did not before (it got a jsdoc parent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the baselines are an unqualified improvement, but it's most likely that the bad diffs I'm seeing are from problems that were hidden before when the parents were wrong.
@@ -30197,7 +30197,7 @@ func (c *Checker) getSymbolOfNameOrPropertyAccessExpression(name *ast.Node) *ast | |||
return c.getSymbolOfNode(name.Parent) | |||
} | |||
|
|||
if name.Parent.Kind == ast.KindExportAssignment && ast.IsEntityNameExpression(name) { | |||
if (name.Parent.Kind == ast.KindExportAssignment || name.Parent.Kind == ast.KindJSExportAssignment) && ast.IsEntityNameExpression(name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is code ported since the introduction of JSExportAssignment, I think that means I'll need to make another pass through the source looking for places where all the synthetic kinds need to be handled alongside their non-synthetic peers.
@@ -977,6 +978,9 @@ func (p *Parser) parseTypedefTag(start int, tagName *ast.IdentifierNode, indent | |||
|
|||
typedefTag := p.factory.NewJSDocTypedefTag(tagName, typeExpression, fullName, comment) | |||
p.finishNodeWithEnd(typedefTag, start, end) | |||
if typeExpression != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nested @typedef
pushes the whole syntax from tragedy to loop back around to comedy.
@@ -75,7 +75,7 @@ | |||
+>module.exports : { justExport: number; bothBefore: number; bothAfter: number; } | |||
+>module : { "export=": { justExport: number; bothBefore: number; bothAfter: number; }; } | |||
+>exports : { justExport: number; bothBefore: number; bothAfter: number; } | |||
+>bothBefore : number | |||
+>bothBefore : "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was number | "string"
before, and switched types, so I think that's an effect of our stricter rules combined with different visit order. Or something like that?
+>exports : typeof import("./mod1") | ||
>"b" : "b" | ||
+>"b" : { x: string; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a change in how the baseliner is interpreting "b"
, from a string literal to a property name. The latter is defensible but definitely different from before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for sure a difference in the symbol and therefore type associated with the node.
Turning off auto merge for now, given we were wanting to talk about this tomorrow and any approvals here will make it go in (it would have been merged just now if there weren't a merge conflict) |
thanks, didn't notice that |
That being said, I'm pro-this-change because yay perf, just wanted to make sure we understood the model we're going for with this new reparsed AST. |
Haven't changed anything here about that, this just tosses exceptions into the test you asked for in the cases that highlight the issues with the current model - we do need to talk about it, but that's followup work. I think we can take this as-is (sans merge conflict). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case, seems fine, and I'm glad to have at test to make sure we don't break it.
This is, imo, an improvement on #1251. It's almost very straightforward to just set parents in
finishNode
(which would be ideal as it's when all the relevant pointers are in cache)... except for all the cases where we reparse a node and need to forcibly overwrite parent pointers we already set (which can't be the default behavior, as last-in-wins parent setting by default breaks the jsdoc reparser which currently only ends up setting parents on new nodes, and not copied or moved ones).This also exposed an interesting issue with the
@satisfies
jsdoc transform and ended up reducing the diffs for it - by overriding the.Parent
explicitly on the transformed result, as is needed to ensure one is always set in this manner (in currentmain
, the parent of the syntheticsatisfies
expression/type only gets set if it hasn't been set by the main file walk, which is to say basically never), we get correct context lookup behavior in the checker.