[runtime] Remove all tensor copies and almost all allocs from the executor #2821
By my measure this cuts a few seconds off the unittest run time as well, hooray
Nice!
This is great, thanks!
jackm321 left a comment
LGTM
Add a comment about lack of ownership of tensors here
What do you mean here? The Placeholders are owned by module_. (maybe just write the comment you like here and i'll copy it in 😄)
found another unused propagate function to remove
change
@nickgg Congrats on landing this PR. This is great!
Description: The ThreadPoolExecutor does a lot of copies, which makes it pretty slow: when running realistically sized traffic through the runtime we spent more than 3 ms in the executor preparing Tensors. Fortunately the executor doesn't need to do any copies at all, because we serialize any nodes that would share an input/output pair, so I've cut them all out. Additionally, we were allocating new Tensor buffers for all sub-graphs, which we don't need either.
This has a significant impact on Runtime performance - about a 15x reduction in introduced latency in the test I ran (scales with input size though, obviously):
before: [image not preserved]
after: [image not preserved]
Testing: all tests under debug, release and ASAN/UBSAN.
Documentation: fixes #2773 and #2776 by removing affected code (I think).
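Illustration: a minimal sketch of the copy-free idea from the description, not the actual Glow code. `Tensor`, `Bindings`, `aliasBindings` and `runDouble` are hypothetical names made up for this example. It assumes, as the description states, that any two sub-graphs sharing an input/output tensor run one after the other, so each sub-graph can borrow the parent's tensors by pointer instead of copying them or allocating new buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a tensor; the real Tensor class is not shown here.
struct Tensor {
  std::vector<float> data;
};

// Maps a placeholder name to a tensor. The map does NOT own the tensors:
// whoever created them must keep them alive for the whole run.
using Bindings = std::unordered_map<std::string, Tensor *>;

// Build the bindings for one sub-graph by aliasing the parent's tensors.
// No tensor data is copied and no new tensor buffer is allocated.
Bindings aliasBindings(const Bindings &parent,
                       const std::vector<std::string> &names) {
  Bindings child;
  for (const auto &name : names) {
    auto it = parent.find(name);
    if (it != parent.end()) {
      child.emplace(name, it->second); // Copies the pointer only.
    }
  }
  return child;
}

// Hypothetical sub-graph: writes 2 * src into dst, in place.
void runDouble(const Bindings &b, const std::string &src,
               const std::string &dst) {
  const Tensor &in = *b.at(src);
  Tensor &out = *b.at(dst);
  assert(in.data.size() == out.data.size());
  for (std::size_t i = 0; i < in.data.size(); ++i) {
    out.data[i] = 2.0f * in.data[i];
  }
}
```

Because the second stage reads the tensor the first stage wrote, the two calls to `runDouble` must happen in order; that serialization is what makes sharing the buffer safe without a copy.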