Skip to content
This repository was archived by the owner on Jul 1, 2025. It is now read-only.

Conversation

@nickgg
Copy link
Contributor

@nickgg nickgg commented Apr 30, 2019

Description: The ThreadPoolExecutor is doing a lot of copies, which makes it pretty slow - when running realistic size traffic through the runtime we spent more than 3ms in the executor preparing Tensors. Fortunately the executor doesn't need to do any copies at all - since we serialize any nodes that would share an input/output pair, so I've cut them all out. Additionally, we were allocating new Tensor buffers for all sub-graphs which we don't need either.

This has a significant impact on Runtime performance - about a 15x reduction in introduced latency in the test I ran (scales with input size though, obviously):

before:
image
after:
image

Testing: all tests under debug, release and ASAN/UBSAN.
Documentation: fixes #2773 and #2776 by removing affected code (I think).

@nickgg
Copy link
Contributor Author

nickgg commented Apr 30, 2019

By my measure this cuts a few seconds off the unittest run time as well, hooray

@nadavrot
Copy link
Contributor

Nice!

@nrsatish
Copy link
Contributor

This is great, thanks!

@jackm321
Copy link
Contributor

@nickgg this does what #2745 does but in a more complete way, let's close that one. In the longer term, we should discuss pooling placeholder bindings and not releasing this memory between runs.

Copy link
Contributor

@jackm321 jackm321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment about lack of ownership of tensors here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean here? The Placeholders are owned by module_. (maybe just write the comment you like here and i'll copy it in 😄)

@nickgg
Copy link
Contributor Author

nickgg commented Apr 30, 2019

found another unused propagate function to remove

@nickgg nickgg force-pushed the noCopyExecutor branch from c8bbfcd to e97d88f Compare May 1, 2019 01:08
@nickgg
Copy link
Contributor Author

nickgg commented May 1, 2019

change intermediatePlaceholders_ to a vector since we never look up by key.

@nickgg nickgg force-pushed the noCopyExecutor branch from e97d88f to 4a7fd48 Compare May 1, 2019 23:39
@nickgg nickgg merged commit d299b98 into pytorch:master May 1, 2019
@nadavrot
Copy link
Contributor

nadavrot commented May 2, 2019

@nickgg Congrats on landing this PR. This is great!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Runtime efficiency] Don't zero tensors unnecessarily

5 participants