
Conversation

@opti-mix
Contributor

@opti-mix opti-mix commented Apr 10, 2019

This handles QuantizationProfileNode scheduling in a more general way that does not depend on the kind of the node. It schedules QuantizationProfile instructions right after the last node that updates their inputs, to shorten the lifetimes of buffers.
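A minimal, self-contained sketch of the idea (not the actual Glow patch; the `Instr` type and its buffer-id fields are illustrative assumptions) could look like this:

```cpp
// Illustrative sketch only: hoist "profile"-style instructions so each one
// sits right after the last instruction that writes one of its inputs.
// The Instr struct is a hypothetical stand-in for Glow's IR, not the real API.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Instr {
  std::vector<int> reads;   // ids of the buffers this instruction reads
  std::vector<int> writes;  // ids of the buffers this instruction writes
  bool isProfile = false;   // stands in for a QuantizationProfile instruction
};

void hoistProfiles(std::vector<Instr> &instrs) {
  for (std::size_t i = 0; i < instrs.size(); ++i) {
    if (!instrs[i].isProfile)
      continue;
    // Find the last instruction before i that writes any input of the profile
    // (we assume each read buffer has at least one earlier writer).
    std::size_t lastWriter = 0;
    for (std::size_t j = 0; j < i; ++j)
      for (int buf : instrs[i].reads)
        if (std::count(instrs[j].writes.begin(), instrs[j].writes.end(), buf))
          lastWriter = j;
    // Rotate the profile up to the slot right after its last writer; this
    // shortens the live range of the profiled buffer.
    if (lastWriter + 1 < i)
      std::rotate(instrs.begin() + lastWriter + 1, instrs.begin() + i,
                  instrs.begin() + i + 1);
  }
}
```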

Also, take the opportunity to generalize the "SaveNode hack" in the scheduler and make it independent of the Node's kind.

Fixes #2697

@nadavrot
Contributor

I wonder why the scheduler does not schedule the profile nodes early and instead drags them all the way to the end of the function.

@opti-mix
Contributor Author

opti-mix commented Apr 10, 2019

I wonder why the scheduler does not schedule the profile nodes early and instead drags them all the way to the end of the function.

I looked at it. The thing is that profile nodes do not free any memory themselves: they have no outputs, and all their inputs are @in or @inout. Since the scheduler tries to schedule first the nodes that free the most memory, and these nodes free nothing, it postpones the scheduling of profile nodes until later.
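As a toy illustration of the heuristic described above (my own simplification, not Glow's scheduler code): if each ready node is scored by the net memory its scheduling would release, a profile node scores zero on both terms and keeps losing to nodes that release large activations.

```cpp
// Toy model of a "schedule the node that frees the most memory first"
// heuristic; the Node struct is hypothetical, not Glow's real scheduler.
#include <cstddef>
#include <vector>

struct Node {
  std::vector<std::size_t> inputSizes; // bytes of each input buffer
  std::vector<bool> inputLastUse;      // is this node the last user of input i?
  std::size_t outputSize = 0;          // bytes newly allocated for the result
};

// Net bytes released by scheduling this node now: inputs whose last use this
// is become free, while the output becomes newly live. A profile node has no
// output and does not end the life of its @in/@inout operands, so both terms
// are zero and a greedy scheduler keeps deferring it.
long long memoryFreed(const Node &n) {
  long long freed = 0;
  for (std::size_t i = 0; i < n.inputSizes.size(); ++i)
    if (n.inputLastUse[i])
      freed += static_cast<long long>(n.inputSizes[i]);
  return freed - static_cast<long long>(n.outputSize);
}
```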

@opti-mix opti-mix force-pushed the minor-improvements branch 2 times, most recently from 0ec4e68 to aa68eb7 Compare April 10, 2019 20:30
@nadavrot
Contributor

Maybe we should change the way we calculate what freeing memory means? Because scheduling these nodes early will reduce memory pressure. No?

@opti-mix
Contributor Author

Maybe we should change the way we calculate what freeing memory means? Because scheduling these nodes early will reduce memory pressure. No?

Maybe. I'll check if changing the scheduler would result in an easier solution.

@opti-mix
Contributor Author

@nadavrot

Maybe we should change the way we calculate what freeing memory means? Because scheduling these nodes early will reduce memory pressure. No?

I added a second commit to this PR, which changes the scheduler. I leave both commits in this PR for now so that it is easier to compare both approaches.

@opti-mix opti-mix force-pushed the minor-improvements branch 2 times, most recently from c1f52cc to 690cd38 Compare April 11, 2019 06:40
@tlepley-cadence
Contributor

I don't think I'll have time to give it a try today. I'll check and give feedback tomorrow.
In general, I think the instruction scheduler is the right place to handle this kind of memory-pressure problem. I also think that the code should be generic, not specific to the QuantizeProfile node: it's about correctly handling in/out params.

@tlepley-cadence
Contributor

tlepley-cadence commented Apr 12, 2019

I did some experiments with resnet50 and a batch size of 512.
I get the following memory requirement for the activation tensors memory-pool at profile generation time:

  • original version: 45.6 GB
  • scheduler and hoisting version: 2.5 GB

This is a nice memory optimization! :-)

Regarding the code itself, the scheduler version looks simpler and has my preference. Both versions are nevertheless specific to the QuantizeProfile node. This may be a pragmatic approach, but I think this could certainly be handled as a node-kind-independent optimization problem.

@opti-mix opti-mix force-pushed the minor-improvements branch from 690cd38 to fd3ce7e Compare April 12, 2019 19:52
@opti-mix
Contributor Author

I did some experiments with resnet50 and a batch size of 512.
I get the following memory requirement for the activation tensors memory-pool at profile generation time:
original version: 45.6 GB
scheduler and hoisting version: 2.5 GB
This is a nice memory optimization! :-)

Awesome! I'm glad it helps!

Regarding the code itself, the scheduler version looks simpler and has my preference. Both versions are nevertheless specific to the QuantizeProfile node. This may be a pragmatic approach, but I think this could certainly be handled as a node-kind-independent optimization problem.

I'm proudly presenting alternative number 3!!! :-D :-D I just pushed it. It is simpler than both previous attempts, and it is also node-kind independent. Please give it a try. You can comment out the invocations of the two others to see the effect of this one in isolation.
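For reference, here is a rough sketch of what this third alternative amounts to, based on the commit message further below ("…schedule this user right after the current node"); the types and helper are hypothetical, not the actual scheduler code:

```cpp
// Hypothetical illustration: after a node is scheduled, immediately schedule
// any of its users that allocates no memory of its own (e.g. a profile-style
// observer) and whose operands are all already available.
#include <cstddef>
#include <unordered_set>
#include <vector>

struct Node {
  std::vector<Node *> users;    // nodes consuming this node's results
  std::vector<Node *> operands; // nodes this node consumes
  std::size_t outputSize = 0;   // memory its own result needs
};

void scheduleCheapUsers(Node *n, std::vector<Node *> &order,
                        std::unordered_set<Node *> &scheduled) {
  for (Node *u : n->users) {
    if (scheduled.count(u) || u->outputSize != 0)
      continue;
    bool ready = true;
    for (Node *op : u->operands)
      if (!scheduled.count(op))
        ready = false;
    // Emitting the user right here keeps it next to its producer, so the
    // buffers it observes do not have to stay live until the end.
    if (ready) {
      order.push_back(u);
      scheduled.insert(u);
    }
  }
}
```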

Contributor

@nadavrot nadavrot left a comment

Both options #2 and #3 look good to me. I don't have a preference. I added a few comments.

Thanks for doing this work @opti-mix. This is a complex problem and algorithm, and it's great that you have a number of elegant ways of solving this.

Contributor

I suggest mentioning the SaveNode and Profile* nodes as an example.

Contributor

Why do we need this here? We already checked that the node is not scheduled with isScheduled(N).

Contributor Author

Added a comment. We need to perform an extra isScheduled check here, because the code below may have scheduled the current node while scheduling its children.
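A hypothetical illustration of why the re-check matters (not the real Glow scheduler): with a rule like the one sketched earlier that emits zero-memory users right after their producer, the recursion over children can place the current node as a side effect.

```cpp
// Sketch only: post-order scheduling where visiting a child may, as a side
// effect, already schedule the current node, hence the extra isScheduled-style
// check before emitting it.
#include <unordered_set>
#include <vector>

struct Node {
  std::vector<Node *> children;
};

void scheduleRecursively(Node *n, std::vector<Node *> &order,
                         std::unordered_set<Node *> &scheduled) {
  if (scheduled.count(n))
    return;
  for (Node *c : n->children) {
    scheduleRecursively(c, order, scheduled);
    // A child's post-processing (e.g. hoisting its zero-memory users) may
    // have scheduled `n` already.
    if (scheduled.count(n))
      return;
  }
  order.push_back(n);
  scheduled.insert(n);
}
```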

Contributor

I would continue this comment with "because ...".

... because we don't want to extend the lifetime of this value for no reason. We want to execute and get rid of this node as soon as possible to reduce the memory pressure.

Contributor

... because nodes that have users can't be scheduled safely without violating dependencies.

@opti-mix opti-mix force-pushed the minor-improvements branch from fd3ce7e to 09f3b91 Compare April 15, 2019 20:33
@opti-mix
Contributor Author

I personally like #3 the most, as it is the smallest, least intrusive, and most general solution. Waiting for @tlepley-cadence to confirm whether it works for him.

@stale

stale bot commented Apr 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@opti-mix
Contributor Author

@tlepley-cadence Could you check if my last version produces the same good results?

@stale

stale bot commented Apr 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@rdzhabarov
Contributor

do not close this

@tlepley-cadence
Contributor

@opti-mix Sorry for the delay. I just came back from holiday. I'll look at this tomorrow.

@tlepley-cadence
Contributor

tlepley-cadence commented May 2, 2019

@opti-mix I finally did the check with the latest solution :)
It gives the same (good) results as the 2 earlier solutions!
I also much prefer #3 since it's generic, independent of node types.

@opti-mix
Contributor Author

opti-mix commented May 2, 2019

@tlepley-cadence

I finally did the check with the latest solution :)
It gives the same (good) results as the 2 earlier solutions!

Yay! Finally! :-) Thanks for the confirmation!

I also much prefer #3 since it's generic, independent of node types.

OK. I'll land solution #3 in the final version of this PR.

@stale

stale bot commented May 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@opti-mix
Contributor Author

opti-mix commented May 7, 2019

Will land soon. This week or next week.

…and which does not require any additional memory, schedule this user right after the current node

This handles QuantizationProfileNode scheduling in a more general way, which is not dependent on the kind of the node.

Also, take the opportunity to generalize the "SaveNode hack" in the scheduler and make it independent of the Node's kind.
@opti-mix opti-mix force-pushed the minor-improvements branch from 09f3b91 to 518b458 Compare May 10, 2019 18:00

@facebook-github-bot facebook-github-bot left a comment

@opti-mix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@opti-mix opti-mix changed the title from "[ir-optimizer] Hoist QuantizationProfile instructions right after the last node that updates their inputs to shorten lifetimes of buffers" to "Schedule QuantizationProfile instructions right after the last node that updates their inputs to shorten lifetimes of buffers" May 10, 2019
@facebook-github-bot

@opti-mix merged this pull request in 3abdf8d.


Successfully merging this pull request may close these issues.

Profile quantization consumes too much memory
