
Conversation

@opti-mix
Contributor

@opti-mix opti-mix commented Apr 10, 2019

This handles QuantizationProfileNode scheduling in a more general way that does not depend on the kind of the node. It schedules QuantizationProfile instructions right after the last node that updates their inputs, to shorten the lifetimes of buffers.
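A minimal, self-contained sketch of the idea (not the actual Glow patch; the `Instr` type and its buffer-id fields are illustrative assumptions) could look like this:

```cpp
// Illustrative sketch only: hoist "profile"-style instructions so each one
// sits right after the last instruction that writes one of its inputs.
// The Instr struct is a hypothetical stand-in for Glow's IR, not the real API.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Instr {
  std::vector<int> reads;   // ids of the buffers this instruction reads
  std::vector<int> writes;  // ids of the buffers this instruction writes
  bool isProfile = false;   // stands in for a QuantizationProfile instruction
};

void hoistProfiles(std::vector<Instr> &instrs) {
  for (std::size_t i = 0; i < instrs.size(); ++i) {
    if (!instrs[i].isProfile)
      continue;
    // Find the last instruction before i that writes any input of the profile
    // (we assume each read buffer has at least one earlier writer).
    std::size_t lastWriter = 0;
    for (std::size_t j = 0; j < i; ++j)
      for (int buf : instrs[i].reads)
        if (std::count(instrs[j].writes.begin(), instrs[j].writes.end(), buf))
          lastWriter = j;
    // Rotate the profile up to the slot right after its last writer; this
    // shortens the live range of the profiled buffer.
    if (lastWriter + 1 < i)
      std::rotate(instrs.begin() + lastWriter + 1, instrs.begin() + i,
                  instrs.begin() + i + 1);
  }
}
```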

Also, take the opportunity to generalize the "SaveNode hack" in the scheduler and make it independent of the Node's kind.

Fixes #2697

@nadavrot
Contributor

I wonder why the scheduler does not schedule the profile nodes early and instead drags them all the way to the end of the function.

@opti-mix
Contributor Author

opti-mix commented Apr 10, 2019

I wonder why the scheduler does not schedule the profile nodes early and instead drags them all the way to the end of the function.

I looked at it. The thing is that profile nodes do not free any memory themselves: they have no outputs, and all their inputs are @in or @inout. Since the scheduler tries to schedule first the nodes that free the most memory, and these nodes free nothing, it postpones the scheduling of profile nodes until later.
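As a toy illustration of the heuristic described above (my own simplification, not Glow's scheduler code): if each ready node is scored by the net memory its scheduling would release, a profile node scores zero on both terms and keeps losing to nodes that release large activations.

```cpp
// Toy model of a "schedule the node that frees the most memory first"
// heuristic; the Node struct is hypothetical, not Glow's real scheduler.
#include <cstddef>
#include <vector>

struct Node {
  std::vector<std::size_t> inputSizes; // bytes of each input buffer
  std::vector<bool> inputLastUse;      // is this node the last user of input i?
  std::size_t outputSize = 0;          // bytes newly allocated for the result
};

// Net bytes released by scheduling this node now: inputs whose last use this
// is become free, while the output becomes newly live. A profile node has no
// output and does not end the life of its @in/@inout operands, so both terms
// are zero and a greedy scheduler keeps deferring it.
long long memoryFreed(const Node &n) {
  long long freed = 0;
  for (std::size_t i = 0; i < n.inputSizes.size(); ++i)
    if (n.inputLastUse[i])
      freed += static_cast<long long>(n.inputSizes[i]);
  return freed - static_cast<long long>(n.outputSize);
}
```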

@opti-mix opti-mix force-pushed the minor-improvements branch 2 times, most recently from 0ec4e68 to aa68eb7 Compare April 10, 2019 20:30
@nadavrot
Contributor

Maybe we should change the way we calculate what freeing memory means? Because scheduling these nodes early will reduce memory pressure. No?

@opti-mix
Contributor Author

Maybe we should change the way we calculate what freeing memory means? Because scheduling these nodes early will reduce memory pressure. No?

Maybe. I'll check if changing the scheduler would result in an easier solution.

@opti-mix
Contributor Author

@nadavrot

Maybe we should change the way we calculate what freeing memory means? Because scheduling these nodes early will reduce memory pressure. No?

I added a second commit to this PR, which changes the scheduler. I leave both commits in this PR for now so that it is easier to compare both approaches.

@opti-mix opti-mix force-pushed the minor-improvements branch 2 times, most recently from c1f52cc to 690cd38 Compare April 11, 2019 06:40
@tlepley-cadence
Contributor

I don't think I'll have time to give it a try today. I'll check and give feedback tomorrow.
In general, I think the instruction scheduler is the right place to handle this kind of memory-pressure problem. I also think that the code should be generic, not specific to the QuantizeProfile node: it's about correctly handling in/out params.

@tlepley-cadence
Contributor

tlepley-cadence commented Apr 12, 2019

I did some experiments with resnet50 and a batch size of 512.
I get the following memory requirement for the activation tensors memory-pool at profile generation time:

  • original version: 45.6 GB
  • scheduler and hoisting version: 2.5 GB

This is a nice memory optimization! :-)

Regarding the code itself, the scheduler version looks simpler and has my preference. Both versions are nevertheless specific to the QuantizeProfile node. This may be a pragmatic approach, but I think this could certainly be handled as a node-kind-independent optimization problem.

@opti-mix opti-mix force-pushed the minor-improvements branch from 690cd38 to fd3ce7e Compare April 12, 2019 19:52
@opti-mix
Contributor Author

I did some experiments with resnet50 and a batch size of 512.
I get the following memory requirement for the activation tensors memory-pool at profile generation time:
original version: 45.6 GB
scheduler and hoisting version: 2.5 GB
This is a nice memory optimization! :-)

Awesome! I'm glad it helps!

Regarding the code itself, the scheduler version looks simpler and has my preference. Both versions are nevertheless specific to the QuantizeProfile node. This may be a pragmatic approach, but I think this could certainly be handled as a node-kind-independent optimization problem.

I'm proudly presenting alternative number 3!!! :-D :-D I just pushed it. It is simpler than both previous attempts, and it is also node-kind independent. Please give it a try. You can comment out the invocations of the two others to see the effect of this one in isolation.
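For reference, here is a rough sketch of what this third alternative amounts to, based on the commit message further below ("…schedule this user right after the current node"); the types and helper are hypothetical, not the actual scheduler code:

```cpp
// Hypothetical illustration: after a node is scheduled, immediately schedule
// any of its users that allocates no memory of its own (e.g. a profile-style
// observer) and whose operands are all already available.
#include <cstddef>
#include <unordered_set>
#include <vector>

struct Node {
  std::vector<Node *> users;    // nodes consuming this node's results
  std::vector<Node *> operands; // nodes this node consumes
  std::size_t outputSize = 0;   // memory its own result needs
};

void scheduleCheapUsers(Node *n, std::vector<Node *> &order,
                        std::unordered_set<Node *> &scheduled) {
  for (Node *u : n->users) {
    if (scheduled.count(u) || u->outputSize != 0)
      continue;
    bool ready = true;
    for (Node *op : u->operands)
      if (!scheduled.count(op))
        ready = false;
    // Emitting the user right here keeps it next to its producer, so the
    // buffers it observes do not have to stay live until the end.
    if (ready) {
      order.push_back(u);
      scheduled.insert(u);
    }
  }
}
```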

Contributor

@nadavrot nadavrot left a comment

Both options #2 and #3 look good to me. I don't have a preference. I added a few comments.

Thanks for doing this work @opti-mix. This is a complex problem and algorithm, and it's great that you have a number of elegant ways of solving this.

Contributor

I suggest mentioning the SaveNode and Profile* nodes as an example.

Contributor

Why do we need this here? We already checked that the node is not scheduled with isScheduled(N).

Contributor Author

Added a comment. We need to perform an extra isScheduled check here, because the code below may have scheduled the current node while scheduling its children.
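A hypothetical illustration of why the re-check matters (not the real Glow scheduler): with a rule like the one sketched earlier that emits zero-memory users right after their producer, the recursion over children can place the current node as a side effect.

```cpp
// Sketch only: post-order scheduling where visiting a child may, as a side
// effect, already schedule the current node, hence the extra isScheduled-style
// check before emitting it.
#include <unordered_set>
#include <vector>

struct Node {
  std::vector<Node *> children;
};

void scheduleRecursively(Node *n, std::vector<Node *> &order,
                         std::unordered_set<Node *> &scheduled) {
  if (scheduled.count(n))
    return;
  for (Node *c : n->children) {
    scheduleRecursively(c, order, scheduled);
    // A child's post-processing (e.g. hoisting its zero-memory users) may
    // have scheduled `n` already.
    if (scheduled.count(n))
      return;
  }
  order.push_back(n);
  scheduled.insert(n);
}
```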

Contributor

I would continue this comment with "because ...".

... because we don't want to extend the lifetime of this value for no reason. We want to execute and get rid of this node as soon as possible to reduce the memory pressure.

Contributor

... because nodes that have users can't be scheduled safely without violating dependencies.

@opti-mix opti-mix force-pushed the minor-improvements branch from fd3ce7e to 09f3b91 Compare April 15, 2019 20:33
@opti-mix
Contributor Author

I personally like #3 the most, as it is the smallest, least intrusive, and most general solution. Waiting for @tlepley-cadence to confirm whether it works for him.

@stale

stale bot commented Apr 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@opti-mix
Contributor Author

@tlepley-cadence Could you check if my last version produces the same good results?

@stale

stale bot commented Apr 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@rdzhabarov
Contributor

do not close this

@tlepley-cadence
Contributor

@opti-mix Sorry for the delay. I just came back from holiday. I'll look at this tomorrow.

@tlepley-cadence
Contributor

tlepley-cadence commented May 2, 2019

@opti-mix I finally did the check with the latest solution :)
It gives the same (good) results as the 2 earlier solutions!
I also much prefer #3 since it's generic, independent of node types.

@opti-mix
Contributor Author

opti-mix commented May 2, 2019

@tlepley-cadence

I finally did the check with the latest solution :)
It gives the same (good) results as the 2 earlier solutions!

Yay! Finally! :-) Thanks for the confirmation!

I also much prefer #3 since it's generic, independent of node types.

OK. I'll land solution #3 in the final version of this PR.

@stale

stale bot commented May 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@opti-mix
Contributor Author

opti-mix commented May 7, 2019

Will land soon. This week or next week.

…and which does not require any additional memory, schedule this user right after the current node

This handles QuantizationProfileNode scheduling in a more general way, which is not dependent on the kind of the node.

Also, take the opportunity to generalize the "SaveNode hack" in the scheduler and make it independent of the Node's kind.
@opti-mix opti-mix force-pushed the minor-improvements branch from 09f3b91 to 518b458 Compare May 10, 2019 18:00

@facebook-github-bot facebook-github-bot left a comment

@opti-mix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@opti-mix opti-mix changed the title from "[ir-optimizer] Hoist QuantizationProfile instructions right after the last node that updates their inputs to shorten lifetimes of buffers" to "Schedule QuantizationProfile instructions right after the last node that updates their inputs to shorten lifetimes of buffers" May 10, 2019
@facebook-github-bot

@opti-mix merged this pull request in 3abdf8d.


Successfully merging this pull request may close these issues.

Profile quantization consumes too much memory
