Visualizing gradients tutorial (issue #3186) #3389
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3389
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit cc1aa32 with merge base 2c4c99d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Generally seems to be headed in the right direction in terms of tone and organization from my perspective.
Thanks for working on this tutorial. Overall, though, I'd say that this section (prior to the actual visualizing gradients part) can be much shorter.
By the end of this tutorial, you will be able to:
- Differentiate between leaf and non-leaf tensors
    - have a diagram from https://github.com/szagoruyko/pytorchviz, point to the leaves
- Know when to use `retain_grad` vs. `requires_grad`
    - "use requires_grad for leaf, use retain_grad for non-leaf"
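The rule of thumb quoted above can be sketched in a few lines. This is only a minimal illustration, not code from the tutorial; the tensor values and the names `x`, `y`, and `loss` are made up for the example.

```python
import torch

# Leaf tensor: created directly by the user, so we opt in with requires_grad.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Non-leaf tensor: the result of an operation, so it carries a grad_fn.
y = x * 2
y.retain_grad()  # without this, y.grad is not kept after backward()

loss = (y ** 2).sum()
loss.backward()

print(x.is_leaf, y.is_leaf)  # True False
print(x.grad)                # d(loss)/dx = 8 * x
print(y.grad)                # d(loss)/dy = 2 * y
```

Without the `y.retain_grad()` call, autograd frees the gradient of the intermediate tensor once it has been propagated, and reading `y.grad` returns `None` with a warning.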
Thank you for the comments, they were really helpful. Let me know if you think the first section is still too long.

Concerning the "visualizing gradients" section with an actual example, I'm not sure if I'm going about retaining the gradients for intermediate tensors correctly. My thought process was to use a forward hook, call […]

Initially I tried using a backward pass hook like […]

RuntimeError: Output 0 of BackwardHookFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.

I know that I can plot the gradients for the parameters by just looping through the […]

If anyone sees a problem with my method, let me know. The current state of the code isn't doing what I expected, so I still have to debug it.

EDIT 1: I stumbled upon this issue. Perhaps it's better to switch to using tensor hooks as suggested by alban, instead of storing the outputs through a forward pass and then later accessing their .grad

EDIT 2: I decided not to use ResNet but instead a simplified fully connected network as explained in the BatchNorm paper. It is purely for educational purposes, but it actually shows the results I was expecting. With the ResNet implementation, I believe that the residual connections and ReLU non-linearity are masking the negative effect that removing BatchNorm has on the gradients. I'll push an updated PR sometime today.
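For readers following the thread, the tensor-hook approach mentioned in EDIT 1 could look roughly like the sketch below. It is an assumption about what such code might look like, not the code in this PR: the toy model, the `captured` dictionary, and the helper `make_forward_hook` are hypothetical names. A module forward hook is used only to get hold of each output tensor, and `Tensor.register_hook` then records the gradient when the backward pass reaches that tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model, used only to illustrate the hook mechanics.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

captured = {}  # layer name -> gradient w.r.t. that layer's output


def make_forward_hook(name):
    def forward_hook(module, inputs, output):
        # Register a tensor hook on the output; it fires during backward().
        def save_grad(grad):
            captured[name] = grad.detach().clone()

        output.register_hook(save_grad)

    return forward_hook


handles = [
    module.register_forward_hook(make_forward_hook(name))
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
]

out = model(torch.randn(16, 4))
out.sum().backward()

# Remove the hooks so later forward passes are not affected.
for handle in handles:
    handle.remove()

for name, grad in captured.items():
    print(name, tuple(grad.shape))
```

Because the tensor hook only reads and copies the gradient, nothing is modified in place, and the outputs themselves do not need to be stored between the forward and backward passes.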
Still a work in progress, but I significantly reduced the first section and added some helpful images for the computational graph. I also added links for most terms. The WIP section with ResNet I still have to debug. I'm not sure my method for retaining the intermediate gradients is valid. See discussion on pull request.
Instead of using ResNet as the example for visualizing the gradients, I decided to use a simple fully connected network with and without BatchNorm. It is a contrived model, but the emphasis is on illustrating the gradients, not on which model they come from. I also wanted the positive effect of batch normalization to be clearly shown, and this was not the case with PyTorch's base ResNet model.
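A comparison of this kind could be set up along the following lines. This is a hedged sketch and not the tutorial's actual code: the depth, width, sigmoid activation, random data, and the helper names `build_mlp` and `mean_abs_weight_grads` are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


def build_mlp(use_batchnorm, depth=5, width=32):
    """Build a small fully connected net, optionally with BatchNorm."""
    layers = []
    for _ in range(depth):
        layers.append(nn.Linear(width, width))
        if use_batchnorm:
            layers.append(nn.BatchNorm1d(width))
        layers.append(nn.Sigmoid())
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)


def mean_abs_weight_grads(model, inputs, targets):
    """Run one forward/backward pass; return mean |grad| per Linear layer."""
    model.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    return [
        layer.weight.grad.abs().mean().item()
        for layer in model
        if isinstance(layer, nn.Linear)
    ]


torch.manual_seed(0)
inputs = torch.randn(64, 32)   # random stand-in data, not a real dataset
targets = torch.randn(64, 1)

grads_plain = mean_abs_weight_grads(build_mlp(False), inputs, targets)
grads_bn = mean_abs_weight_grads(build_mlp(True), inputs, targets)

for i, (g_plain, g_bn) in enumerate(zip(grads_plain, grads_bn)):
    print(f"linear {i}: no BN {g_plain:.2e} | BN {g_bn:.2e}")
```

The two lists can then be plotted layer by layer to compare how the gradient magnitude changes with depth in each variant.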
0b9f56a to cc1aa32
Fixes #3186
Description
Add draft for visualizing gradients tutorial. Link is here but the content is old and the files need to be re-built.
Checklist