aggregating confusion matrices to calculate accuracy #17789
Unanswered
taloy42
asked this question in
DDP / multi-GPU / multi-node
Goal
I want to calculate the confusion matrix $C_g$ on each GPU, sum these into $C=\sum_g C_g$, use $C$ to calculate the accuracy, and log it using
self.log_dict(accuracies_from_confmat(C))
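Spelled out, and assuming "accuracy" here means overall accuracy (the body of `accuracies_from_confmat` is not shown, so this is only one plausible reading), the quantity computed from the summed matrix would be:

$$
\mathrm{acc}(C) = \frac{\operatorname{tr}(C)}{\sum_{i,j} C_{ij}} = \frac{\sum_g \operatorname{tr}(C_g)}{\sum_g \sum_{i,j} (C_g)_{ij}}
$$

This equals the plain mean of the per-GPU accuracies only when every GPU sees the same number of samples, which is why the matrices need to be summed before the accuracy is taken.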
Setup
Current Situation
Right now I am using the following code:
Wanted Behaviour
I would like to do something like
that is, to sum the matrices from all GPUs into one, and then log the data only on rank 0.
Attempts
I have tried to use
torch.distributed.all_reduce
but I got a CUDA memory error.
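For reference, a minimal sketch of the kind of aggregation meant here. The helper names `sum_confmat_across_ranks` and `overall_accuracy` are hypothetical and are not the code that produced the error; the sketch assumes the confusion matrix is a square tensor with the same shape and dtype on every rank.

```python
import torch
import torch.distributed as dist


def sum_confmat_across_ranks(confmat: torch.Tensor) -> torch.Tensor:
    """Return the element-wise sum of `confmat` over all ranks.

    Hypothetical helper. Falls back to a plain copy when no process
    group is initialized (e.g. single-GPU or CPU debugging runs).
    """
    total = confmat.detach().clone()
    if dist.is_available() and dist.is_initialized():
        # Collective call: every rank must reach this line with a tensor of
        # the same shape and dtype, located on that rank's own device.
        dist.all_reduce(total, op=dist.ReduceOp.SUM)
    return total


def overall_accuracy(confmat: torch.Tensor) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = confmat.sum().item()
    if total == 0:
        return 0.0
    return confmat.diagonal().sum().item() / total
```

Since `all_reduce` is a collective operation, it would have to run on all ranks, with only the logging step restricted to rank 0.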