All the performance benchmarks we have run so far for transforms v1 vs. v2 were on contiguous inputs. However, we have a few kernels that leave the output in a noncontiguous state (a quick spot check is sketched after the list):
- `affine_image_tensor`
  - in case `image.numel() > 0 and image.ndim == 4 and fill is not None`
- `convert_color_space`
  - in case we only strip the alpha channel, i.e. `RGB_ALPHA -> RGB` and `GRAY_ALPHA -> GRAY`
- `rotate_image_tensor`
  - in case `image.numel() > 0 and image.ndim == 4 and fill is not None`
- `crop_image_tensor`
- `center_crop_image_tensor`
- `five_crop_image_tensor`
- `ten_crop_image_tensor`
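For reference, this can be spot-checked by looking at `Tensor.is_contiguous()` on a kernel output. The sketch below assumes the kernels are importable from `torchvision.prototype.transforms.functional` and that `crop_image_tensor` follows the `(image, top, left, height, width)` signature of `torchvision.transforms.functional.crop`:

```python
import torch
from torchvision.prototype.transforms import functional as F  # import path is an assumption

image = torch.rand(3, 256, 256)

# Cropping is implemented as a strided slice over H and W, so the result is
# typically noncontiguous whenever the crop is narrower than the input.
out = F.crop_image_tensor(image, 16, 16, 128, 128)
print(out.is_contiguous())  # expected: False
```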
If applicable, the same is also valid for the `*_mask` and `*_video` kernels, since they are thin wrappers around the `*_image_tensor` ones.
We should benchmark, at least for a few kernels, whether noncontiguous inputs cause a performance degradation that is larger than the cost of enforcing contiguous outputs in the kernels above. If so, we should probably enforce contiguous outputs for our kernels.
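A minimal sketch of what such a comparison could look like with `torch.utils.benchmark`; the noncontiguous input is produced by plain slicing to stand in for the kernels above, and `downstream` is a hypothetical placeholder for whatever kernel consumes the output next:

```python
import torch
from torch.utils import benchmark

# A noncontiguous tensor as produced e.g. by crop_image_tensor: plain slicing
# stands in for the kernels listed above.
image = torch.rand(3, 512, 512)
noncontiguous = image[..., 100:356, 100:356]   # strided view, not contiguous
contiguous = noncontiguous.contiguous()

# Hypothetical downstream op standing in for whatever kernel runs next in a pipeline.
def downstream(img):
    return img.float().mean(dim=(-2, -1))

results = []
for sub_label, inp in [("noncontiguous input", noncontiguous), ("contiguous input", contiguous)]:
    results.append(
        benchmark.Timer(
            stmt="downstream(inp)",
            globals={"downstream": downstream, "inp": inp},
            label="downstream kernel",
            sub_label=sub_label,
        ).blocked_autorange(min_run_time=1)
    )

# Cost of enforcing contiguity inside the producing kernel itself.
results.append(
    benchmark.Timer(
        stmt="inp.contiguous()",
        globals={"inp": noncontiguous},
        label="downstream kernel",
        sub_label="enforce .contiguous()",
    ).blocked_autorange(min_run_time=1)
)

benchmark.Compare(results).print()
```

If the noncontiguous row is consistently slower than the contiguous one by more than the `.contiguous()` row costs, enforcing contiguous outputs would pay off.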