[RFC] Add scriptable transforms #1375

## TL;DR

I propose that we add a separate set of functional transforms that take tensors as input and return tensors, and that are torchscript-able.

## Background

TorchVision currently relies on PIL for most of its transforms.
While PIL is reasonably fast and widely adopted, relying on an external library makes it impossible to trace or script our transforms.

One of the biggest drawbacks is that pre-processing is generally a crucial part of reproducing a model's results, and differences in pre-processing (due to, e.g., OpenCV / PIL differences) can affect the final model result.

When torchvision was initially developed, PyTorch implemented far fewer operations that could be used for image transformations such as resizing, rotations and affine warps.
The reliance on PIL also creates an awkward situation where certain operations expect PIL Images and others expect Torch Tensors (normalize is a notable case).

Since then, we have improved the support for image resizing in PyTorch (thanks to the `upsample` function, which covers a number of cases) and added `grid_sample`, which lets us perform rotations, affine warps and more efficiently.
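
As a rough illustration of the `grid_sample` route, here is a minimal sketch of a rotation built from `torch.nn.functional.affine_grid` and `grid_sample`. The helper name `rotate` and its signature are hypothetical, not an existing torchvision API. It assumes a float NCHW batch of square images: `affine_grid` works in normalized `[-1, 1]` coordinates, so a plain rotation matrix would distort non-square inputs.

```python
import math

import torch
import torch.nn.functional as F


def rotate(img: torch.Tensor, angle_deg: float) -> torch.Tensor:
    """Rotate a float NCHW batch about the image centre (hypothetical helper)."""
    n = img.shape[0]
    a = math.radians(angle_deg)
    # 2x3 affine matrix expressed in the normalized [-1, 1] coordinates
    # that affine_grid expects; no translation component.
    theta = torch.tensor(
        [[math.cos(a), -math.sin(a), 0.0],
         [math.sin(a), math.cos(a), 0.0]],
        dtype=img.dtype,
        device=img.device,
    )
    theta = theta.unsqueeze(0).repeat(n, 1, 1)  # one matrix per batch element
    grid = F.affine_grid(theta, list(img.shape), align_corners=False)
    # Locations sampled outside the image are filled with zeros.
    return F.grid_sample(
        img, grid, mode="bilinear", padding_mode="zeros", align_corners=False
    )
```

Because the whole batch is resampled in a single call, the same code runs on GPU tensors and stays differentiable with respect to `img`.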

**Pros of using PyTorch ops**

* GPU support
* Batching support
* Tracing of the transforms
* Autodiff support

**Cons of using PyTorch ops**

* Not bit-wise equivalent to PIL
* Some (but not many) cases are not yet supported

It should be noted that using PyTorch ops should not be a hard constraint. Users can still implement their own functionality with PIL or OpenCV, but only the transforms based on PyTorch ops can be exported to torchscript.

This means that the lingua franca for passing objects around in torchvision transforms would be a torch.Tensor, and no longer a PIL Image.
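
To make the tensor-in, tensor-out idea concrete, here is a minimal sketch of a functional transform compiled with `torch.jit.script`. The function name and argument layout are illustrative assumptions, not the final API. It assumes a float NCHW batch and per-channel `mean` / `std` lists.

```python
from typing import List

import torch


@torch.jit.script
def normalize(img: torch.Tensor, mean: List[float], std: List[float]) -> torch.Tensor:
    # Illustrative sketch: img is assumed to be a float tensor of shape (N, C, H, W).
    # Reshape the per-channel statistics so they broadcast over N, H and W.
    m = torch.as_tensor(mean, dtype=img.dtype, device=img.device).view(1, -1, 1, 1)
    s = torch.as_tensor(std, dtype=img.dtype, device=img.device).view(1, -1, 1, 1)
    return (img - m) / s
```

Since the function only uses tensor ops, the scripted version handles a whole batch at once and can be saved with `normalize.save(...)` alongside a model.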

## How to implement it

Most of the [transforms in torchvision](https://github.com/pytorch/vision/blob/master/torchvision/transforms/functional.py) can already be expressed with PyTorch native operators, like `torch.nn.functional.interpolate` or `torch.nn.functional.grid_sample`, so we should not need to write specialized ops for them in torchvision.
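
For example, a resize could be a thin wrapper around `torch.nn.functional.interpolate`. The sketch below is only an assumption about how such a wrapper might look; the name `resize` and its arguments are hypothetical, and it expects a float NCHW batch.

```python
from typing import List

import torch
import torch.nn.functional as F


def resize(img: torch.Tensor, size: List[int], mode: str = "bilinear") -> torch.Tensor:
    """Resize a float NCHW batch to size = [height, width] (hypothetical helper)."""
    if mode == "nearest":
        # interpolate does not accept align_corners for nearest-neighbour mode.
        return F.interpolate(img, size=size, mode=mode)
    # "bilinear" and "bicubic" take an explicit align_corners flag.
    return F.interpolate(img, size=size, mode=mode, align_corners=False)
```

Note that the output will generally not match PIL's resize bit for bit, which is one of the trade-offs listed above.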

An initial PR adding support for video has been sent in https://github.com/pytorch/vision/pull/1353 , and I think we should improve on top of it to make it cover more ops, and also support images.

## Gotchas

Using torch operators has a drawback: they currently only support batched tensors in NCHW format with floating-point values, which differs from the format supported by our current set of transforms (HWC and `uint8` in most cases).

For now, let's assume that the tensors are `float32` and in NCHW format. We might consider explicitly keeping a `memory_format=torch.channels_last` layout for compatibility (TBD).
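
The conversion between the two conventions could be sketched as follows. Both helper names are hypothetical, and scaling to the `[0, 1]` range is an assumption of this sketch, not something the proposal fixes.

```python
import torch


def to_nchw_float(img: torch.Tensor) -> torch.Tensor:
    """(H, W, C) uint8 in [0, 255] -> (1, C, H, W) float32 in [0, 1] (assumed range)."""
    return img.permute(2, 0, 1).unsqueeze(0).to(torch.float32) / 255.0


def to_hwc_uint8(img: torch.Tensor) -> torch.Tensor:
    """Inverse of to_nchw_float for a batch holding a single image."""
    out = (img.squeeze(0) * 255.0).round().clamp(0, 255)
    return out.permute(1, 2, 0).to(torch.uint8)


def as_channels_last(img: torch.Tensor) -> torch.Tensor:
    """Keep the logical NCHW shape but store the data with NHWC strides."""
    return img.contiguous(memory_format=torch.channels_last)
```

`as_channels_last` only changes the memory layout, so ops that expect NCHW shapes still accept the result.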

Long-term, we should add support for `uint8` (and other integer types) to `interpolate` and make it more generic over which dimensions it interpolates (see https://github.com/pytorch/pytorch/issues/10482), but that's a larger task.

## List of transforms that could be readily available with PyTorch ops

- [x] normalize
- [x] resize (only nearest, bilinear and bicubic, for floating types)
- [x] pad (except symmetric pad)
- [x] crop
- [x] center_crop
- [x] resized_crop
- [x] hflip
- [x] vflip
- [x] five_crop
- [x] ten_crop
- [x] adjust_brightness
- [x] adjust_contrast
- [x] adjust_saturation
- [x] adjust_hue
- [x] adjust_gamma
- [x] rotate (only for nearest and bilinear, for floating types)
- [x] affine (only for nearest and bilinear, for floating types)
- [x] grayscale
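
Several entries in this list reduce to indexing or element-wise arithmetic. The sketch below shows how a few of them might look on tensors whose last two dimensions are H and W. The signatures are illustrative assumptions, `center_crop` uses floor division so its rounding may differ from the PIL-based version, and `adjust_brightness` assumes float values in `[0, 1]`.

```python
from typing import Tuple

import torch


def hflip(img: torch.Tensor) -> torch.Tensor:
    return img.flip(-1)  # reverse the width dimension


def vflip(img: torch.Tensor) -> torch.Tensor:
    return img.flip(-2)  # reverse the height dimension


def crop(img: torch.Tensor, top: int, left: int, height: int, width: int) -> torch.Tensor:
    return img[..., top:top + height, left:left + width]


def center_crop(img: torch.Tensor, size: Tuple[int, int]) -> torch.Tensor:
    h, w = img.shape[-2], img.shape[-1]
    th, tw = size
    # Floor division: an assumption of this sketch, not the PIL rounding rule.
    return crop(img, (h - th) // 2, (w - tw) // 2, th, tw)


def adjust_brightness(img: torch.Tensor, factor: float) -> torch.Tensor:
    # Assumes float values in [0, 1]; results are clamped back into that range.
    return (img * factor).clamp(0.0, 1.0)
```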
