A minimal version of JPEG decoding on GPUs was implemented in #3792. Here's a list of potential future improvements:
- Support for A100 devices
- Support for batch decoding (I didn't see any speed improvement in my experiments in [WIP] nvJPEG support #2786 (comment), but perhaps I missed something)
- Use a finer-grained API for the decoding phases, and potentially change the decoding backend depending on the image size, taking inspiration from https://github.com/NVIDIA/CUDALibrarySamples/tree/master/nvJPEG/nvJPEG-Decoder-MultipleInstances
- As per Support for decoding jpegs on GPU with nvjpeg #3792 (comment), we could:
  - Avoid creating tensor views and use pointer arithmetic instead
  - Investigate whether the layout (CHW vs HWC) has an impact on performance
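
To make the last two points concrete, here is a minimal pure-Python sketch of the offset arithmetic behind the two layouts. It is not the torchvision or nvJPEG code: the helper names `hwc_offset`, `chw_offset`, and `hwc_to_chw` are hypothetical, and a flat Python list stands in for a device buffer. It only illustrates how an element at (channel, row, column) maps to a flat offset in an interleaved (HWC) versus a planar (CHW) buffer, which is the arithmetic that would replace tensor views.

```python
def hwc_offset(h, w, c, height, width, channels):
    """Flat offset of element (h, w, c) in an interleaved HWC buffer."""
    return (h * width + w) * channels + c


def chw_offset(c, h, w, channels, height, width):
    """Flat offset of element (c, h, w) in a planar CHW buffer."""
    return (c * height + h) * width + w


def hwc_to_chw(src, height, width, channels):
    """Copy an interleaved HWC buffer into a new planar CHW buffer.

    Hypothetical helper: uses explicit offset arithmetic on flat lists
    instead of building per-channel views.
    """
    dst = [0] * (height * width * channels)
    for h in range(height):
        for w in range(width):
            for c in range(channels):
                dst[chw_offset(c, h, w, channels, height, width)] = src[
                    hwc_offset(h, w, c, height, width, channels)
                ]
    return dst


# A 1x2 RGB image: pixel 0 = (10, 20, 30), pixel 1 = (40, 50, 60).
interleaved = [10, 20, 30, 40, 50, 60]
planar = hwc_to_chw(interleaved, height=1, width=2, channels=3)
print(planar)  # [10, 40, 20, 50, 30, 60]
```

In HWC the three channel values of one pixel are adjacent in memory, while in CHW each channel forms a contiguous plane. Whether that difference matters for decoding speed is exactly the open question in the list above, so this sketch says nothing about performance.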