@LexSong LexSong commented Apr 1, 2025

This PR fixes two issues in the resize_image function:

  1. Ensure the size parameters are integers

    • The function sometimes received width and height as non-integer types (such as Tensor), which caused crashes when calling cv2.resize().
    • I added int() so they are always Python integers, which stops the errors.
  2. Improve interpolation logic

    • The old code used 'lanczos' when shrinking images and 'area' when not, which was backwards.
    • Now, it uses 'area' only when the image shrinks a lot (to half size or smaller), since 'area' is better for big reductions. Otherwise, it uses 'lanczos'.
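
The two changes described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `to_int_size` and `choose_interpolation` are hypothetical, and "to half size or smaller" is assumed to mean that both target dimensions are at most half of the source dimensions.

```python
def to_int_size(width, height):
    """Coerce width and height to plain Python ints (hypothetical helper)."""
    # int() accepts floats and any object that defines __int__,
    # so callers no longer pass non-integer sizes on to the resize call.
    return int(width), int(height)


def choose_interpolation(width, height, resized_width, resized_height):
    """Return 'area' for large reductions, otherwise 'lanczos'.

    Assumption: a "large reduction" means both target dimensions are
    at most half of the corresponding source dimensions.
    """
    resized_width, resized_height = to_int_size(resized_width, resized_height)
    if resized_width * 2 <= width and resized_height * 2 <= height:
        return "area"
    return "lanczos"
```

For example, under these assumptions a 512x512 image resized to 256x256 would select 'area', while 512x512 to 320x320 would select 'lanczos'.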

Owner

kohya-ss commented Apr 2, 2025

Thank you for this PR! Point 1 makes sense.

Regarding point 2, after a quick comparison, I found that PIL's LANCZOS seems to produce almost the same results as CV2's AREA when downsampling.

Original image (512x512):
[image: text]

Resized with AREA (320x320):
[image: text_320_area]

Resized with LANCZOS4, PIL (320x320):
[image: text_320_lanczos_pil]

However, resizing with PIL is about 10 times slower because it involves type conversion.

Therefore, if the resized image size is smaller than the original size, it is sufficient to simply use CV2's INTER_AREA.

The original code had a mistake: it used 'lanczos' when the image got smaller (width > resized_width and height > resized_height) and 'area' when it stayed the same size or got bigger. This was backwards, since 'area' is the better choice for large reductions.
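
The simplified rule suggested in this comment (use area interpolation whenever the output is smaller than the input) might look like the sketch below. The function name `pick_resize_filter` is hypothetical, and the downscale test reuses the condition quoted above (width > resized_width and height > resized_height); the merged code may differ.

```python
def pick_resize_filter(width, height, resized_width, resized_height):
    """Pick a filter name from the source and target sizes (hypothetical helper)."""
    # Downscaling in both dimensions: area averaging
    # (this would correspond to cv2.INTER_AREA in the thread).
    if width > resized_width and height > resized_height:
        return "area"
    # Same size or upscaling: Lanczos
    # (the thread uses PIL's LANCZOS for this case).
    return "lanczos"
```

With this rule, 512x512 to 320x320 selects 'area', and 128x128 to 328x328 selects 'lanczos'.
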
@LexSong LexSong force-pushed the fix-resize-issue branch from 8a5c313 to b822b7e Compare April 2, 2025 14:06
Author

LexSong commented Apr 2, 2025

I tried a 90% downscale and saw no obvious quality difference. As you said, let's simply use AREA for all downscaling.

May I ask why we're using PIL's lanczos rather than CV2's lanczos?

Owner

kohya-ss commented Apr 3, 2025

The update is done in #1426.

When upsampling 128x128 to 328x328 (328 is an arbitrary choice):

CV2 LANCZOS:
[image: text_328_lanczos]

PIL LANCZOS:
[image: text_328_lanczos_pil]

For example, you can see four gray dots where the lines intersect in the CV2 image.
[image]

Author

LexSong commented Apr 3, 2025

I see, thanks for the reply. It's possibly caused by a gamma correction step, which I think is skipped in the OpenCV implementation.

@kohya-ss kohya-ss merged commit 606e687 into kohya-ss:sd3 Apr 5, 2025
2 checks passed
@LexSong LexSong deleted the fix-resize-issue branch April 5, 2025 13:29