Fix #114414: Drawing Artifacts Grayscale Byte Images #114441

Jeroen Bakker · 2023-11-03T09:08:05+01:00

Jeroen Bakker commented

2023-11-03 09:08:05 +01:00

Single channel JPEG files are stored in 4 channels image buffers, with their
bit planes set to 8.

When creating the GPU texture the number of bit planes is used to identify
if the data could be stored as a single channel texture, but in this case
the source data contains 4 channels and doesn't align with the GPU
texture format.

In Blender 3.6 the way how bit planes were stored changed (the planes
used to be 32 for the grayscale byte images matching the number of channels,
but now is set to 8, matching the original loaded image).

Since Blender 3.5 the grayscale byte images are uploaded using a single channel
GPU format, which leads to the mentioned artifacts. Grayscale byte buffers never
seems to have worked since its introduction in
https://archive.blender.org/developer/D15484

This PR disables using the grayscale GPU textures when it sourced from
a byte image.

Single channel JPEG files are stored in 4 channels image buffers, with their bit planes set to 8. When creating the GPU texture the number of bit planes is used to identify if the data could be stored as a single channel texture, but in this case the source data contains 4 channels and doesn't align with the GPU texture format. In Blender 3.6 the way how bit planes were stored changed (the planes used to be 32 for the grayscale byte images matching the number of channels, but now is set to 8, matching the original loaded image). Since Blender 3.5 the grayscale byte images are uploaded using a single channel GPU format, which leads to the mentioned artifacts. Grayscale byte buffers never seems to have worked since its introduction in https://archive.blender.org/developer/D15484 This PR disables using the grayscale GPU textures when it sourced from a byte image.

Jeroen Bakker added 1 commit 2023-11-03 09:08:19 +01:00

19c4bc6d05 Fix #114414 : Drawing Artifacts Single Channel JPEG Files.

Single channel JPEG files are stored internally as 4 channels, where
RGB shader the same value. A is fully opaque. The number of bit planes
of this image format is still set to 8. When creating the GPU texture
the number of bitplanes is used to identfy if the data could be stored
on the GPU as a single channel image. But in this case the data loaded
in the GPU texture would contain 4 channels.

This PR fixes the issue to also check the number of channels. Although
for rendering it would be better that the Image buffer would only
contain a single channel, it would require more testing

Jeroen Bakker added the

Module

EEVEE & Viewport

label 2023-11-03 09:08:30 +01:00

Jeroen Bakker requested review from Clément Foucault 2023-11-03 09:08:39 +01:00

Jeroen Bakker added this to the 4.0 milestone 2023-11-03 09:08:42 +01:00

Jeroen Bakker added this to the EEVEE & Viewport project 2023-11-03 09:08:51 +01:00

Jeroen Bakker self-assigned this 2023-11-03 09:08:57 +01:00

~~Jeroen Bakker referenced this pull request 2023-11-03 09:14:35 +01:00~~

Blender LTS: Maintenance Task 3.6 #109399

Sergey Sharybin commented

2023-11-03 09:46:43 +01:00

This seems a bit confusing to me. All byte images are internally stored as RGBA, and the expected format for BW images is planes=8 channels=4. This is the same for PNG, JPEG, and all other file formats which do 8bit per channel IO.

The change proposed here seems to disable the grayscale optimization for all 8bpp BW formats. Which is also saying: if there is no float buffer just do not consider the ibuf compatible with the grayscale codepath. Not sure this is really an intended behavior.

Also not sure if this optimization ever worked for 8bpp BW images?

This seems a bit confusing to me. All byte images are internally stored as RGBA, and the expected format for BW images is planes=8 channels=4. This is the same for PNG, JPEG, and all other file formats which do 8bit per channel IO. The change proposed here seems to disable the grayscale optimization for all 8bpp BW formats. Which is also saying: if there is no float buffer just do not consider the ibuf compatible with the grayscale codepath. Not sure this is really an intended behavior. Also not sure if this optimization ever worked for 8bpp BW images?

Jeroen Bakker commented

2023-11-03 10:43:46 +01:00

In Blender 3.5 these the test image is stored as 4 channels, 32 bit planes in the image buffer. So yes they were never considered to be BW images and used 4 channel texture formats on the GPU.

The original patch https://archive.blender.org/developer/D15484 was done for tiff/float images. and might lack the support for byte buffers at all; at that time the byte buffers where always 4 channels/32 bitplanes.

Yes, I agree that images never used this optimized code-path. And perhaps fixing the bitplanes made this issue visible.

So the actual fix is to create a tmp buffer, when imbuf is a bytebuffer and bitplanes is set to 8. And use that to upload the optimized version.

In Blender 3.5 these the test image is stored as 4 channels, 32 bit planes in the image buffer. So yes they were never considered to be BW images and used 4 channel texture formats on the GPU. The original patch https://archive.blender.org/developer/D15484 was done for tiff/float images. and might lack the support for byte buffers at all; at that time the byte buffers where always 4 channels/32 bitplanes. Yes, I agree that images never used this optimized code-path. And perhaps fixing the bitplanes made this issue visible. So the actual fix is to create a tmp buffer, when imbuf is a bytebuffer and bitplanes is set to 8. And use that to upload the optimized version.

Jeroen Bakker added 1 commit 2023-11-03 11:08:26 +01:00

f304f80065 Disable grayscale byte buffers

Jeroen Bakker changed title from ~~Fix #114414: Drawing Artifacts Single Channel JPEG Files.~~ to Fix #114414: Drawing Artifacts Gray Scale Byte Images

2023-11-03 11:18:05 +01:00

Jeroen Bakker changed title from ~~Fix #114414: Drawing Artifacts Gray Scale Byte Images~~ to Fix #114414: Drawing Artifacts Grayscale Byte Images

2023-11-03 11:18:12 +01:00

Clément Foucault requested changes 2023-11-03 11:30:45 +01:00

source/blender/imbuf/intern/util_gpu.cc Outdated

						
				@ -39,0 +33,4 @@

				  /* The float buffer could in theory be non-linear, but it only happens internally in sequencer.

				   * Can consider adding is linear check for float_buffer.color_space. */

				  return ibuf->channels == 1;

Clément Foucault commented

2023-11-03 11:30:42 +01:00

Double return?

Jeroen-Bakker marked this conversation as resolved

Jeroen Bakker added 1 commit 2023-11-03 11:30:57 +01:00

1feff0e1b2 Remove unreachable code

Jeroen Bakker commented

2023-11-03 11:31:44 +01:00

My computer broke, before I could push the removal of the second return :-(

Clément Foucault commented

2023-11-03 11:38:04 +01:00

Are we sure that this does not break any other cases where the color-space transfer function is affecting the chrominance?

Jeroen Bakker force-pushed fix/114414 from 32c124ff1a to e8f3c59e33

2023-11-03 11:57:17 +01:00

Compare

Clément Foucault requested changes 2023-11-03 12:11:25 +01:00

Clément Foucault left a comment

I think IMB_colormanagement_space_is_srgb(ibuf->byte_buffer.colorspace) made sense since this transform does not modify the chromaticity (all channels are processed independently and the same way).

I think `IMB_colormanagement_space_is_srgb(ibuf->byte_buffer.colorspace)` made sense since this transform does not modify the chromaticity (all channels are processed independently and the same way).

Jeroen Bakker added 1 commit 2023-11-03 12:34:37 +01:00

4a0eb7bc87 Added srgb check for chrominance check

Jeroen Bakker added 1 commit 2023-11-03 12:36:42 +01:00

b3c90c0e86 Added back comment.

Clément Foucault approved these changes 2023-11-03 12:58:45 +01:00

Jeroen Bakker merged commit 2193bd4b89 into blender-v4.0-release

2023-11-03 13:54:17 +01:00

Jeroen Bakker referenced this issue from a commit

2023-11-03 13:54:18 +01:00

Fix #114414: Drawing Artifacts Grayscale Byte Images

Michael Parkin-White commented

2023-11-07 17:57:51 +01:00

First-time contributor

Adding a note that this patch introduces a regression into viewport performance for EEVEE on Metal. This regression is around 5-10% depending on the scene, with the main hit being within the material shading pass, which can be 15-20% slower, depending on textures used.

For the Wanderer Scene on Apple M1 Pro:

v3.6 Mat pass: 13.48ms
v4.0 Mat pass: 15.34ms
v4.0 Mat pass (w/o this patch): 12.8ms

From a data standpoint, due to 4-channel textures being used in place of single-channel, we have a doubling in GPU read bandwidth during this pass (from 370MB to 700MB for this pass per frame). Texture cache utilisation therefore also goes down, which is where this performance change manifests.

Adding a note that this patch introduces a regression into viewport performance for EEVEE on Metal. This regression is around 5-10% depending on the scene, with the main hit being within the material shading pass, which can be 15-20% slower, depending on textures used. For the Wanderer Scene on Apple M1 Pro: v3.6 Mat pass: 13.48ms v4.0 Mat pass: 15.34ms v4.0 Mat pass (w/o this patch): 12.8ms From a data standpoint, due to 4-channel textures being used in place of single-channel, we have a doubling in GPU read bandwidth during this pass (from 370MB to 700MB for this pass per frame). Texture cache utilisation therefore also goes down, which is where this performance change manifests.

Michael Parkin-White referenced this pull request

2023-11-07 18:09:06 +01:00

Blender 4.0 Regression: 10% EEVEE Performance drop from #114441 #114590

Sergey Sharybin commented

2023-11-07 18:22:48 +01:00

@Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures?

In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that.

If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code.

@Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures? In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that. If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code.

Michael Parkin-White commented

2023-11-07 20:19:04 +01:00

First-time contributor

@Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures?

While I have not investigated the full journey of the texture data, it would appear that in the GPU debugger the bound textures are initiated with R8Unorm in v3.6 (and with this patch disabled), whereas they appear to be instantiated as RGBA8Unorm for the GPU. This happens in imb_gpu_get_format wherein the format is decided based on the return from this function, which I believe now returns false for the cases where it may have returned true in the past:

      /* Non-color data or scene linear, just store buffer as is. */
      *r_data_format = GPU_DATA_UBYTE;
      *r_texture_format = (is_grayscale) ? GPU_R8 : GPU_RGBA8;

As far as data upload is concerned, I am not entirely sure how this worked before, as I guess the source data would have been mismatched.

In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that.

I agree it should not be incorrect, although it should be fairly straight-forward to modify the update routine to allow different component count in the source data vs the format data. At the moment, in all implementations of GPU_texture_update_sub(..), the component count is implicitly derived from the GPUTexture's format, and data size determined from the combination of the source format e.g. GPU_DATA_FLOAT and component length (with special cases for packed formats).

The Metal backend needed special compute implementations to support other possible combinations of texture update, so adding an optional parameter to override the number of components in input data would work fine. This would simply override the implicit data stride calculation, such that you could stride along 4 components of source data, while only writing to one component.

Technically, even without any special changes, as the actual update routine in Metal uses compute and texture.write which takes in a vec4/uint4/int4, writing the full components but having the backing format be R8 would just automatically discard the unused data.

If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code.

You're reply has definitely cleared up the disparity, and why this wasn't working in the first place, but now that I better understand the setup, it may not be too difficult of a solution. Although naturally it is likely too late for 4.0, but I can look at getting a small patch together for 4.0.1.

Thanks for the input!

> @Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures? While I have not investigated the full journey of the texture data, it would appear that in the GPU debugger the bound textures are initiated with R8Unorm in v3.6 (and with this patch disabled), whereas they appear to be instantiated as RGBA8Unorm for the GPU. This happens in `imb_gpu_get_format` wherein the format is decided based on the return from this function, which I believe now returns false for the cases where it may have returned true in the past: ``` /* Non-color data or scene linear, just store buffer as is. */ *r_data_format = GPU_DATA_UBYTE; *r_texture_format = (is_grayscale) ? GPU_R8 : GPU_RGBA8; ``` As far as data upload is concerned, I am not entirely sure how this worked before, as I guess the source data would have been mismatched. > In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that. I agree it should not be incorrect, although it should be fairly straight-forward to modify the update routine to allow different component count in the source data vs the format data. At the moment, in all implementations of `GPU_texture_update_sub(..)`, the component count is implicitly derived from the GPUTexture's format, and data size determined from the combination of the source format e.g. `GPU_DATA_FLOAT` and component length (with special cases for packed formats). The Metal backend needed special compute implementations to support other possible combinations of texture update, so adding an optional parameter to override the number of components in input data would work fine. This would simply override the implicit data stride calculation, such that you could stride along 4 components of source data, while only writing to one component. Technically, even without any special changes, as the actual update routine in Metal uses compute and texture.write which takes in a vec4/uint4/int4, writing the full components but having the backing format be R8 would just automatically discard the unused data. > If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code. You're reply has definitely cleared up the disparity, and why this wasn't working in the first place, but now that I better understand the setup, it may not be too difficult of a solution. Although naturally it is likely too late for 4.0, but I can look at getting a small patch together for 4.0.1. Thanks for the input!

Sign in to join this conversation.

No reviewers

No Label

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

Fix #114414: Drawing Artifacts Grayscale Byte Images #114441