Fix #114414: Drawing Artifacts Grayscale Byte Images #114441

Merged
Jeroen Bakker merged 6 commits from Jeroen-Bakker/blender:fix/114414 into blender-v4.0-release 2023-11-03 13:54:17 +01:00
Member

Single channel JPEG files are stored in 4 channels image buffers, with their
bit planes set to 8.

When creating the GPU texture the number of bit planes is used to identify
if the data could be stored as a single channel texture, but in this case
the source data contains 4 channels and doesn't align with the GPU
texture format.

In Blender 3.6 the way how bit planes were stored changed (the planes
used to be 32 for the grayscale byte images matching the number of channels,
but now is set to 8, matching the original loaded image).

Since Blender 3.5 the grayscale byte images are uploaded using a single channel
GPU format, which leads to the mentioned artifacts. Grayscale byte buffers never
seems to have worked since its introduction in
https://archive.blender.org/developer/D15484

This PR disables using the grayscale GPU textures when it sourced from
a byte image.

Single channel JPEG files are stored in 4 channels image buffers, with their bit planes set to 8. When creating the GPU texture the number of bit planes is used to identify if the data could be stored as a single channel texture, but in this case the source data contains 4 channels and doesn't align with the GPU texture format. In Blender 3.6 the way how bit planes were stored changed (the planes used to be 32 for the grayscale byte images matching the number of channels, but now is set to 8, matching the original loaded image). Since Blender 3.5 the grayscale byte images are uploaded using a single channel GPU format, which leads to the mentioned artifacts. Grayscale byte buffers never seems to have worked since its introduction in https://archive.blender.org/developer/D15484 This PR disables using the grayscale GPU textures when it sourced from a byte image.
Jeroen Bakker added 1 commit 2023-11-03 09:08:19 +01:00
19c4bc6d05 Fix #114414: Drawing Artifacts Single Channel JPEG Files.
Single channel JPEG files are stored internally as 4 channels, where
RGB shader the same value. A is fully opaque. The number of bit planes
of this image format is still set to 8. When creating the GPU texture
the number of bitplanes is used to identfy if the data could be stored
on the GPU as a single channel image. But in this case the data loaded
in the GPU texture would contain 4 channels.

This PR fixes the issue to also check the number of channels. Although
for rendering it would be better that the Image buffer would only
contain a single channel, it would require more testing
Jeroen Bakker added the
Module
EEVEE & Viewport
label 2023-11-03 09:08:30 +01:00
Jeroen Bakker requested review from Clément Foucault 2023-11-03 09:08:39 +01:00
Jeroen Bakker added this to the 4.0 milestone 2023-11-03 09:08:42 +01:00
Jeroen Bakker added this to the EEVEE & Viewport project 2023-11-03 09:08:51 +01:00
Jeroen Bakker self-assigned this 2023-11-03 09:08:57 +01:00

This seems a bit confusing to me. All byte images are internally stored as RGBA, and the expected format for BW images is planes=8 channels=4. This is the same for PNG, JPEG, and all other file formats which do 8bit per channel IO.

The change proposed here seems to disable the grayscale optimization for all 8bpp BW formats. Which is also saying: if there is no float buffer just do not consider the ibuf compatible with the grayscale codepath. Not sure this is really an intended behavior.

Also not sure if this optimization ever worked for 8bpp BW images?

This seems a bit confusing to me. All byte images are internally stored as RGBA, and the expected format for BW images is planes=8 channels=4. This is the same for PNG, JPEG, and all other file formats which do 8bit per channel IO. The change proposed here seems to disable the grayscale optimization for all 8bpp BW formats. Which is also saying: if there is no float buffer just do not consider the ibuf compatible with the grayscale codepath. Not sure this is really an intended behavior. Also not sure if this optimization ever worked for 8bpp BW images?
Author
Member

In Blender 3.5 these the test image is stored as 4 channels, 32 bit planes in the image buffer. So yes they were never considered to be BW images and used 4 channel texture formats on the GPU.

The original patch https://archive.blender.org/developer/D15484 was done for tiff/float images. and might lack the support for byte buffers at all; at that time the byte buffers where always 4 channels/32 bitplanes.

Yes, I agree that images never used this optimized code-path. And perhaps fixing the bitplanes made this issue visible.

So the actual fix is to create a tmp buffer, when imbuf is a bytebuffer and bitplanes is set to 8. And use that to upload the optimized version.

In Blender 3.5 these the test image is stored as 4 channels, 32 bit planes in the image buffer. So yes they were never considered to be BW images and used 4 channel texture formats on the GPU. The original patch https://archive.blender.org/developer/D15484 was done for tiff/float images. and might lack the support for byte buffers at all; at that time the byte buffers where always 4 channels/32 bitplanes. Yes, I agree that images never used this optimized code-path. And perhaps fixing the bitplanes made this issue visible. So the actual fix is to create a tmp buffer, when imbuf is a bytebuffer and bitplanes is set to 8. And use that to upload the optimized version.
Jeroen Bakker added 1 commit 2023-11-03 11:08:26 +01:00
Jeroen Bakker changed title from Fix #114414: Drawing Artifacts Single Channel JPEG Files. to Fix #114414: Drawing Artifacts Gray Scale Byte Images 2023-11-03 11:18:05 +01:00
Jeroen Bakker changed title from Fix #114414: Drawing Artifacts Gray Scale Byte Images to Fix #114414: Drawing Artifacts Grayscale Byte Images 2023-11-03 11:18:12 +01:00
Clément Foucault requested changes 2023-11-03 11:30:45 +01:00
@ -39,0 +33,4 @@
/* The float buffer could in theory be non-linear, but it only happens internally in sequencer.
* Can consider adding is linear check for float_buffer.color_space. */
return ibuf->channels == 1;

Double return?

Double return?
Jeroen-Bakker marked this conversation as resolved
Jeroen Bakker added 1 commit 2023-11-03 11:30:57 +01:00
Author
Member

My computer broke, before I could push the removal of the second return :-(

My computer broke, before I could push the removal of the second return :-(

Are we sure that this does not break any other cases where the color-space transfer function is affecting the chrominance?

Are we sure that this does not break any other cases where the color-space transfer function is affecting the chrominance?
Jeroen Bakker force-pushed fix/114414 from 32c124ff1a to e8f3c59e33 2023-11-03 11:57:17 +01:00 Compare
Clément Foucault requested changes 2023-11-03 12:11:25 +01:00
Clément Foucault left a comment
Member

I think IMB_colormanagement_space_is_srgb(ibuf->byte_buffer.colorspace) made sense since this transform does not modify the chromaticity (all channels are processed independently and the same way).

I think `IMB_colormanagement_space_is_srgb(ibuf->byte_buffer.colorspace)` made sense since this transform does not modify the chromaticity (all channels are processed independently and the same way).
Jeroen Bakker added 1 commit 2023-11-03 12:34:37 +01:00
Jeroen Bakker added 1 commit 2023-11-03 12:36:42 +01:00
Clément Foucault approved these changes 2023-11-03 12:58:45 +01:00
Jeroen Bakker merged commit 2193bd4b89 into blender-v4.0-release 2023-11-03 13:54:17 +01:00
First-time contributor

Adding a note that this patch introduces a regression into viewport performance for EEVEE on Metal. This regression is around 5-10% depending on the scene, with the main hit being within the material shading pass, which can be 15-20% slower, depending on textures used.

For the Wanderer Scene on Apple M1 Pro:

v3.6 Mat pass: 13.48ms
v4.0 Mat pass: 15.34ms
v4.0 Mat pass (w/o this patch): 12.8ms

From a data standpoint, due to 4-channel textures being used in place of single-channel, we have a doubling in GPU read bandwidth during this pass (from 370MB to 700MB for this pass per frame). Texture cache utilisation therefore also goes down, which is where this performance change manifests.

Adding a note that this patch introduces a regression into viewport performance for EEVEE on Metal. This regression is around 5-10% depending on the scene, with the main hit being within the material shading pass, which can be 15-20% slower, depending on textures used. For the Wanderer Scene on Apple M1 Pro: v3.6 Mat pass: 13.48ms v4.0 Mat pass: 15.34ms v4.0 Mat pass (w/o this patch): 12.8ms From a data standpoint, due to 4-channel textures being used in place of single-channel, we have a doubling in GPU read bandwidth during this pass (from 370MB to 700MB for this pass per frame). Texture cache utilisation therefore also goes down, which is where this performance change manifests.

@Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures?

In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that.

If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code.

@Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures? In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that. If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code.
First-time contributor

@Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures?

While I have not investigated the full journey of the texture data, it would appear that in the GPU debugger the bound textures are initiated with R8Unorm in v3.6 (and with this patch disabled), whereas they appear to be instantiated as RGBA8Unorm for the GPU. This happens in imb_gpu_get_format wherein the format is decided based on the return from this function, which I believe now returns false for the cases where it may have returned true in the past:

      /* Non-color data or scene linear, just store buffer as is. */
      *r_data_format = GPU_DATA_UBYTE;
      *r_texture_format = (is_grayscale) ? GPU_R8 : GPU_RGBA8;

As far as data upload is concerned, I am not entirely sure how this worked before, as I guess the source data would have been mismatched.

In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that.

I agree it should not be incorrect, although it should be fairly straight-forward to modify the update routine to allow different component count in the source data vs the format data. At the moment, in all implementations of GPU_texture_update_sub(..), the component count is implicitly derived from the GPUTexture's format, and data size determined from the combination of the source format e.g. GPU_DATA_FLOAT and component length (with special cases for packed formats).

The Metal backend needed special compute implementations to support other possible combinations of texture update, so adding an optional parameter to override the number of components in input data would work fine. This would simply override the implicit data stride calculation, such that you could stride along 4 components of source data, while only writing to one component.

Technically, even without any special changes, as the actual update routine in Metal uses compute and texture.write which takes in a vec4/uint4/int4, writing the full components but having the backing format be R8 would just automatically discard the unused data.

If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code.

You're reply has definitely cleared up the disparity, and why this wasn't working in the first place, but now that I better understand the setup, it may not be too difficult of a solution. Although naturally it is likely too late for 4.0, but I can look at getting a small patch together for 4.0.1.

Thanks for the input!

> @Michael-Parkin-White-Apple On a Blender side the 8bit textures are always stored as RGBA, even if they are stored as single channel on disk. Is it the fact that grayscale heuristic for byte images lead to the performance loss, or did the change accidentally changed behavior of float textures? While I have not investigated the full journey of the texture data, it would appear that in the GPU debugger the bound textures are initiated with R8Unorm in v3.6 (and with this patch disabled), whereas they appear to be instantiated as RGBA8Unorm for the GPU. This happens in `imb_gpu_get_format` wherein the format is decided based on the return from this function, which I believe now returns false for the cases where it may have returned true in the past: ``` /* Non-color data or scene linear, just store buffer as is. */ *r_data_format = GPU_DATA_UBYTE; *r_texture_format = (is_grayscale) ? GPU_R8 : GPU_RGBA8; ``` As far as data upload is concerned, I am not entirely sure how this worked before, as I guess the source data would have been mismatched. > In the former case, being fast but incorrect is not necessarily good. And AFAIR the #114414 was reproducable on Mac. So, in this case to solve both correctness and speed there would need to be a way to upload RGBA image from CPU to single channel on GPU. I think OpenGL supported something like that, but i am not sure if Metal supports that. I agree it should not be incorrect, although it should be fairly straight-forward to modify the update routine to allow different component count in the source data vs the format data. At the moment, in all implementations of `GPU_texture_update_sub(..)`, the component count is implicitly derived from the GPUTexture's format, and data size determined from the combination of the source format e.g. `GPU_DATA_FLOAT` and component length (with special cases for packed formats). The Metal backend needed special compute implementations to support other possible combinations of texture update, so adding an optional parameter to override the number of components in input data would work fine. This would simply override the implicit data stride calculation, such that you could stride along 4 components of source data, while only writing to one component. Technically, even without any special changes, as the actual update routine in Metal uses compute and texture.write which takes in a vec4/uint4/int4, writing the full components but having the backing format be R8 would just automatically discard the unused data. > If the latter case it should be an easy fix, but it is not immediatelly clear to me what exactly is wrong in the code. You're reply has definitely cleared up the disparity, and why this wasn't working in the first place, but now that I better understand the setup, it may not be too difficult of a solution. Although naturally it is likely too late for 4.0, but I can look at getting a small patch together for 4.0.1. Thanks for the input!
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser Project (Legacy)
Interest
Asset System
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#114441
No description provided.