Compositor: Unify sRGB to Linear between CPU and GPU #118624

Merged
Thomas Dinges merged 13 commits from OmarEmaraDev/blender:unify-compositor-srgb-linear into main 2024-03-25 14:10:05 +01:00
Member

This patch unifies the sRGB to Linear color space conversion between the
CPU and GPU compositors. This is because CPU uses an optimized path that
produces values that are very slightly off. To fix this, for the GPU, we
do the conversion CPU side instead of doing it in a shader. Since images
are cached, the performance implications are not significant.

Another added benefit is that we no longer get differences due to the
order of alpha pre-multiplication and sRGB conversion, demonstrated in
#114305. And we no longer require any preprocessing of the images.

This patch adds some new utilities to the Image Buffer module to assign
float, byte, and compressed buffers along with their color spaces. It
also adds an ownership flag to compressed data. Those were added as a
way to facilitate the implementation.

This patch unifies the sRGB to Linear color space conversion between the CPU and GPU compositors. This is because CPU uses an optimized path that produces values that are very slightly off. To fix this, for the GPU, we do the conversion CPU side instead of doing it in a shader. Since images are cached, the performance implications are not significant. Another added benefit is that we no longer get differences due to the order of alpha pre-multiplication and sRGB conversion, demonstrated in #114305. And we no longer require any preprocessing of the images. This patch adds some new utilities to the Image Buffer module to assign float, byte, and compressed buffers along with their color spaces. It also adds an ownership flag to compressed data. Those were added as a way to facilitate the implementation.
Omar Emara added the
Interest
Compositing
Module
VFX & Video
labels 2024-02-22 15:53:13 +01:00
Omar Emara added 1 commit 2024-02-22 15:53:18 +01:00
3b32e96fa9 Compositor: Unify sRGB to Linear between CPU and GPU
This patch unifies the sRGB to Linear color space conversion between the
CPU and GPU compositors. This is because CPU uses an optimized path that
produce values that are very slightly off. To fix this, for the GPU, we
do the conversion CPU side instead of doing it in a shader. Since images
are cached, the performance implications are not significant.

Another added benefit is that we no longer get differences due to the
order of alpha pre-multiplication and sRGB conversion.
Omar Emara requested review from Sergey Sharybin 2024-02-22 15:53:34 +01:00
Omar Emara changed title from Compositor: Unify sRGB to Linear between CPU and GPU to WIP: Compositor: Unify sRGB to Linear between CPU and GPU 2024-02-22 15:53:43 +01:00
Omar Emara added 1 commit 2024-02-23 10:35:58 +01:00
Omar Emara changed title from WIP: Compositor: Unify sRGB to Linear between CPU and GPU to Compositor: Unify sRGB to Linear between CPU and GPU 2024-02-23 10:40:10 +01:00
Omar Emara added 1 commit 2024-02-23 20:28:25 +01:00

I am not really sure this is the best approach, but also can't provide thoughts of what I believe is the way to go at this time.

There are couple of issues with this approach:

  • It increases RAM (and possibly VRAM as well?) usage for all byte images.
  • It does not address possible difference of results on different CPU architectures: the x86 and arm64 provide different accuracy of conversion (due to one being SIMD-optimizated with not-so-many Newton steps, and the other not being SIMD optimized at all).

Do we know whether such difference causes issues on real-life setups, or it is only affecting the regression tests which "look" at matte of nodes which are sensitive to exact precision?

I am not really sure this is the best approach, but also can't provide thoughts of what I believe is the way to go at this time. There are couple of issues with this approach: - It increases RAM (and possibly VRAM as well?) usage for all byte images. - It does not address possible difference of results on different CPU architectures: the x86 and arm64 provide different accuracy of conversion (due to one being SIMD-optimizated with not-so-many Newton steps, and the other not being SIMD optimized at all). Do we know whether such difference causes issues on real-life setups, or it is only affecting the regression tests which "look" at matte of nodes which are sensitive to exact precision?
Author
Member

@OmarEmaraDev I am not sure about the RAM/VRAM point, because this is a cached resource where the extra RAM usage is temporary and immediately freed when computing the resource, and never consumed again until the cached is invalidated. As for VRAM, this actually consumes less VRAM, because byte sRGB textures had to be preprocessed anyways for alpha premultiplication and linear space conversion, which is eventually saved in a half float texture. While we skip the byte sRGB texture allocation in this pull request.

I have no idea what we need to do regarding the x86 vs ARM difference. But the CPU vs GPU is arguably more serious, since it is visible to the user in the same computer and session.

I am not sure what we would consider as a real-life difference. But the matte difference can be reproduced using a single Matte node as can be seen in the regression test files, which seems to be a real life example where that's precisely what the nodes are made for.
Moreover, any operation that uses thresholding, does operations in other color spaces, or evaluates the image at a curve map (Like RGB Curves, Hue Correct, or Color Ramp) would also produce a visible difference.

@OmarEmaraDev I am not sure about the RAM/VRAM point, because this is a cached resource where the extra RAM usage is temporary and immediately freed when computing the resource, and never consumed again until the cached is invalidated. As for VRAM, this actually consumes less VRAM, because byte sRGB textures had to be preprocessed anyways for alpha premultiplication and linear space conversion, which is eventually saved in a half float texture. While we skip the byte sRGB texture allocation in this pull request. I have no idea what we need to do regarding the x86 vs ARM difference. But the CPU vs GPU is arguably more serious, since it is visible to the user in the same computer and session. I am not sure what we would consider as a real-life difference. But the matte difference can be reproduced using a single Matte node as can be seen in the regression test files, which seems to be a real life example where that's precisely what the nodes are made for. Moreover, any operation that uses thresholding, does operations in other color spaces, or evaluates the image at a curve map (Like RGB Curves, Hue Correct, or Color Ramp) would also produce a visible difference.
Sergey Sharybin added this to the Compositing project 2024-02-27 10:13:25 +01:00
Sergey Sharybin requested changes 2024-03-05 16:27:44 +01:00
Dismissed
Sergey Sharybin left a comment
Owner

I had quite some thoughts about it. It would be nice if we could somehow easily and quickly stream textures in an out, which with this approach seems to bit more tricky due to the CPU side "overhead". But then on another hand, the simplified GPU transforms and lowered VRAM consumption is quite attractive. And finally, it is probably good idea to use half-float EXRs as inputs for proper compositing, and those use-cases are not affected by this change. So without having any better solution in mind, lets go with this one.

However, there is one thing which would be nice to resolve: don't modify the acquired imbuf in-place, but instead create a temporary one which references the data buffer from the original, and do conversion that way. This will avoid possible threading conflict with other threads accessing the same imbuf, and making some decisions about using float buffer, which might be removed by the compositor thread.

I had quite some thoughts about it. It would be nice if we could somehow easily and quickly stream textures in an out, which with this approach seems to bit more tricky due to the CPU side "overhead". But then on another hand, the simplified GPU transforms and lowered VRAM consumption is quite attractive. And finally, it is probably good idea to use half-float EXRs as inputs for proper compositing, and those use-cases are not affected by this change. So without having any better solution in mind, lets go with this one. However, there is one thing which would be nice to resolve: don't modify the acquired imbuf in-place, but instead create a temporary one which references the data buffer from the original, and do conversion that way. This will avoid possible threading conflict with other threads accessing the same imbuf, and making some decisions about using float buffer, which might be removed by the compositor thread.
Author
Member

@Sergey But isn't the image buffer protected using the cache mutex? So it seems like conflicts are not really possible.

@Sergey But isn't the image buffer protected using the cache mutex? So it seems like conflicts are not really possible.

@OmarEmaraDev The image is only protected during BKE_image_acquire_ibuf() and during BKE_image_release_ibuf(), so that multiple threads can access image buffers of the same image. The image buffer itself is not protected, so if threads modify the image buffer, it needs to be done with care.

P.S. There is a special case of accessing viewer type of images, such as Render Result. Those do acquire global lock, but they need to receive an extra lock argument to the acquire/release functions.

@OmarEmaraDev The image is only protected during `BKE_image_acquire_ibuf()` and during `BKE_image_release_ibuf()`, so that multiple threads can access image buffers of the same image. The image buffer itself is not protected, so if threads modify the image buffer, it needs to be done with care. P.S. There is a special case of accessing viewer type of images, such as Render Result. Those do acquire global lock, but they need to receive an extra `lock` argument to the acquire/release functions.
Omar Emara added 3 commits 2024-03-08 14:34:38 +01:00
Omar Emara added 1 commit 2024-03-08 14:36:14 +01:00
Sergey Sharybin reviewed 2024-03-13 15:58:46 +01:00
Sergey Sharybin left a comment
Owner

Some comment on the public API changes in the IMB.
The compositor side changes seems to be good.

Some comment on the public API changes in the IMB. The compositor side changes seems to be good.
@ -117,6 +117,11 @@ ImBuf *IMB_allocFromBuffer(const uint8_t *byte_buffer,
*/
void IMB_assign_byte_buffer(ImBuf *ibuf, uint8_t *buffer_data, ImBufOwnership ownership);
void IMB_assign_float_buffer(ImBuf *ibuf, float *buffer_data, ImBufOwnership ownership);
void IMB_assign_encoded_buffer(ImBuf *ibuf,

I wouldn't make it public function, and instead have imb_assign_encoded_buffer next to the imb_addencodedbufferImBuf in the IMB_allocimbuf.h

I wouldn't make it public function, and instead have `imb_assign_encoded_buffer` next to the `imb_addencodedbufferImBuf` in the `IMB_allocimbuf.h`
OmarEmaraDev marked this conversation as resolved
@ -146,1 +151,3 @@
ImBuf *IMB_dupImBuf(const ImBuf *ibuf1);
/* Creates a copy of the given buffer. If the shallow arguement is false, the internals buffers
* will be copied as well, if true, the buffers will be shared using IB_DO_NOT_TAKE_OWNERSHIP. */
ImBuf *IMB_dupImBuf(const ImBuf *ibuf1, bool shallow = false);

I think it will be more clear to have a separate function IMB_copy_sharing than to have a boolean argument, as it will help local readability: ImBuf *my_ibuf = IMB_dupImBuf(ibuf, true) vs. ImBuf *my_ibuf = IMB_copy_sharing(ibuf, true).

Also, as you might have noticed, to me it feels better to avoid shallow, to avoid possible confusion with shallow DNA copy, which is quite a different thing.

I think it will be more clear to have a separate function `IMB_copy_sharing` than to have a boolean argument, as it will help local readability: `ImBuf *my_ibuf = IMB_dupImBuf(ibuf, true)` vs. `ImBuf *my_ibuf = IMB_copy_sharing(ibuf, true)`. Also, as you might have noticed, to me it feels better to avoid `shallow`, to avoid possible confusion with shallow DNA copy, which is quite a different thing.
OmarEmaraDev marked this conversation as resolved
Omar Emara added 2 commits 2024-03-14 10:42:40 +01:00
buildbot/vexp-code-patch-darwin-x86_64 Build done. Details
buildbot/vexp-code-patch-linux-x86_64 Build done. Details
buildbot/vexp-code-patch-lint Build done. Details
buildbot/vexp-code-patch-windows-amd64 Build done. Details
buildbot/vexp-code-patch-darwin-arm64 Build done. Details
buildbot/vexp-code-patch-coordinator Build done. Details
913fef06d6
Address review
Author
Member

@blender-bot build

@blender-bot build
Sergey Sharybin reviewed 2024-03-14 12:44:14 +01:00
Sergey Sharybin left a comment
Owner

Thanks for the update, it really helped to better see what code paths are involved into the new functionality. Unfortunately, it also made it apparent some real issues around the ImIBuf::gpu:

  • It can currently does not support same ownership flags as the CPU side buffers
  • It is not cleared (even not in the current IMB_dupImBuf, which is quite bad!)
    So it leads to possibility of the texture freed but still referenced from the source ImBuf.

Seeing this, and some other tricky things covered in the temporary buffer trick makes we wonder: shall we just take a more local and explicit route:

  • Add function to assign data+colorspace from a buffer
  • Take care of temporary image_buffer allocation in the CachedImage()
  • Assign only buffer we are interested in in that function.

Something like:

/**
 * Assign the content and the color space of the corresponding buffer the data from the given
 * buffer.
 *
 * \note Does not modify the topology (width, height, number of channels)
 * or the mipmaps in any way.
 *
 * \note The ownership of the data in the source buffer is ignored.
 */
void IMB_assign_byte_buffer(ImBuf *ibuf, const ImBufByteBuffer &buffer, ImBufOwnership ownership);
void IMB_assign_float_buffer(ImBuf *ibuf,
                             const ImBufFloatBuffer &buffer,
                             ImBufOwnership ownership);

...

void IMB_assign_byte_buffer(ImBuf *ibuf,
                            const ImBufByteBuffer &buffer,
                            const ImBufOwnership ownership)
{
  IMB_assign_byte_buffer(ibuf, buffer.data, ownership);
  ibuf->byte_buffer.colorspace = buffer.colorspace;
}

void IMB_assign_float_buffer(ImBuf *ibuf,
                             const ImBufFloatBuffer &buffer,
                             const ImBufOwnership ownership)
{
  IMB_assign_float_buffer(ibuf, buffer.data, ownership);
  ibuf->float_buffer.colorspace = buffer.colorspace;
}

...

  ImBuf *linear_image_buffer = IMB_allocImBuf(
      image_buffer->x, image_buffer->y, image_buffer->planes, 0);
  IMB_assign_byte_buffer(linear_image_buffer, image_buffer->byte_buffer, IB_DO_NOT_TAKE_OWNERSHIP);
  IMB_assign_float_buffer(
      linear_image_buffer, image_buffer->float_buffer, IB_DO_NOT_TAKE_OWNERSHIP);

It makes it more obvious of exact ownership of data in the linear_image_buffer, without any implicit short-cuts, makes it possible to not worry about some non-trivial/non-obvious API in the ImBuf, and adds potentially handy function which has no downsides.

How do you feel about this suggestion?

P.S. Sorry for taking you through the previous API suggestion detour. I didn't have time to more deeply check on the actual code, and from looking into the patch I did not realize there are all those tricky cases of ownership.

Thanks for the update, it really helped to better see what code paths are involved into the new functionality. Unfortunately, it also made it apparent some real issues around the `ImIBuf::gpu`: - It can currently does not support same ownership flags as the CPU side buffers - It is not cleared (even not in the current IMB_dupImBuf, which is quite bad!) So it leads to possibility of the texture freed but still referenced from the source ImBuf. Seeing this, and some other tricky things covered in the temporary buffer trick makes we wonder: shall we just take a more local and explicit route: - Add function to assign data+colorspace from a buffer - Take care of temporary `image_buffer` allocation in the `CachedImage()` - Assign only buffer we are interested in in that function. Something like: ``` /** * Assign the content and the color space of the corresponding buffer the data from the given * buffer. * * \note Does not modify the topology (width, height, number of channels) * or the mipmaps in any way. * * \note The ownership of the data in the source buffer is ignored. */ void IMB_assign_byte_buffer(ImBuf *ibuf, const ImBufByteBuffer &buffer, ImBufOwnership ownership); void IMB_assign_float_buffer(ImBuf *ibuf, const ImBufFloatBuffer &buffer, ImBufOwnership ownership); ... void IMB_assign_byte_buffer(ImBuf *ibuf, const ImBufByteBuffer &buffer, const ImBufOwnership ownership) { IMB_assign_byte_buffer(ibuf, buffer.data, ownership); ibuf->byte_buffer.colorspace = buffer.colorspace; } void IMB_assign_float_buffer(ImBuf *ibuf, const ImBufFloatBuffer &buffer, const ImBufOwnership ownership) { IMB_assign_float_buffer(ibuf, buffer.data, ownership); ibuf->float_buffer.colorspace = buffer.colorspace; } ... ImBuf *linear_image_buffer = IMB_allocImBuf( image_buffer->x, image_buffer->y, image_buffer->planes, 0); IMB_assign_byte_buffer(linear_image_buffer, image_buffer->byte_buffer, IB_DO_NOT_TAKE_OWNERSHIP); IMB_assign_float_buffer( linear_image_buffer, image_buffer->float_buffer, IB_DO_NOT_TAKE_OWNERSHIP); ``` It makes it more obvious of exact ownership of data in the `linear_image_buffer`, without any implicit short-cuts, makes it possible to not worry about some non-trivial/non-obvious API in the ImBuf, and adds potentially handy function which has no downsides. How do you feel about this suggestion? P.S. Sorry for taking you through the previous API suggestion detour. I didn't have time to more deeply check on the actual code, and from looking into the patch I did not realize there are all those tricky cases of ownership.
Author
Member

@Sergey the suggested approach seems better to me. However, we lose DDS image support in that case. But apparently we also lose it in the current approach. So I guess we should implement the suggestion and try to fix DDS after.

@Sergey the suggested approach seems better to me. However, we lose DDS image support in that case. But apparently we also lose it in the current approach. So I guess we should implement the suggestion and try to fix DDS after.
Omar Emara added 1 commit 2024-03-15 09:42:09 +01:00
Author
Member

@Sergey I added an ownership flag to the DDS data structure and implemented assignment procedures similar to float and byte buffers. Hopefully this should cover all cases.

@Sergey I added an ownership flag to the DDS data structure and implemented assignment procedures similar to float and byte buffers. Hopefully this should cover all cases.
Sergey Sharybin reviewed 2024-03-20 17:09:20 +01:00
Sergey Sharybin left a comment
Owner

Good catch about DDS. I think the API you did makes sense.

Good catch about DDS. I think the API you did makes sense.
@ -208,0 +166,4 @@
/* If no float buffer exists, assign it then compute a float buffer from it. This is the main
* call of this function. */
if (!linear_image_buffer->float_buffer.data) {

Should this become

if (image_buffer->ftype == IMB_FTYPE_DDS) {
  ...
} else if (!linear_image_buffer->float_buffer.data) {
  ...
}

Otherwise it is not very clear to me why do we need both float buffer and DDS.

Should this become ``` if (image_buffer->ftype == IMB_FTYPE_DDS) { ... } else if (!linear_image_buffer->float_buffer.data) { ... } ``` Otherwise it is not very clear to me why do we need both float buffer and DDS.
Author
Member

@Sergey That's because some code exists that can fallback to the float buffer if the DDS data wasn't compatible for some reason. I asked about that in chat last week and Aras confirmed it.

@Sergey That's because some code exists that can fallback to the float buffer if the DDS data wasn't compatible for some reason. I asked about that in chat last week and Aras confirmed it.

I see. But if we have have float buffer which we can just use, what is the point of using DDS? Is it handling mipmaps?
Also, is DDS guaranteed to be in scene linear space?

I see. But if we have have float buffer which we can just use, what is the point of using DDS? Is it handling mipmaps? Also, is DDS guaranteed to be in scene linear space?
Author
Member

@Sergey I just assumed it would be faster to load and consume less memory, beside potentially having mipmaps.

Generally, DDS can be in sRGB it seems, but that doesn't seem to be handled by Blender, since it seems to be only supported in a specific DDS type.

However, it seems compressed textures are not yet supported in Metal, so it looks like we will have to intentionally disable it for now at least.

@Sergey I just assumed it would be faster to load and consume less memory, beside potentially having mipmaps. Generally, DDS can be in sRGB it seems, but that doesn't seem to be handled by Blender, since it seems to be only supported in a specific DDS type. However, it seems compressed textures are not yet supported in Metal, so it looks like we will have to intentionally disable it for now at least.
Author
Member

@Sergey I added suitable conditions to keep DDS support when possible.

@Sergey I added suitable conditions to keep DDS support when possible.

Do we really need to check for Metal here?
We can always assign shared DDS data to an imbuf, and if a backend does not use it, it is the backend's choice. Maybe it'll get supported later, and then you wouldn't need to hunt for all cases where check is done in the parent call.

Do we really need to check for Metal here? We can always assign shared DDS data to an imbuf, and if a backend does not use it, it is the backend's choice. Maybe it'll get supported later, and then you wouldn't need to hunt for all cases where check is done in the parent call.
Author
Member

@Sergey The problem is that GPU texture creation is not a back-end specific thing. But I now realize this is not an issue in this patch, but a general issue in Blender. I talked to Clement and will submit a patch to fix this for the Metal back-end.

@Sergey The problem is that GPU texture creation is not a back-end specific thing. But I now realize this is not an issue in this patch, but a general issue in Blender. I talked to Clement and will submit a patch to fix this for the Metal back-end.
OmarEmaraDev marked this conversation as resolved
Omar Emara added 2 commits 2024-03-22 12:10:58 +01:00
Omar Emara added 1 commit 2024-03-22 15:45:30 +01:00
Sergey Sharybin approved these changes 2024-03-22 16:24:13 +01:00
Thomas Dinges merged commit 4971b144a4 into main 2024-03-25 14:10:05 +01:00
Thomas Dinges deleted branch unify-compositor-srgb-linear 2024-03-25 14:10:07 +01:00
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#118624
No description provided.