blender/blender

Compositor: Unify sRGB to Linear between CPU and GPU #118624

Merged

Thomas Dinges merged 13 commits from OmarEmaraDev/blender:unify-compositor-srgb-linear into main

2024-03-25 14:10:05 +01:00

Omar Emara commented

2024-02-22 15:53:12 +01:00

Member

This patch unifies the sRGB to Linear color space conversion between the
CPU and GPU compositors. This is needed because the CPU compositor uses
an optimized path that produces values that are very slightly off. To
fix this, for the GPU compositor, we do the conversion on the CPU side
instead of in a shader. Since images are cached, the performance
implications are not significant.

Another added benefit is that we no longer get differences due to the
order of alpha pre-multiplication and sRGB conversion, demonstrated in
#114305. And we no longer require any preprocessing of the images.

This patch adds some new utilities to the Image Buffer module to assign
float, byte, and compressed buffers along with their color spaces. It
also adds an ownership flag to compressed data. Those were added as a
way to facilitate the implementation.
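
To illustrate why the order of alpha pre-multiplication and sRGB conversion matters, here is a minimal sketch. It uses the standard piecewise sRGB decoding formula, not Blender's optimized path, and the function names are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

/* Standard piecewise sRGB decoding for an encoded value in [0, 1]. */
inline float srgb_to_linear(const float c)
{
  return (c <= 0.04045f) ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

/* Decode the straight byte color first, then premultiply in linear space. */
inline float decode_then_premultiply(const uint8_t color, const uint8_t alpha)
{
  const float a = alpha / 255.0f;
  return srgb_to_linear(color / 255.0f) * a;
}

/* Premultiply the encoded value first, then decode the result. */
inline float premultiply_then_decode(const uint8_t color, const uint8_t alpha)
{
  const float a = alpha / 255.0f;
  return srgb_to_linear((color / 255.0f) * a);
}
```

For a fully opaque pixel both orders agree, but for a white pixel at roughly half alpha the two results differ substantially, because the sRGB curve is non-linear. Doing the whole conversion in one place avoids having two code paths that could pick different orders.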


Omar Emara added the

labels 2024-02-22 15:53:13 +01:00

Omar Emara added 1 commit 2024-02-22 15:53:18 +01:00

Compositor: Unify sRGB to Linear between CPU and GPU 3b32e96fa9

This patch unifies the sRGB to Linear color space conversion between the
CPU and GPU compositors. This is because CPU uses an optimized path that
produces values that are very slightly off. To fix this, for the GPU, we
do the conversion CPU side instead of doing it in a shader. Since images
are cached, the performance implications are not significant.

Another added benefit is that we no longer get differences due to the
order of alpha pre-multiplication and sRGB conversion.

Omar Emara requested review from Sergey Sharybin 2024-02-22 15:53:34 +01:00

Omar Emara changed title from ~~Compositor: Unify sRGB to Linear between CPU and GPU~~ to WIP: Compositor: Unify sRGB to Linear between CPU and GPU

2024-02-22 15:53:43 +01:00

Omar Emara added 1 commit 2024-02-23 10:35:58 +01:00

Remove alpha pre-multiplication 687946c8a9

Omar Emara changed title from ~~WIP: Compositor: Unify sRGB to Linear between CPU and GPU~~ to Compositor: Unify sRGB to Linear between CPU and GPU

2024-02-23 10:40:10 +01:00

Omar Emara added 1 commit 2024-02-23 20:28:25 +01:00

Merge branch 'main' into unify-compositor-srgb-linear 7f97cbe24d

Omar Emara referenced this pull request

2024-02-23 20:31:49 +01:00

Compositor: CPU vs GPU Differences #118548

Omar Emara referenced this pull request

2024-02-26 12:15:45 +01:00

WIP: IMB: Allow BW images in byte to float conversion #118622

Sergey Sharybin commented

2024-02-26 17:46:58 +01:00

Owner

I am not really sure this is the best approach, but also can't provide thoughts of what I believe is the way to go at this time.

There are a couple of issues with this approach:

- It increases RAM (and possibly VRAM as well?) usage for all byte images.
- It does not address the possible difference in results on different CPU architectures: x86 and arm64 provide different conversion accuracy (one is SIMD-optimized with not-so-many Newton steps, and the other is not SIMD-optimized at all).

Do we know whether such a difference causes issues in real-life setups, or does it only affect the regression tests, which "look" at the matte of nodes that are sensitive to exact precision?


Omar Emara commented

2024-02-26 18:01:28 +01:00

Author

Member

@OmarEmaraDev I am not sure about the RAM/VRAM point, because this is a cached resource: the extra RAM usage is temporary, freed immediately after the resource is computed, and never consumed again until the cache is invalidated. As for VRAM, this actually consumes less VRAM, because byte sRGB textures had to be preprocessed anyway for alpha premultiplication and linear space conversion, with the result eventually saved in a half float texture, while this pull request skips the byte sRGB texture allocation.

I have no idea what we need to do regarding the x86 vs ARM difference. But the CPU vs GPU is arguably more serious, since it is visible to the user in the same computer and session.

I am not sure what we would consider a real-life difference. But the matte difference can be reproduced using a single Matte node, as can be seen in the regression test files, which seems to be a real-life example since that is precisely what the nodes are made for.
Moreover, any operation that uses thresholding, operates in other color spaces, or evaluates the image through a curve map (like RGB Curves, Hue Correct, or Color Ramp) would also produce a visible difference.
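
The thresholding point can be sketched as follows. This is a hypothetical hard-threshold matte, not the actual code of any Blender node; it only shows how a difference of one float step in the input can turn into a full 0-to-1 difference in the output.

```cpp
#include <cassert>
#include <cmath>

/* Hypothetical hard-threshold matte: 1 where the value reaches the threshold, 0 otherwise. */
inline float threshold_matte(const float value, const float threshold)
{
  return (value >= threshold) ? 1.0f : 0.0f;
}

/* Absolute difference between the mattes computed from two nearly equal inputs. */
inline float matte_difference(const float a, const float b, const float threshold)
{
  return std::fabs(threshold_matte(a, threshold) - threshold_matte(b, threshold));
}
```

With a threshold of `0.5f`, the inputs `0.5f` and `std::nextafter(0.5f, 0.0f)` land on opposite sides of the comparison, so a conversion that is "very slightly off" becomes fully visible for pixels near the threshold.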


Sergey Sharybin added this to the Compositing project 2024-02-27 10:13:25 +01:00

Sergey Sharybin requested changes 2024-03-05 16:27:44 +01:00

Dismissed

Sergey Sharybin left a comment

Owner

I had quite some thoughts about it. It would be nice if we could somehow easily and quickly stream textures in and out, which with this approach seems to be a bit more tricky due to the CPU side "overhead". But then on the other hand, the simplified GPU transforms and lowered VRAM consumption are quite attractive. And finally, it is probably a good idea to use half-float EXRs as inputs for proper compositing, and those use-cases are not affected by this change. So without having any better solution in mind, let's go with this one.

However, there is one thing which would be nice to resolve: don't modify the acquired imbuf in-place, but instead create a temporary one which references the data buffer from the original, and do the conversion that way. This will avoid a possible threading conflict with other threads that access the same imbuf and make decisions based on the float buffer, which might be removed by the compositor thread.


Omar Emara commented

2024-03-08 08:52:57 +01:00

Author

Member

@Sergey But isn't the image buffer protected using the cache mutex? So it seems like conflicts are not really possible.


Sergey Sharybin commented

2024-03-08 10:53:14 +01:00

Owner

@OmarEmaraDev The image is only protected during `BKE_image_acquire_ibuf()` and during `BKE_image_release_ibuf()`, so that multiple threads can access image buffers of the same image. The image buffer itself is not protected, so if threads modify the image buffer, it needs to be done with care.

P.S. There is a special case of accessing viewer type of images, such as Render Result. Those do acquire a global lock, but they need to receive an extra `lock` argument to the acquire/release functions.


Omar Emara added 3 commits 2024-03-08 14:34:38 +01:00

Merge branch 'main' into unify-compositor-srgb-linear 28fc151faa

Perform conversion in a shallow copy ca01c01c05

Document function dab1b7af02

Omar Emara added 1 commit 2024-03-08 14:36:14 +01:00

Correct function interface 406a6dd330

Sergey Sharybin reviewed 2024-03-13 15:58:46 +01:00

Sergey Sharybin left a comment

Owner

Some comments on the public API changes in the IMB.
The compositor side changes seem to be good.


source/blender/imbuf/IMB_imbuf.hh Outdated

						
@@ -117,6 +117,11 @@ ImBuf *IMB_allocFromBuffer(const uint8_t *byte_buffer,
 */
void IMB_assign_byte_buffer(ImBuf *ibuf, uint8_t *buffer_data, ImBufOwnership ownership);
void IMB_assign_float_buffer(ImBuf *ibuf, float *buffer_data, ImBufOwnership ownership);
void IMB_assign_encoded_buffer(ImBuf *ibuf,

Sergey Sharybin commented

2024-03-13 15:54:55 +01:00

Owner

I wouldn't make it a public function, and would instead have `imb_assign_encoded_buffer` next to `imb_addencodedbufferImBuf` in `IMB_allocimbuf.h`.


OmarEmaraDev marked this conversation as resolved

source/blender/imbuf/IMB_imbuf.hh Outdated

						
@@ -146,1 +151,3 @@
ImBuf *IMB_dupImBuf(const ImBuf *ibuf1);
/* Creates a copy of the given buffer. If the shallow arguement is false, the internals buffers
 * will be copied as well, if true, the buffers will be shared using IB_DO_NOT_TAKE_OWNERSHIP. */
ImBuf *IMB_dupImBuf(const ImBuf *ibuf1, bool shallow = false);

Sergey Sharybin commented

2024-03-13 15:58:18 +01:00

Owner

I think it will be clearer to have a separate function `IMB_copy_sharing` than to have a boolean argument, as it will help local readability: `ImBuf *my_ibuf = IMB_dupImBuf(ibuf, true)` vs. `ImBuf *my_ibuf = IMB_copy_sharing(ibuf, true)`.

Also, as you might have noticed, to me it feels better to avoid `shallow`, to avoid possible confusion with a shallow DNA copy, which is quite a different thing.


OmarEmaraDev marked this conversation as resolved

Omar Emara added 2 commits 2024-03-14 10:42:40 +01:00

Merge branch 'main' into unify-compositor-srgb-linear c854e2ecf9

Address review

buildbot/vexp-code-patch-darwin-x86_64 Build done.

buildbot/vexp-code-patch-linux-x86_64 Build done.

buildbot/vexp-code-patch-lint Build done.

buildbot/vexp-code-patch-windows-amd64 Build done.

buildbot/vexp-code-patch-darwin-arm64 Build done.

buildbot/vexp-code-patch-coordinator Build done.

Omar Emara commented

2024-03-14 10:46:18 +01:00

Author

Member

@blender-bot build


Sergey Sharybin reviewed 2024-03-14 12:44:14 +01:00

Sergey Sharybin left a comment

Owner

Thanks for the update, it really helped to better see what code paths are involved in the new functionality. Unfortunately, it also made apparent some real issues around `ImBuf::gpu`:

- It currently does not support the same ownership flags as the CPU-side buffers.
- It is not cleared (not even in the current `IMB_dupImBuf`, which is quite bad!).

So it leads to the possibility of the texture being freed while still referenced from the source ImBuf.

Seeing this, and some other tricky things covered in the temporary buffer trick, makes me wonder: shall we just take a more local and explicit route?

- Add a function to assign data+colorspace from a buffer.
- Take care of the temporary `image_buffer` allocation in `CachedImage()`.
- Assign only the buffers we are interested in in that function.

Something like:

/**
 * Assign the content and the color space of the corresponding buffer the data from the given
 * buffer.
 *
 * \note Does not modify the topology (width, height, number of channels)
 * or the mipmaps in any way.
 *
 * \note The ownership of the data in the source buffer is ignored.
 */
void IMB_assign_byte_buffer(ImBuf *ibuf, const ImBufByteBuffer &buffer, ImBufOwnership ownership);
void IMB_assign_float_buffer(ImBuf *ibuf,
                             const ImBufFloatBuffer &buffer,
                             ImBufOwnership ownership);

...

void IMB_assign_byte_buffer(ImBuf *ibuf,
                            const ImBufByteBuffer &buffer,
                            const ImBufOwnership ownership)
{
  IMB_assign_byte_buffer(ibuf, buffer.data, ownership);
  ibuf->byte_buffer.colorspace = buffer.colorspace;
}

void IMB_assign_float_buffer(ImBuf *ibuf,
                             const ImBufFloatBuffer &buffer,
                             const ImBufOwnership ownership)
{
  IMB_assign_float_buffer(ibuf, buffer.data, ownership);
  ibuf->float_buffer.colorspace = buffer.colorspace;
}

...

  ImBuf *linear_image_buffer = IMB_allocImBuf(
      image_buffer->x, image_buffer->y, image_buffer->planes, 0);
  IMB_assign_byte_buffer(linear_image_buffer, image_buffer->byte_buffer, IB_DO_NOT_TAKE_OWNERSHIP);
  IMB_assign_float_buffer(
      linear_image_buffer, image_buffer->float_buffer, IB_DO_NOT_TAKE_OWNERSHIP);

This makes the exact ownership of the data in `linear_image_buffer` more obvious, without any implicit shortcuts, removes the need to worry about some non-trivial/non-obvious API in ImBuf, and adds a potentially handy function which has no downsides.

How do you feel about this suggestion?

P.S. Sorry for taking you through the previous API suggestion detour. I didn't have time to more deeply check on the actual code, and from looking into the patch I did not realize there are all those tricky cases of ownership.
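
The ownership idea behind this suggestion can be sketched in isolation. The types and functions below are hypothetical stand-ins written for illustration; they are not the real `ImBuf` structures or the `IMB_assign_*` functions, and they only model the "take" versus "do not take" ownership distinction.

```cpp
#include <cassert>
#include <cstdlib>

/* Hypothetical stand-ins for illustration; these are not the real ImBuf types. */
enum class Ownership { DoNotTake, Take };

struct FloatBuffer {
  float *data = nullptr;
  Ownership ownership = Ownership::DoNotTake;
};

/* Release the data only if this buffer owns it, then reset the buffer. */
inline void free_buffer(FloatBuffer &buffer)
{
  if (buffer.data && buffer.ownership == Ownership::Take) {
    std::free(buffer.data);
  }
  buffer.data = nullptr;
  buffer.ownership = Ownership::DoNotTake;
}

/* Point the buffer at existing data, releasing whatever it owned before. */
inline void assign_buffer(FloatBuffer &buffer, float *data, const Ownership ownership)
{
  free_buffer(buffer);
  buffer.data = data;
  buffer.ownership = ownership;
}
```

A temporary buffer that is assigned the original's data with `Ownership::DoNotTake` can be freed without touching that data, so the original stays valid and is never modified in place. The sketch deliberately leaves out color spaces, byte and compressed buffers, and GPU textures.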


Omar Emara commented

2024-03-15 07:30:42 +01:00

Author

Member

@Sergey the suggested approach seems better to me. However, we lose DDS image support in that case. But apparently we also lose it in the current approach. So I guess we should implement the suggestion and try to fix DDS after.


Omar Emara added 1 commit 2024-03-15 09:42:09 +01:00

Address comments 15e3771ee6

Omar Emara commented

2024-03-15 09:44:37 +01:00

Author

Member

@Sergey I added an ownership flag to the DDS data structure and implemented assignment procedures similar to float and byte buffers. Hopefully this should cover all cases.


Sergey Sharybin reviewed 2024-03-20 17:09:20 +01:00

Sergey Sharybin left a comment

Owner

Good catch about DDS. I think the API you did makes sense.


source/blender/compositor/realtime_compositor/cached_resources/intern/cached_image.cc

						
@@ -208,0 +166,4 @@
  /* If no float buffer exists, assign it then compute a float buffer from it. This is the main
   * call of this function. */
  if (!linear_image_buffer->float_buffer.data) {

Sergey Sharybin commented

2024-03-20 17:08:49 +01:00

Owner

Should this become

if (image_buffer->ftype == IMB_FTYPE_DDS) {
  ...
} else if (!linear_image_buffer->float_buffer.data) {
  ...
}

Otherwise it is not very clear to me why we need both the float buffer and DDS.


Omar Emara commented

2024-03-20 18:38:49 +01:00

Author

Member

@Sergey That's because some code exists that can fall back to the float buffer if the DDS data wasn't compatible for some reason. I asked about that in chat last week and Aras confirmed it.


Sergey Sharybin commented

2024-03-22 10:44:37 +01:00

Owner

I see. But if we have a float buffer which we can just use, what is the point of using DDS? Is it handling mipmaps?
Also, is DDS guaranteed to be in scene linear space?


Omar Emara commented

2024-03-22 11:05:46 +01:00

Author

Member

@Sergey I just assumed it would be faster to load and consume less memory, besides potentially having mipmaps.

Generally, DDS can be in sRGB it seems, but that doesn't seem to be handled by Blender, since it seems to be only supported in a specific DDS type.

However, it seems compressed textures are not yet supported in Metal, so it looks like we will have to intentionally disable it for now at least.


Omar Emara commented

2024-03-22 12:12:07 +01:00

Author

Member

@Sergey I added suitable conditions to keep DDS support when possible.


Sergey Sharybin commented

2024-03-22 15:13:47 +01:00

Owner

Do we really need to check for Metal here?
We can always assign shared DDS data to an imbuf, and if a backend does not use it, it is the backend's choice. Maybe it'll get supported later, and then you wouldn't need to hunt for all cases where check is done in the parent call.


Omar Emara commented

2024-03-22 15:32:26 +01:00

Author

Member

@Sergey The problem is that GPU texture creation is not a back-end specific thing. But I now realize this is not an issue in this patch, but a general issue in Blender. I talked to Clement and will submit a patch to fix this for the Metal back-end.


OmarEmaraDev marked this conversation as resolved

Omar Emara referenced this pull request

2024-03-21 07:29:00 +01:00

WIP: IMB: Allow BW images in byte to float conversion #118622

Omar Emara added 2 commits 2024-03-22 12:10:58 +01:00

Merge branch 'main' into unify-compositor-srgb-linear ef29bdabab

Only add compressed buffer if suitable 4b021ab01e

Omar Emara added 1 commit 2024-03-22 15:45:30 +01:00

Revert Metal check 1f49db485c

Sergey Sharybin approved these changes 2024-03-22 16:24:13 +01:00

Thomas Dinges referenced this issue from a commit

2024-03-25 14:10:05 +01:00

Compositor: Unify sRGB to Linear between CPU and GPU

Thomas Dinges merged commit 4971b144a4 into main

2024-03-25 14:10:05 +01:00

Thomas Dinges deleted branch unify-compositor-srgb-linear

2024-03-25 14:10:07 +01:00


Reference: blender/blender#118624
