Compositor: Unify sRGB to Linear between CPU and GPU #118624
Reference: blender/blender#118624
This patch unifies the sRGB to Linear color space conversion between the
CPU and GPU compositors. This is needed because the CPU compositor uses an
optimized path that produces values that are very slightly off. To fix
this, the GPU compositor now does the conversion on the CPU side instead
of in a shader. Since images are cached, the performance implications are
not significant.
Another added benefit is that we no longer get differences due to the
order of alpha pre-multiplication and sRGB conversion, as demonstrated in
#114305. We also no longer require any preprocessing of the images.
This patch adds some new utilities to the Image Buffer module to assign
float, byte, and compressed buffers along with their color spaces. It
also adds an ownership flag to compressed data. These were added to
facilitate the implementation.
Changed title from "Compositor: Unify sRGB to Linear between CPU and GPU" to "WIP: Compositor: Unify sRGB to Linear between CPU and GPU"

Changed title from "WIP: Compositor: Unify sRGB to Linear between CPU and GPU" to "Compositor: Unify sRGB to Linear between CPU and GPU"

I am not really sure this is the best approach, but I also can't provide thoughts on what I believe is the way to go at this time.
There are a couple of issues with this approach:
Do we know whether such a difference causes issues in real-life setups, or does it only affect the regression tests, which "look" at the matte of nodes that are sensitive to exact precision?
@OmarEmaraDev I am not sure about the RAM/VRAM point, because this is a cached resource where the extra RAM usage is temporary, immediately freed after computing the resource, and never consumed again until the cache is invalidated. As for VRAM, this actually consumes less VRAM, because byte sRGB textures previously had to be preprocessed anyway for alpha premultiplication and linear space conversion, with the result saved in a half-float texture, whereas this pull request skips the byte sRGB texture allocation entirely.
I have no idea what we need to do regarding the x86 vs ARM difference. But the CPU vs GPU difference is arguably more serious, since it is visible to the user on the same computer and in the same session.

I am not sure what we would consider a real-life difference. But the matte difference can be reproduced using a single Matte node, as can be seen in the regression test files, which seems like a real-life example, since that is precisely what the node is made for.

Moreover, any operation that uses thresholding, operates in other color spaces, or evaluates the image through a curve map (like RGB Curves, Hue Correct, or Color Ramp) would also produce a visible difference.
I have given this quite some thought. It would be nice if we could somehow easily and quickly stream textures in and out, which with this approach seems a bit more tricky due to the CPU-side "overhead". On the other hand, the simplified GPU transforms and lowered VRAM consumption are quite attractive. And finally, it is probably a good idea to use half-float EXRs as inputs for proper compositing, and those use cases are not affected by this change. So without having any better solution in mind, let's go with this one.
However, there is one thing which would be nice to resolve: don't modify the acquired imbuf in place; instead, create a temporary one which references the data buffer from the original, and do the conversion that way. This will avoid possible threading conflicts with other threads accessing the same imbuf and making decisions about using the float buffer, which might be removed by the compositor thread.
@Sergey But isn't the image buffer protected using the cache mutex? So it seems like conflicts are not really possible.
@OmarEmaraDev The image is only protected during BKE_image_acquire_ibuf() and BKE_image_release_ibuf(), so that multiple threads can access image buffers of the same image. The image buffer itself is not protected, so if threads modify the image buffer, it needs to be done with care.

P.S. There is a special case of accessing viewer-type images, such as Render Result. Those do acquire a global lock, but they need to receive an extra lock argument to the acquire/release functions.

Some comments on the public API changes in the IMB.
The compositor side changes seem to be good.
@ -117,6 +117,11 @@ ImBuf *IMB_allocFromBuffer(const uint8_t *byte_buffer,
*/
void IMB_assign_byte_buffer(ImBuf *ibuf, uint8_t *buffer_data, ImBufOwnership ownership);
void IMB_assign_float_buffer(ImBuf *ibuf, float *buffer_data, ImBufOwnership ownership);
void IMB_assign_encoded_buffer(ImBuf *ibuf,
I wouldn't make it a public function; instead, have imb_assign_encoded_buffer next to imb_addencodedbufferImBuf in IMB_allocimbuf.h.
@ -146,1 +151,3 @@
ImBuf *IMB_dupImBuf(const ImBuf *ibuf1);
/* Creates a copy of the given buffer. If the shallow argument is false, the internal buffers
* will be copied as well; if true, the buffers will be shared using IB_DO_NOT_TAKE_OWNERSHIP. */
ImBuf *IMB_dupImBuf(const ImBuf *ibuf1, bool shallow = false);
I think it will be clearer to have a separate function IMB_copy_sharing than to have a boolean argument, as it will help local readability: ImBuf *my_ibuf = IMB_dupImBuf(ibuf, true) vs. ImBuf *my_ibuf = IMB_copy_sharing(ibuf, true).

Also, as you might have noticed, it feels better to me to avoid shallow, to avoid possible confusion with a shallow DNA copy, which is quite a different thing.

@blender-bot build
Thanks for the update, it really helped to better see what code paths are involved in the new functionality. Unfortunately, it also made apparent some real issues around ImBuf::gpu: it leads to the possibility of the texture being freed but still referenced from the source ImBuf.
Seeing this, and some other tricky things covered by the temporary-buffer trick, makes me wonder: shall we just take a more local and explicit route with the image_buffer allocation in CachedImage()?

Something like:

It makes the exact ownership of data in linear_image_buffer more obvious, without any implicit shortcuts, makes it possible to not worry about some non-trivial/non-obvious API in the ImBuf, and adds a potentially handy function which has no downsides. How do you feel about this suggestion?
P.S. Sorry for taking you through the previous API-suggestion detour. I didn't have time to check the actual code more deeply, and from looking at the patch I did not realize there were all those tricky cases of ownership.
@Sergey the suggested approach seems better to me. However, we lose DDS image support in that case. But apparently we also lose it in the current approach, so I guess we should implement the suggestion and try to fix DDS afterwards.
@Sergey I added an ownership flag to the DDS data structure and implemented assignment procedures similar to float and byte buffers. Hopefully this should cover all cases.
Good catch about DDS. I think the API you did makes sense.
@ -208,0 +166,4 @@
/* If no float buffer exists, assign it then compute a float buffer from it. This is the main
* call of this function. */
if (!linear_image_buffer->float_buffer.data) {
Should this become
Otherwise it is not very clear to me why we need both the float buffer and DDS.
@Sergey That's because some code exists that can fall back to the float buffer if the DDS data wasn't compatible for some reason. I asked about that in chat last week and Aras confirmed it.
I see. But if we have a float buffer which we can just use, what is the point of using DDS? Is it handling mipmaps?
Also, is DDS guaranteed to be in scene linear space?
@Sergey I just assumed it would be faster to load and consume less memory, besides potentially having mipmaps.
Generally, DDS can be in sRGB, it seems, but that does not appear to be handled by Blender, since it is only supported for a specific DDS type.
However, it seems compressed textures are not yet supported in Metal, so it looks like we will have to intentionally disable it for now at least.
@Sergey I added suitable conditions to keep DDS support when possible.
Do we really need to check for Metal here?

We can always assign shared DDS data to an imbuf, and if a backend does not use it, that is the backend's choice. Maybe it will get supported later, and then you wouldn't need to hunt for all the places where the check is done in the parent call.
@Sergey The problem is that GPU texture creation is not a back-end-specific thing. But I now realize this is not an issue with this patch, but a general issue in Blender. I talked to Clement and will submit a patch to fix this for the Metal back-end.