Cycles: Add Intel GPU support for OpenImageDenoise #108314
Open
Stefan Werner
wants to merge 18 commits from Stefan_Werner/blender:oidn2 into blender:main
7 Participants
Depends on
#112143 Build: Update OpenImageDenoise to 2.0.1
Reference: blender/blender#108314
OpenImageDenoise V2 comes with GPU support for various backends. This adds a new class, OIDN2Denoiser, in order to add this functionality into the existing Cycles post-processing pipeline without having to change it much. OptiX and OIDN CPU denoising remain as they are; there is now a new user option to select OIDN GPU denoising when a compatible device is detected.
Device support is initially limited to the oneAPI devices that are supported by Cycles already, but can be extended.
On the deps, something similar to 66b4e426cc needs to be done, especially using Ninja on Windows (CMAKE_GENERATOR ${PLATFORM_ALT_GENERATOR}). https://github.com/OpenImageDenoise/oidn/blob/master/doc/compilation.md also recommends using ocloc 101.4314, but that version has known issues with Blender; updating ocloc to a newer version such as 101.4369 will be needed.
I'll figure it out, not overly worried. I'll bring up the Windows/CPU support first, and we'll branch out from there.
Just to clarify: I'm working on the deps builder part, best not to duplicate this work.
0465255054 to ecba53c269
ecba53c269 to 3d72ee1be4
What is currently holding up this PR?
the WIP label
Changed title from "WIP: Update to OpenImageDenoise 2.0" to "Update to OpenImageDenoise 2.0"

Please follow the Commit Message Guidelines and present the PR according to what it is. If it is just an update of a library without user-level changes, then I am not really sure about the deeper changes in OIDNDenoiser and DenoiserGPU. If this change actually brings GPU support for OIDN, it should be presented as such, mentioning which GPUs are supported, what the limitations are (user-level and technical), some major architectural considerations, etc.
Initial pass of review.
@ -139,2 +143,4 @@
info.has_nanovdb = true;
info.denoisers = 0;
# if defined(WITH_OPENIMAGEDENOISE) && defined(OIDN_DEVICE_CUDA)
if (major >= 7) {
There is a discrepancy here: there is OIDNDenoiser::is_device_supported(info), which is good because it can then be seen as the source of truth, but here some specialized check is done instead. OIDNDenoiser::is_device_supported (which does not check compute capability) will report that OIDN is supported on the device. Architecturally there needs to be a single source of truth. If that is for some reason not possible, it needs to be clearly documented in the API.

This code has been backed out with the removal of CUDA support; it will be revisited when CUDA support is reintroduced.
@ -57,2 +57,4 @@
bool denoise_filter_guiding_set_fake_albedo(const DenoiseContext &context);
/* Read guiding passes from the render buffers, preprocess them in a way which is expected by
* OptiX and store in the guiding passes memory within the given context.
Comment needs to be updated: the function is no longer OptiX-specific.
@ -577,4 +637,3 @@
<< "OpenImageDenoiser is not supported on this platform or build.";
#ifdef WITH_OPENIMAGEDENOISE
thread_scoped_lock lock(mutex_);
Any specific reason for removing this?
Some information got lost during the Cycles X rewrite; the original comment stated:
We don't have tiles anymore, but we still don't want to run multiple OIDN denoisers at the same time.
If you looked deep into it and the change is safe, it is much better to isolate it into a separate PR, explaining why this code is no longer relevant.
Mutex is back. Should it get removed again, it will be a separate commit and not part of this PR.
@ -573,12 +628,15 @@ bool OIDNDenoiser::denoise_buffer(const BufferParams &buffer_params,
const int num_samples,
bool allow_inplace_modification)
{
if (denoiser_device_->info.type != DEVICE_CPU && !cpu_fallback_) {
Is it possible to unify the code paths for CPU and GPU OIDN denoising? It is quite weird to maintain two separate denoising code paths in the same denoiser implementation. It would almost be better to split them into OIDNDenoiserCPU and OIDNDenoiserGPU.

Unifying the code paths would be easy from the OIDN side of things: there the API is pretty much identical regardless of which device is being used. The hard part is the Cycles side, where the GPU denoising path relies on the presence of a DeviceQueue, GPU memory and several kernels (e.g. DEVICE_KERNEL_FILTER_GUIDING_PREPROCESS), which the CPU device does not have. I suppose I could add a minimal CPUQueue implementation that supports only the kernels needed for denoising, if that is desired?
@ -621,9 +679,45 @@ bool OIDNDenoiser::denoise_buffer(const BufferParams &buffer_params,
return true;
}
bool OIDNDenoiser::denoise_buffer(const DenoiseTask &task)
I think the only difference from OptiXDenoiser::denoise_buffer is that the OptiX denoiser activates the CUDA context. It seems quite easy to abstract the GPU context activation and avoid logic duplication here.
#112229
@ -641,2 +737,3 @@
return Denoiser::ensure_denoiser_device(progress);
# else
Device *denoiser_device = Denoiser::ensure_denoiser_device(progress);
Why do we skip the parent level and go straight to Denoiser::ensure_denoiser_device and not to GPUDenoiser::ensure_denoiser_device?

This section of code has been removed.
@ -643,0 +749,4 @@
#endif
}
bool OIDNDenoiser::denoise_ensure(DenoiseContext &context)
This seems to be the same as OptiXDenoiser::denoise_ensure, so it can be moved to GPUDenoiser as well?

Sure, will do.
#112229
@ -643,0 +799,4 @@
#if defined(OIDN_DEVICE_CUDA) && defined(WITH_CUDA)
case DEVICE_CUDA:
case DEVICE_OPTIX: {
int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;
I think it will be clearer if the CUDA device information is stored in the DeviceInfo. We already have cpu_threads, so I don't think it is unreasonable to add extra details like cuda_device_id and cuda_compute_major. We can hide those as private, only accessible via a getter with an assert on the device type. We do have such device casts in some places already, but I don't think that is a sign of good architecture.

Will be addressed when reintroducing CUDA support.
@ -643,0 +801,4 @@
case DEVICE_OPTIX: {
int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;
CUstream stream = nullptr;
oidn_device_ = oidnNewCUDADevice(&device_id, &stream, 1);
The data and kernel communication needs to happen within denoiser_queue_ (from which you are able to access the stream). Otherwise, synchronizing data and kernels becomes very tricky without adding global synchronization (which, as I recall, may not even be possible when mixing default and explicit streams).

Thanks for pointing this out. I will restrict the initial patch to SYCL GPU devices for simplicity and put support for other devices in follow-up PRs.
@ -643,0 +807,4 @@
# endif
/* Devices without explicit support fall through to CPU backend. */
default:
oidn_device_ = oidnNewDevice(OIDN_DEVICE_TYPE_CPU);
Add a note about multi-device.
It is important that on a multi-GPU render denoising happens on the GPU, but locally from this code it is not immediately clear what happens in such a configuration.

This part of the code has been removed; the stricter separation between code paths now makes it clear that this runs on GPUs only.
@ -643,0 +832,4 @@
oidnSetFilterInt(oidn_filter_, "cleanAux", true);
}
albedo_filter_ = oidnNewFilter(oidn_device_, "RT");
These either need to be allocated only if the corresponding guiding pass is enabled, or it needs to be explained that creation is cheap and does not cause 2 extra gigabytes of memory use.
Done.
@Sergey Thanks for the quick review. I think it'll be best if I make the first pass to support SYCL devices only, and then add CUDA device support, if possible, in a second PR.
Git refuses to push to this diff; it keeps insisting it has commits I don't have locally, even after a fresh fetch... so, that's fun. Anyhow, here are the changes needed for Windows, please commit them yourself.
Edit: poor timing, we were working at the same time, all good now.
We're not AOT-compiling for the Intel GPUs? When I previously scripted this I also had an OpenImageDenoise_device_sycl_xehpg.dll and an OpenImageDenoise_device_sycl_xelp.dll?

Please update the PR description following the Commit Message Guidelines. It helps reviewers to know what they are looking at, instead of having to figure out which parts of the code are supposed to be related.
Also, is it possible to split the library update into a separate PR with no functional changes yet? This would help the review process and follow-up maintenance a lot.
Something seems to have gone wrong with the latest update. It includes 400+ changes, which made the PR have 1.5k files changed.
Not again. I don't know why or how git does that, I've had this before.
dd23efc25b to fb67b2dabf
Should be fixed now. My apologies for the force push, I don't know of a better way to do this.
Changed title from "Update to OpenImageDenoise 2.0" to "Cycles: Add GPU support for OpenImageDenoise"
fb67b2dabf to 06c14abef8
06c14abef8 to c0fdf33c54
Changed title from "Cycles: Add GPU support for OpenImageDenoise" to "Cycles: Add Intel GPU support for OpenImageDenoise"

This PR still contains the changes from #112143. Once that has been merged, this one will be smaller.
This branch can now build against OIDN V1 and V2. The changes for build/deps are now entirely contained in #112143 and no longer included in this patch.
To test the new functionality in this branch, #112143 needs to be merged or applied locally first.
@ -30,6 +32,7 @@ const NodeEnum *DenoiseParams::get_type_enum()
if (type_enum.empty()) {
type_enum.insert("optix", DENOISER_OPTIX);
type_enum.insert("openimageio", DENOISER_OPENIMAGEDENOISE);
This has mistakenly been named "openimageio". We should fix it, but not as part of this PR, I think.
It is good to see an OIDN denoiser which follows the typical flow of the GPU denoiser so closely! It makes the differences clear to see.

The small comment I have is the naming. I can't say OIDN2Denoiser is the clearest: the existing/CPU denoiser uses the same library. This is why I was proposing to use OIDNDenoiserGPU, to put it into the scope of flow-based naming. As in: hey, this denoiser implements the GPU denoiser flow. Maybe there are better names. Or maybe, if we can unify the CPU and GPU sides, it will simply be OIDNDenoiser, as it is now.

On a user level I don't think we should have separate options for CPU and GPU OIDN: "just" choose the best-fitting device. Similar to the OptiX denoiser, which uses the same device for denoising as for path tracing whenever possible. We could even decide to choose an available GPU device for denoising even when path tracing happens on the CPU. If more granular control is needed, it is better implemented as an explicit device selection for denoising, which can then be used for both OIDN and OptiX. But such an option is not something I'd start with.
When I was mentioning the separation of CPU and GPU OIDN denoisers I mainly meant the code-side separation, as it is quite tricky to follow code which has a lot of top-level if-else statements. Denoiser::create can choose which implementation to use based on the device selection.

One thing I did not have time to fully understand yet is the peak memory comparison. It seems that now, when pre-filtering is used, memory for albedo and normal is always allocated, while previously it was allowed to modify the data in place. For example, when rendering happens on multiple devices and the denoiser is used, a temporary RenderBuffers is created with the result gathered from all the devices, denoised, and copied back to the individual devices. Only the denoised passes are copied, which makes it possible to modify guiding passes in place, without an extra full-frame buffer allocated. This is something that should be supported by the new denoiser implementation, to keep memory usage under control as much as possible.

The CPU/GPU unification is a very interesting story! It sounds like there are ways to avoid separate CPU and GPU implementations for OIDN, so that would need to happen. What I am not sure about is whether it's better to do such unification from the get-go, or to allow both implementations to co-exist for some time for cross-referencing purposes. I think allowing some time to co-exist is the better and more achievable step. But I am open to being convinced otherwise :)

For the CPU queue I am not really sure what the best approach is. If the queue is only needed to schedule kernels, we can take care of that by abstracting the call/synchronization. We can make functions like denoise_filter_color_preprocess virtual and let the OIDN denoiser implementation take care of whether to use the queue or a direct kernel call. Somehow that seems like a better idea than opening the topic of introducing a CPU queue.

I think some user control over using CPU or GPU denoising would be helpful.
The safe default would be to use GPU denoising only for GPU rendering; it is probably also fine for CPU viewport rendering. But final and especially background CPU renders should probably not use the GPU by default; that should be something the user has to enable. You wouldn't want to automatically use whatever GPU may exist on a render farm node, when it may be a simple integrated or low-memory GPU not meant for that purpose.
Perhaps this could be configured in the preferences rather than the scene?
Even better would be a way to configure in the preferences which GPU (if any) to use for denoising.
Those are all valid points, Brecht. I do agree such control is needed. It is just that, to me, implementing it as a user preference or as a denoiser option in the scene seems more flexible, more future-proof, and reusable for OptiX than implementing such functionality via the denoiser type.
I am not sure how OIDN implements device selection. Implementing device selection for the OptiX denoiser seems straightforward: add code to find_best_denoiser_device_info which will return the DeviceInfo for the device selected in the user preferences.

One thing which would be nice to agree on is the breakdown in terms of what gets implemented when.
The way I see it is:
How does that sound to you?
Sounds fine, it's good to keep this PR simple and then add CPU render + GPU denoising support as a second step with a proper UI.
All currently supported (and planned) oneAPI GPUs that can run Cycles can also run OIDN 2. However, a user may want to render on the GPU and denoise on the CPU if memory is tight; I've seen a few scenes where that made the difference between rendering and out-of-memory errors. Likewise, someone with a scene too big for the GPU may still want to use GPU viewport denoising for performance reasons.
As a side note, denoising on a oneAPI GPU while rendering on an Nvidia GPU worked for me too. Since the reverse scenario, rendering on oneAPI and denoising with OptiX, was already working, this was not unexpected, but still something I was happy to see.
@Sergey I'll take a look at memory consumption, to see if there are any excess allocations that can be skipped.
If someone has multiple GPUs, especially ones not equal in performance, a simple CPU/GPU separation does not feel like enough, and more granular control seems to be a good idea. Having it could also benefit the OptiX denoiser.
Is it OK if I make a suggestion as a user?
To keep it as simple as possible, in the preferences have checkboxes for:
If CPU and GPU are ticked in the denoising devices list, always favour the GPU unless there are multiple compositor denoise nodes, in which case assign one GPU to each node, and one node to the CPU. When no nodes remain, have the GPU restart the CPU's node if the time remaining on it is greater than the GPU's average denoise time per completed node.
Beyond that, I think everything should be automatic, based on a quick calculation prior to execution to ensure only eligible GPUs are used, i.e. ones with sufficient VRAM to complete the task (for example, VRAM may not be freeable before denoising or moving on to the next frame when persistent data is enabled, unless it could be briefly moved to system RAM during denoising?).
If no eligible GPUs are found, then revert to the CPU (even if it is not chosen as a denoise device).
I think those options and that logic will keep it as simple as possible for the user, while ensuring the best possible performance, without the user having to mess about trying to calculate whether their GPU has sufficient resources for the resolution they wish to denoise (which is tricky, because the user often doesn't have a clue how much VRAM is in use or available when Cycles finishes sampling and denoising/compositor execution begins).