Cycles: Add Intel GPU support for OpenImageDenoise #108314

Merged
Stefan Werner merged 28 commits from Stefan_Werner/blender:oidn2 into main 2023-11-20 11:12:51 +01:00
Member

OpenImageDenoise V2 comes with GPU support for various backends. This adds a new class, OIDN2Denoiser, to integrate this functionality into the existing Cycles post-processing pipeline without having to change it much. OptiX and OIDN CPU denoising remain as they are; there is now a new user option to select OIDN GPU denoising when a compatible device is detected.

Device support is initially limited to the oneAPI devices that are supported by Cycles already, but can be extended.

Ref #115045

Xavier Hallade added this to the 4.0 milestone 2023-05-30 19:03:37 +02:00
Xavier Hallade added this to the Render & Cycles project 2023-05-30 19:03:42 +02:00
Xavier Hallade added the Module/Render & Cycles label 2023-05-30 19:03:59 +02:00
Member

On the deps, something similar to https://projects.blender.org/blender/blender/commit/66b4e426cc3d8a12adca585bea217f2013568393 needs to be done, especially using Ninja on Windows (`CMAKE_GENERATOR ${PLATFORM_ALT_GENERATOR}`).
https://github.com/OpenImageDenoise/oidn/blob/master/doc/compilation.md also recommends using ocloc 101.4314, but that version has known issues with Blender; updating ocloc to a newer version such as [101.4369](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html) will be needed.

Member

I'll figure it out, not overly worried, I'll bring up the Windows/CPU support first, and we'll branch out from there.

Just to clarify: I'm working on the deps builder part; best not to duplicate this work.

Stefan Werner force-pushed oidn2 from 0465255054 to ecba53c269 2023-07-25 16:00:12 +02:00 Compare
Stefan Werner force-pushed oidn2 from ecba53c269 to 3d72ee1be4 2023-07-25 16:05:29 +02:00 Compare
Contributor

What is currently holding up this PR?

Member

the WIP label

Stefan Werner changed title from WIP: Update to OpenImageDenoise 2.0 to Update to OpenImageDenoise 2.0 2023-08-22 14:28:04 +02:00
Stefan Werner requested review from Brecht Van Lommel 2023-08-22 15:53:33 +02:00
Stefan Werner requested review from Sergey Sharybin 2023-08-22 15:53:33 +02:00

Please follow the [Commit Message Guidelines](https://wiki.blender.org/wiki/Style_Guide/Commit_Messages) and present the PR according to what it is.

If it is just an update of a library without user-level changes, then I am not really sure about the deeper changes in `OIDNDenoiser` and `DenoiserGPU`.

If this change actually brings GPU support for OIDN, it should be presented as such, mentioning what the supported GPUs are, what the limitations are (user-level and technical), some major architectural considerations, etc.

Sergey Sharybin requested changes 2023-08-22 17:09:58 +02:00
Sergey Sharybin left a comment
Owner

Initial pass of review.

@ -139,2 +143,4 @@
info.has_nanovdb = true;
info.denoisers = 0;
# if defined(WITH_OPENIMAGEDENOISE) && defined(OIDN_DEVICE_CUDA)
if (major >= 7) {

There is a discrepancy here:

  • All the other devices use `OIDNDenoiser::is_device_supported(info)`, which is good because it can then be seen as a source of truth, but here some specialized check is done instead.
  • When using, say, sm_60, the code here will report that OIDN is not supported on the device, but `OIDNDenoiser::is_device_supported` (which doesn't check compute capability) will report that OIDN is supported on the device.

Architecturally there needs to be a single source of truth.
If that is for some reason not possible, it needs to be clearly documented in the API.

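A minimal sketch of what a single source of truth could look like, with the compute-capability check folded into the shared predicate. `compute_major` on `DeviceInfo` is an assumed field here, not existing Cycles API:

```cpp
/* Hypothetical sketch: all callers, including device enumeration, ask the
 * denoiser class itself, so the two checks cannot disagree. */
bool OIDNDenoiser::is_device_supported(const DeviceInfo &info)
{
#  if defined(OIDN_DEVICE_CUDA)
  if (info.type == DEVICE_CUDA || info.type == DEVICE_OPTIX) {
    /* Assumed field: compute capability recorded when enumerating devices. */
    return info.compute_major >= 7;
  }
#  endif
  /* Illustrative default: the CPU backend is always available. */
  return info.type == DEVICE_CPU;
}
```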
Author
Member

This code has been backed out with the removal of CUDA support; it will be revisited when CUDA support is reintroduced.

Stefan_Werner marked this conversation as resolved
@ -57,2 +57,4 @@
bool denoise_filter_guiding_set_fake_albedo(const DenoiseContext &context);
/* Read guiding passes from the render buffers, preprocess them in a way which is expected by
* OptiX and store in the guiding passes memory within the given context.

Comment needs to be updated: the function is no longer OptiX-specific.

Stefan_Werner marked this conversation as resolved
@ -577,4 +637,3 @@
<< "OpenImageDenoiser is not supported on this platform or build.";
#ifdef WITH_OPENIMAGEDENOISE
thread_scoped_lock lock(mutex_);

Any specific reason for removing this?

Some information got lost during the Cycles X rewrite; the original comment stated:

    /* Only one at a time, since OpenImageDenoise itself is multithreaded for full
     * buffers, and for tiled rendering because creating multiple devices and filters
     * is slow and memory hungry as well.
     *
     * TODO: optimize tiled rendering case, by batching together denoising of many
     * tiles somehow? */

We don't have tiles anymore, but we still don't want to run multiple OIDN denoisers at the same time.

If you have looked deeply into it and the change is safe, it is much better to isolate it in a separate PR, explaining why this code is no longer relevant.

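For reference, a minimal sketch of the serialization the original comment describes, using the `mutex_` member visible in the diff above; the body is elided:

```cpp
bool OIDNDenoiser::denoise_buffer(const DenoiseTask &task)
{
  /* Only one OIDN instance at a time: OIDN is internally multithreaded for
   * full buffers, and creating multiple devices and filters is slow and
   * memory hungry. */
  thread_scoped_lock lock(mutex_);
  /* ... create the OIDN device/filter and execute it under the lock ... */
  return true;
}
```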
Author
Member

Mutex is back. Should it get removed again, it will be a separate commit and not part of this PR.

Stefan_Werner marked this conversation as resolved
@ -573,12 +628,15 @@ bool OIDNDenoiser::denoise_buffer(const BufferParams &buffer_params,
const int num_samples,
bool allow_inplace_modification)
{
if (denoiser_device_->info.type != DEVICE_CPU && !cpu_fallback_) {

Is it possible to unify the code paths for CPU and GPU OIDN denoising?

It is quite weird to maintain two separate denoising code paths in the same denoiser implementation. It almost seems better to split them into `OIDNDenoiserCPU` and `OIDNDenoiserGPU`.

Author
Member

Unifying the code paths would be easy from the OIDN side of things - there the API is pretty much identical regardless of which device is being used.

The hard part is the Cycles side, where the GPU denoising path relies on the presence of a `DeviceQueue`, GPU memory and several kernels (e.g. `DEVICE_KERNEL_FILTER_GUIDING_PREPROCESS`), which the CPU device does not have. I suppose I could add a minimal `CPUQueue` implementation that supports only the kernels needed for denoising, if that is desired?

@ -621,9 +679,45 @@ bool OIDNDenoiser::denoise_buffer(const BufferParams &buffer_params,
return true;
}
bool OIDNDenoiser::denoise_buffer(const DenoiseTask &task)

I think the only difference from `OptiXDenoiser::denoise_buffer` is that the OptiX denoiser activates the CUDA context.

It seems quite easy to abstract the GPU context activation and avoid the logic duplication here.

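A possible shape for that abstraction, as a sketch only; the hook name `activate_context` is made up here:

```cpp
/* Hypothetical sketch: the shared GPU flow lives in DenoiserGPU, and each
 * backend only supplies its context activation. */
class DenoiserGPU : public Denoiser {
 protected:
  /* Default is a no-op, e.g. for SYCL devices. */
  virtual void activate_context() {}

  bool denoise_buffer(const DenoiseTask &task)
  {
    activate_context();
    /* ... common guiding-pass setup, filter execution, result copy ... */
    return true;
  }
};

/* The OptiX denoiser overrides the hook to push its CUDA context. */
class OptiXDenoiser : public DenoiserGPU {
 protected:
  void activate_context() override;
};
```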
Author
Member
https://projects.blender.org/blender/blender/pulls/112229
Stefan_Werner marked this conversation as resolved
@ -641,2 +737,3 @@
return Denoiser::ensure_denoiser_device(progress);
# else
Device *denoiser_device = Denoiser::ensure_denoiser_device(progress);

Why do we skip a parenting level and go straight to `Denoiser::ensure_denoiser_device` and not to `GPUDenoiser::ensure_denoiser_device`?

Author
Member

This section of code has been removed.

Stefan_Werner marked this conversation as resolved
@ -643,0 +749,4 @@
#endif
}
bool OIDNDenoiser::denoise_ensure(DenoiseContext &context)

This seems to be the same as `OptiXDenoiser::denoise_ensure`, so it can be moved to `GPUDenoiser` as well?

Author
Member

Sure, will do.

Author
Member
https://projects.blender.org/blender/blender/pulls/112229
Stefan_Werner marked this conversation as resolved
@ -643,0 +799,4 @@
#if defined(OIDN_DEVICE_CUDA) && defined(WITH_CUDA)
case DEVICE_CUDA:
case DEVICE_OPTIX: {
int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;

I think it would be clearer if the CUDA device information was stored in the `DeviceInfo`. We already have `cpu_threads`, so I don't think it is unreasonable to add extra details like `cuda_device_id` and `cuda_compute_major`. These can be hidden as private, only accessible via a getter with an assert on the device type.

We do have such device casts in some places already, but I don't think it is a sign of good architecture.

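A sketch of the suggested shape; the field and getter names (`cuda_device_id`) come from the comment above and do not exist in `DeviceInfo` today:

```cpp
class DeviceInfo {
 public:
  int cuda_device_id() const
  {
    /* Only meaningful for CUDA-backed devices; asserting here catches the
     * kind of mismatched cast the comment above describes. */
    assert(type == DEVICE_CUDA || type == DEVICE_OPTIX);
    return cuda_device_id_;
  }

 private:
  int cuda_device_id_ = -1;
};
```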
Author
Member

Will be addressed when reintroducing CUDA support.

Stefan_Werner marked this conversation as resolved
@ -643,0 +801,4 @@
case DEVICE_OPTIX: {
int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;
CUstream stream = nullptr;
oidn_device_ = oidnNewCUDADevice(&device_id, &stream, 1);

The data and kernel communication needs to happen within `denoiser_queue_` (from which you are able to access the stream).

Otherwise synchronizing data and kernels becomes very tricky without adding global synchronization (which I don't remember being possible when mixing default and explicit streams).

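A sketch of the suggested fix, assuming the queue exposes its CUDA stream; the `stream()` accessor is hypothetical:

```cpp
case DEVICE_CUDA:
case DEVICE_OPTIX: {
  int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;
  /* Use the queue's stream instead of the default stream, so OIDN work is
   * ordered with the guiding-pass kernels without global synchronization. */
  cudaStream_t stream = denoiser_queue_->stream();
  oidn_device_ = oidnNewCUDADevice(&device_id, &stream, 1);
  break;
}
```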
Author
Member

Thanks for pointing this out. I will restrict the initial patch to SYCL GPU devices for simplicity and add support for other devices in follow-up PRs.

Stefan_Werner marked this conversation as resolved
@ -643,0 +807,4 @@
# endif
/* Devices without explicit support fall through to CPU backend. */
default:
oidn_device_ = oidnNewDevice(OIDN_DEVICE_TYPE_CPU);

Add a note about multi-device.

It is important that with a multi-GPU render denoising happens on the GPU, but locally from this code it is not immediately clear what happens in such a configuration.

Author
Member

This part of the code has been removed, the stricter separation between code paths now makes it clear that this runs on GPUs only.

Stefan_Werner marked this conversation as resolved
@ -643,0 +832,4 @@
oidnSetFilterInt(oidn_filter_, "cleanAux", true);
}
albedo_filter_ = oidnNewFilter(oidn_device_, "RT");

These either need to be allocated only if the corresponding guiding pass is enabled, or it needs to be explained that creation is cheap and does not cause 2 extra gigabytes of memory use.

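A sketch of the first option, creating the auxiliary filters only when the corresponding guiding passes are in use; the flag names follow `DenoiseContext` as used elsewhere in this PR:

```cpp
/* Create prefilter instances lazily, only for passes that are enabled. */
if (context.use_pass_albedo) {
  albedo_filter_ = oidnNewFilter(oidn_device_, "RT");
}
if (context.use_pass_normal) {
  normal_filter_ = oidnNewFilter(oidn_device_, "RT");
}
```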
Author
Member

Done.

Stefan_Werner marked this conversation as resolved
Author
Member

@Sergey Thanks for the quick review. I think it'll be best if I make the first pass to support SYCL devices only, and then add CUDA device support, if possible, in a second PR.

Member

~~git refuses to push to this diff, keeps insisting it has commits I don't have locally even after a fresh fetch... so... that's fun...~~

~~anyhow, here are the changes needed for Windows, please commit them yourself~~

edit: poor timing, we were working at the same time, all good now.

We're not AOT'ing for the Intel GPUs? When I previously scripted this, I also had an `OpenImageDenoise_device_sycl_xehpg.dll` and `OpenImageDenoise_device_sycl_xelp.dll`?


Please update the PR description following the [Commit Message Guidelines](https://wiki.blender.org/wiki/Style_Guide/Commit_Messages). It helps reviewers to know what they are looking at, instead of figuring out which parts of the code are supposed to be related.

Also, is it possible to split the library update into a separate PR with no functional changes yet? This will help the review process and follow-up maintenance a lot.


Something seems to have gone wrong with the latest update. It includes 400+ changes, which made the PR have 1.5k files changed.

Author
Member

Not again. I don't know why or how git does that, I've had this before.

Stefan Werner force-pushed oidn2 from dd23efc25b to fb67b2dabf 2023-09-08 13:34:27 +02:00 Compare
Author
Member

Should be fixed now. My apologies for the force push, I don't know of a better way to do this.

Stefan Werner added a new dependency 2023-09-11 17:38:23 +02:00
Stefan Werner changed title from Update to OpenImageDenoise 2.0 to Cycles: Add GPU support for OpenImageDenoise 2023-09-11 23:35:44 +02:00
Stefan Werner force-pushed oidn2 from fb67b2dabf to 06c14abef8 2023-09-12 22:55:02 +02:00 Compare
Stefan Werner force-pushed oidn2 from 06c14abef8 to c0fdf33c54 2023-09-12 22:56:48 +02:00 Compare
Stefan Werner added 1 commit 2023-09-12 23:06:21 +02:00
Stefan Werner added 1 commit 2023-09-13 08:43:15 +02:00
Xavier Hallade changed title from Cycles: Add GPU support for OpenImageDenoise to Cycles: Add Intel GPU support for OpenImageDenoise 2023-09-13 08:57:28 +02:00
Author
Member

This PR still contains the changes from #112143. Once that has been merged, this one will be smaller.

Stefan Werner added 1 commit 2023-09-13 10:28:53 +02:00
e990b84190 Cleanup: Formatting and build fixes
Making sure it builds with WITH_OPENIMAGEDENOISE=OFF
`make format` for new OIDN2 code
Stefan Werner added 1 commit 2023-09-13 13:03:34 +02:00
6712ea7fc4 Cycles: Better OIDN2 GPU device support detection
OIDN2 now checks for device availability more thoroughly.
Stefan Werner added 1 commit 2023-09-13 13:21:35 +02:00
c3b06abde1 Cleanup: Removed build env/dependency changes from this Cycles PR
Since this branch can build with both V1 and V2 of OIDN, build/deps
changes can be in a completely separate PR. To test the new V2
functionality, that PR needs to be merged first.
Author
Member

This branch can now build against OIDN V1 and V2. The changes for build/deps are now entirely contained in #112143 and no longer included in this patch.

To test the new functionality in this branch, #112143 needs to be merged or applied locally first.

Stefan Werner requested review from Sergey Sharybin 2023-09-13 13:23:57 +02:00
Stefan Werner reviewed 2023-09-13 13:31:27 +02:00
@ -30,6 +32,7 @@ const NodeEnum *DenoiseParams::get_type_enum()
if (type_enum.empty()) {
type_enum.insert("optix", DENOISER_OPTIX);
type_enum.insert("openimageio", DENOISER_OPENIMAGEDENOISE);
Author
Member

This has mistakenly been named "openimageio". We should fix this, but not as part of this PR, I think.

Stefan Werner added 1 commit 2023-09-13 13:40:09 +02:00
Stefan Werner added 1 commit 2023-09-13 14:15:45 +02:00

It's good to see an OIDN denoiser which follows the typical flow of the GPU denoiser so closely! It makes the differences clear to see.

The small comment I have is the naming. I can't say `OIDN2Denoiser` is the clearest: the existing/CPU denoiser uses the same library. This is why I was proposing `OIDNDenoiserGPU`, to put it into the scope of flow-based naming. As in: hey, this denoiser implements the GPU denoiser flow. Maybe there are better names. Or, maybe, if we can unify the CPU and GPU sides it will simply be `OIDNDenoiser`, as it is now.

On a user level I don't think we should have separate options for CPU and GPU OIDN: "just" choose the best fitting device. This is similar to the OptiX denoiser, which uses the same device for denoising as for path tracing whenever possible. We could even decide to choose an available GPU device for denoising even when path tracing happens on a CPU. If more granular control is needed, it is better implemented as an explicit device selection for denoising, which can then be used for both OIDN and OptiX. But such an option is not something I'd start with.

When I mentioned separation of the CPU and GPU OIDN denoisers, I mainly meant the code-side separation, as it is quite tricky to follow code which has a lot of top-level if-else statements. `Denoiser::create` can choose which implementation to use based on the device selection.

One thing I did not have time to fully understand yet is the peak memory comparison. It seems that now, when the pre-filter is used, memory for albedo and normal is always allocated, while previously it was allowed to modify data in place. For example, when rendering happens on multiple devices and the denoiser is used, a temporary RenderBuffers is created with the result gathered from all the devices, denoised, and copied back to the individual devices. Only the denoised passes are copied, which makes it possible to modify the guiding passes in place, without an extra full-frame buffer allocated. This is something that should be supported by the new denoiser implementation, to keep memory usage under control as much as possible.

The CPU/GPU unification is a very interesting story! It sounds like there are ways to avoid separate CPU and GPU implementations for OIDN, so that would need to happen. The thing I am not sure about is whether it's better to do such unification from the get-go, or to allow the two implementations to co-exist for some time for cross-referencing purposes. I feel that allowing some time to co-exist is a better and more achievable step. But I am open to being convinced otherwise :)

For the CPU queue, I am not really sure it is the best approach. If the queue is only needed to schedule kernels, we can take care of that by abstracting the call/synchronization. We can make functions like `denoise_filter_color_preprocess` virtual and let the OIDN denoiser implementation take care of whether to use a queue or a direct kernel call. Somehow that seems like a better idea than opening the topic of introducing a CPU queue (a minimal sketch of this idea follows below).

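A sketch of that last idea: one `OIDNDenoiser` with virtual kernel dispatch. The class split and method choice here are illustrative, not the PR's actual design:

```cpp
/* Hypothetical sketch: the base implementation enqueues the kernel on
 * denoiser_queue_; a CPU specialization calls the kernel directly. */
class OIDNDenoiser : public Denoiser {
 protected:
  /* GPU path: enqueue DEVICE_KERNEL_FILTER_GUIDING_PREPROCESS. */
  virtual bool denoise_filter_guiding_preprocess(DenoiseContext &context);
};

class OIDNDenoiserCPU : public OIDNDenoiser {
 protected:
  /* CPU path: run the filter kernel as a plain function call, no queue. */
  bool denoise_filter_guiding_preprocess(DenoiseContext &context) override;
};
```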

I think some user control over using CPU or GPU denoising would be helpful.

The safe default would be to use GPU denoising only for GPU rendering; maybe it's also fine for CPU viewport rendering. But final and especially background CPU renders should probably not use the GPU by default; that should be something a user has to enable. You wouldn't want to automatically use whatever GPU may exist on a render farm node, when it may be a simple integrated or low-memory GPU not meant for that purpose.

Perhaps this could be configured in the preferences rather than the scene?

Even better would be a way to configure in the preferences which GPU (if any) to use for denoising.


Those are all valid points, Brecht. I do agree such control is needed. It's just that, to me, implementing it as a user preference or as a denoiser option in the scene seems more flexible, more future-proof, and reusable for OptiX than implementing such functionality via the denoiser type.

I am not sure how OIDN implements device selection. Implementing it for the OptiX denoiser seems straightforward: add code to `find_best_denoiser_device_info` which returns the `DeviceInfo` for the device selected in the user preferences.

One thing which would be nice to agree on is the breakdown in terms of what gets implemented when. The way I see it is:

  • The initial OIDN2 implementation follows a simple rule: when rendering on oneAPI use the oneAPI denoiser, otherwise use the CPU denoiser. I think all OIDN-capable devices are powerful enough to run the denoiser.
  • Add a device selection.
  • Unify the code paths for the CPU and GPU OIDN implementations.

How does that sound to you?


Sounds fine, it's good to keep this PR simple and then add CPU render + GPU denoising support as a second step with a proper UI.

Author
Member

All currently supported (and planned) oneAPI GPUs that can run Cycles can also run OIDN2. However, a user may want to render on the GPU and denoise on the CPU if memory is tight - I've seen a few scenes where that made the difference between rendering and out-of-memory errors. Likewise, someone with a scene too big for the GPU may still want to use GPU viewport denoising for performance reasons.

As a side note, denoising on a oneAPI GPU while rendering on an Nvidia GPU worked for me too. Since the reverse scenario, rendering on oneAPI and denoising with OptiX, was already working, this was not unexpected, but still something I was happy to see.

@Sergey I'll take a look at memory consumption, to see if there are any excess allocations that can be skipped.


If someone has multiple GPUs, especially ones not equal in performance, a plain CPU/GPU separation does not feel like enough, and more granular control seems like a good idea. Having it could also benefit the OptiX denoiser.

First-time contributor

Is it OK if I make a suggestion as a user?

To keep it as simple as possible, in the preferences have checkboxes for:

  • denoising devices (same as the Cycles render devices). Allow the user to exclude slow devices. If any GPU is selected, then assume the user always wants to favour GPU denoising.

If CPU and GPU are ticked in the denoising devices list, always favour the GPU unless there are multiple compositor denoise nodes, in which case assign one GPU to each node, and one node to the CPU. When no nodes remain, have the GPU restart the CPU's node if the time remaining on it exceeds the GPU's average denoise time per completed node.

Beyond that I think everything should be automatic, based on a quick calculation prior to execution to ensure only eligible GPUs are used, based on them having sufficient VRAM to complete the task (for example, VRAM may not be emptiable before denoising/moving on to the next frame due to persistent data being enabled, unless it could be briefly moved to system RAM during denoising?).

If no eligible GPUs are found, then revert to the CPU (even if it is not chosen as a denoise device).

I think those options and that logic will keep it as simple as possible for the user, whilst ensuring the best possible performance, without the user having to mess about trying to calculate whether their GPU has sufficient resources for the resolution they wish to denoise (which is tricky, because the user often doesn't have a clue how much VRAM is in use/available when Cycles finishes sampling and denoising/compositor execution begins).

Xavier Hallade modified the milestone from 4.0 to 4.1 2023-09-27 13:18:19 +02:00

> All currently supported (and planned) oneAPI GPUs that can run Cycles can also run OIDN2. However, a user may want to render on the GPU and denoise on the CPU if memory is tight - I've seen a few scenes where that made the difference between rendering and out-of-memory errors.

If we can detect device out-of-memory and then fall back to the CPU, that would be a nice way to solve this problem. I don't know how practical it is to either estimate this or get good error messages from OIDN?

> Likewise, someone with a scene too big for the GPU may still want to use GPU viewport denoising for performance reasons.

Agreed, I think for the viewport we could just use the GPU by default if it is supported, since it's already using the GPU for drawing the rest of the UI and viewport. It's mainly in final and background renders, with potentially very high resolutions or on render farms, where we don't want to automatically use the GPU for CPU renders.

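For what it's worth, OIDN does report out-of-memory as a distinct error code, so a detection point exists. A minimal sketch of checking for it after running a filter; how the fallback itself would be wired into Cycles is the open question:

```cpp
/* Sketch: detect GPU out-of-memory after executing an OIDN filter. */
oidnExecuteFilter(oidn_filter_);

const char *error_message = nullptr;
if (oidnGetDeviceError(oidn_device_, &error_message) == OIDN_ERROR_OUT_OF_MEMORY) {
  /* Device ran out of memory: re-create a CPU OIDN device and retry there
   * (the retry path is hypothetical, not part of this PR). */
}
```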
Stefan Werner added 1 commit 2023-10-10 14:15:42 +02:00
Stefan Werner added 1 commit 2023-11-08 10:52:51 +01:00
Stefan Werner added 2 commits 2023-11-17 10:59:25 +01:00
4233d6f1c3 Cycles: Automatic selection of OIDN on GPU or CPU
The manual selection of OIDN on GPU or CPU has been removed,
denoising now happens on the same device that was used for
rendering.
Stefan Werner added 1 commit 2023-11-17 11:07:18 +01:00
Stefan Werner added 1 commit 2023-11-17 12:13:34 +01:00
fc23fd0e07 Cycles: Removed redundant buffers in OIDN GPU
The albedo/normal buffers were prefiltered in separate memory, but this can easily be done in place without side effects.
Author
Member

@Sergey @brecht When you get the time, can you take another look please?

Device selection now follows the selected render device, with one exception: when "Automatic" is selected on a machine that can use both the OptiX and the OIDN GPU denoiser, OptiX will be preferred regardless of the selected render device. This is because the automatic selection looks only at available devices, not selected devices:
https://projects.blender.org/blender/blender/src/branch/main/intern/cycles/blender/sync.cpp#L997

Sergey, you were right about the duplicated buffers for normal and albedo prefiltering. This is now fixed.

Stefan Werner added 1 commit 2023-11-17 13:05:47 +01:00
Brecht Van Lommel requested changes 2023-11-17 14:51:05 +01:00
Brecht Van Lommel left a comment
Owner

This is looking good as a first step now. Just one minor comment.

I created #115045 to track the remaining tasks.

OIDN 2.1 libraries are committed for Linux and macOS; for Windows they are still missing.

@ -13,3 +13,3 @@
return "OptiX";
case DENOISER_OPENIMAGEDENOISE:
return "OpenImageDenoise";
return "OpenImageDenoise (CPU)";

Remove ` (CPU)` here

Stefan_Werner marked this conversation as resolved
Stefan Werner added 1 commit 2023-11-17 15:20:08 +01:00
Author
Member

Ok, I'll wait for the Windows binaries then.

Brecht Van Lommel approved these changes 2023-11-17 15:33:53 +01:00
Member

To prevent having to update it twice, I was planning on updating the OIDN lib with the rest of the 4.1 lib update, which is probably weeks if not months out. Given I'm seemingly holding something up, I'll prioritize it.


Thanks!

Member

Done in r63572, sorry for the delay

Stefan Werner added 1 commit 2023-11-20 09:25:43 +01:00
Stefan Werner added 1 commit 2023-11-20 11:05:23 +01:00
Stefan Werner merged commit 02b5e27f89 into main 2023-11-20 11:12:51 +01:00
Stefan Werner deleted branch oidn2 2023-11-20 11:12:53 +01:00