Cycles: Add Intel GPU support for OpenImageDenoise #108314

Merged
Stefan Werner merged 28 commits from Stefan_Werner/blender:oidn2 into main 2023-11-20 11:12:51 +01:00
Member

OpenImageDenoise V2 comes with GPU support for various backends. This adds a new class, OIDN2Denoiser, to integrate this functionality into the existing Cycles post-processing pipeline without having to change it much. OptiX and OIDN CPU denoising remain as they are; there is now a new user option to select OIDN GPU denoising when a compatible device is detected.

Device support is initially limited to the oneAPI devices that are supported by Cycles already, but can be extended.

Ref #115045

Xavier Hallade added this to the 4.0 milestone 2023-05-30 19:03:37 +02:00
Xavier Hallade added this to the Render & Cycles project 2023-05-30 19:03:42 +02:00
Xavier Hallade added the Module/Render & Cycles label 2023-05-30 19:03:59 +02:00
Member

On the deps, something similar to https://projects.blender.org/blender/blender/commit/66b4e426cc3d8a12adca585bea217f2013568393 needs to be done, especially using Ninja on Windows (`CMAKE_GENERATOR ${PLATFORM_ALT_GENERATOR}`).
https://github.com/OpenImageDenoise/oidn/blob/master/doc/compilation.md also recommends using ocloc 101.4314, but that version has known issues with Blender; updating ocloc to a newer version such as [101.4369](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html) will be needed.

Member

I'll figure it out, not overly worried, I'll bring up the Windows/CPU support first, and we'll branch out from there.

Just to clarify: I'm working on the deps builder part; best not to duplicate this work.

Stefan Werner force-pushed oidn2 from 0465255054 to ecba53c269 2023-07-25 16:00:12 +02:00 Compare
Stefan Werner force-pushed oidn2 from ecba53c269 to 3d72ee1be4 2023-07-25 16:05:29 +02:00 Compare
Contributor

What is currently holding up this PR?

Member

the WIP label

Stefan Werner changed title from WIP: Update to OpenImageDenoise 2.0 to Update to OpenImageDenoise 2.0 2023-08-22 14:28:04 +02:00
Stefan Werner requested review from Brecht Van Lommel 2023-08-22 15:53:33 +02:00
Stefan Werner requested review from Sergey Sharybin 2023-08-22 15:53:33 +02:00

Please follow the [Commit Message Guidelines](https://wiki.blender.org/wiki/Style_Guide/Commit_Messages) and present the PR according to what it is.

If it is just an update of a library without user-level changes, then I am not really sure about the deeper changes in `OIDNDenoiser` and `DenoiserGPU`.

If this change actually brings GPU support for OIDN, it should be presented as such, mentioning what the supported GPUs are, what the limitations are (user-level and technical), some major architectural considerations, etc.

Sergey Sharybin requested changes 2023-08-22 17:09:58 +02:00
Sergey Sharybin left a comment
Owner

Initial pass of review.

@ -139,2 +143,4 @@
info.has_nanovdb = true;
info.denoisers = 0;
# if defined(WITH_OPENIMAGEDENOISE) && defined(OIDN_DEVICE_CUDA)
if (major >= 7) {

There is a discrepancy here:

  • All the other devices use `OIDNDenoiser::is_device_supported(info)`, which is good because it can then be seen as a source of truth, but here some specialized check is done instead.
  • When using, say, sm_60, the code here will report that OIDN is not supported on the device, but `OIDNDenoiser::is_device_supported` (which doesn't check compute capability) will report that OIDN is supported on the device.

Architecturally there needs to be a single source of truth.
If that is for some reason not possible, it needs to be clearly documented in the API.

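A minimal sketch of what a single source of truth could look like, with the compute-capability check folded into the shared predicate. `compute_major` on `DeviceInfo` is an assumed field here, not existing Cycles API:

```cpp
/* Hypothetical sketch: all callers, including device enumeration, ask the
 * denoiser class itself, so the two checks cannot disagree. */
bool OIDNDenoiser::is_device_supported(const DeviceInfo &info)
{
#  if defined(OIDN_DEVICE_CUDA)
  if (info.type == DEVICE_CUDA || info.type == DEVICE_OPTIX) {
    /* Assumed field: compute capability recorded when enumerating devices. */
    return info.compute_major >= 7;
  }
#  endif
  /* Illustrative default: the CPU backend is always available. */
  return info.type == DEVICE_CPU;
}
```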
Author
Member

This code has been backed out with the removal of CUDA support; it will be revisited when CUDA support is reintroduced.

Stefan_Werner marked this conversation as resolved
@ -57,2 +57,4 @@
bool denoise_filter_guiding_set_fake_albedo(const DenoiseContext &context);
/* Read guiding passes from the render buffers, preprocess them in a way which is expected by
* OptiX and store in the guiding passes memory within the given context.

Comment needs to be updated: the function is no longer OptiX-specific.

Stefan_Werner marked this conversation as resolved
@ -577,4 +637,3 @@
<< "OpenImageDenoiser is not supported on this platform or build.";
#ifdef WITH_OPENIMAGEDENOISE
thread_scoped_lock lock(mutex_);

Any specific reason for removing this?

Some information got lost during the Cycles X rewrite; the original comment stated:

    /* Only one at a time, since OpenImageDenoise itself is multithreaded for full
     * buffers, and for tiled rendering because creating multiple devices and filters
     * is slow and memory hungry as well.
     *
     * TODO: optimize tiled rendering case, by batching together denoising of many
     * tiles somehow? */

We don't have tiles anymore, but we still don't want to run multiple OIDN denoisers at the same time.

If you have looked deeply into it and the change is safe, it is much better to isolate it in a separate PR, explaining why this code is no longer relevant.

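For reference, a minimal sketch of the serialization the original comment describes, using the `mutex_` member visible in the diff above; the body is elided:

```cpp
bool OIDNDenoiser::denoise_buffer(const DenoiseTask &task)
{
  /* Only one OIDN instance at a time: OIDN is internally multithreaded for
   * full buffers, and creating multiple devices and filters is slow and
   * memory hungry. */
  thread_scoped_lock lock(mutex_);
  /* ... create the OIDN device/filter and execute it under the lock ... */
  return true;
}
```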
Author
Member

Mutex is back. Should it get removed again, it will be a separate commit and not part of this PR.

Stefan_Werner marked this conversation as resolved
@ -573,12 +628,15 @@ bool OIDNDenoiser::denoise_buffer(const BufferParams &buffer_params,
const int num_samples,
bool allow_inplace_modification)
{
if (denoiser_device_->info.type != DEVICE_CPU && !cpu_fallback_) {

Is it possible to unify the code paths for CPU and GPU OIDN denoising?

It is quite weird to maintain two separate denoising code paths in the same denoiser implementation. It almost seems better to split them into `OIDNDenoiserCPU` and `OIDNDenoiserGPU`.

Author
Member

Unifying the code paths would be easy from the OIDN side of things - there the API is pretty much identical regardless of which device is being used.

The hard part is the Cycles side, where the GPU denoising path relies on the presence of a `DeviceQueue`, GPU memory and several kernels (e.g. `DEVICE_KERNEL_FILTER_GUIDING_PREPROCESS`), which the CPU device does not have. I suppose I could add a minimal `CPUQueue` implementation that supports only the kernels needed for denoising, if that is desired?

@ -621,9 +679,45 @@ bool OIDNDenoiser::denoise_buffer(const BufferParams &buffer_params,
return true;
}
bool OIDNDenoiser::denoise_buffer(const DenoiseTask &task)

I think the only difference from `OptiXDenoiser::denoise_buffer` is that the OptiX denoiser activates the CUDA context.

It seems quite easy to abstract the GPU context activation and avoid the logic duplication here.

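A possible shape for that abstraction, as a sketch only; the hook name `activate_context` is made up here:

```cpp
/* Hypothetical sketch: the shared GPU flow lives in DenoiserGPU, and each
 * backend only supplies its context activation. */
class DenoiserGPU : public Denoiser {
 protected:
  /* Default is a no-op, e.g. for SYCL devices. */
  virtual void activate_context() {}

  bool denoise_buffer(const DenoiseTask &task)
  {
    activate_context();
    /* ... common guiding-pass setup, filter execution, result copy ... */
    return true;
  }
};

/* The OptiX denoiser overrides the hook to push its CUDA context. */
class OptiXDenoiser : public DenoiserGPU {
 protected:
  void activate_context() override;
};
```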
Author
Member
https://projects.blender.org/blender/blender/pulls/112229
Stefan_Werner marked this conversation as resolved
@ -641,2 +737,3 @@
return Denoiser::ensure_denoiser_device(progress);
# else
Device *denoiser_device = Denoiser::ensure_denoiser_device(progress);

Why do we skip a parenting level and go straight to `Denoiser::ensure_denoiser_device` and not to `GPUDenoiser::ensure_denoiser_device`?

Author
Member

This section of code has been removed.

Stefan_Werner marked this conversation as resolved
@ -643,0 +749,4 @@
#endif
}
bool OIDNDenoiser::denoise_ensure(DenoiseContext &context)

This seems to be the same as `OptiXDenoiser::denoise_ensure`, so it can be moved to `GPUDenoiser` as well?

Author
Member

Sure, will do.

Author
Member
https://projects.blender.org/blender/blender/pulls/112229
Stefan_Werner marked this conversation as resolved
@ -643,0 +799,4 @@
#if defined(OIDN_DEVICE_CUDA) && defined(WITH_CUDA)
case DEVICE_CUDA:
case DEVICE_OPTIX: {
int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;

I think it would be clearer if the CUDA device information was stored in the `DeviceInfo`. We already have `cpu_threads`, so I don't think it is unreasonable to add extra details like `cuda_device_id` and `cuda_compute_major`. These can be hidden as private, only accessible via a getter with an assert on the device type.

We do have such device casts in some places already, but I don't think it is a sign of good architecture.

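A sketch of the suggested shape; the field and getter names (`cuda_device_id`) come from the comment above and do not exist in `DeviceInfo` today:

```cpp
class DeviceInfo {
 public:
  int cuda_device_id() const
  {
    /* Only meaningful for CUDA-backed devices; asserting here catches the
     * kind of mismatched cast the comment above describes. */
    assert(type == DEVICE_CUDA || type == DEVICE_OPTIX);
    return cuda_device_id_;
  }

 private:
  int cuda_device_id_ = -1;
};
```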
Author
Member

Will be addressed when reintroducing CUDA support.

Stefan_Werner marked this conversation as resolved
@ -643,0 +801,4 @@
case DEVICE_OPTIX: {
int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;
CUstream stream = nullptr;
oidn_device_ = oidnNewCUDADevice(&device_id, &stream, 1);

The data and kernel communication needs to happen within `denoiser_queue_` (from which you are able to access the stream).

Otherwise synchronizing data and kernels becomes very tricky without adding global synchronization (which I don't remember being possible when mixing default and explicit streams).

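A sketch of the suggested fix, assuming the queue exposes its CUDA stream; the `stream()` accessor is hypothetical:

```cpp
case DEVICE_CUDA:
case DEVICE_OPTIX: {
  int device_id = static_cast<CUDADevice *>(denoiser_queue_->device)->cuDevId;
  /* Use the queue's stream instead of the default stream, so OIDN work is
   * ordered with the guiding-pass kernels without global synchronization. */
  cudaStream_t stream = denoiser_queue_->stream();
  oidn_device_ = oidnNewCUDADevice(&device_id, &stream, 1);
  break;
}
```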
Author
Member

Thanks for pointing this out. I will restrict the initial patch to SYCL GPU devices for simplicity and add support for other devices in follow-up PRs.

Stefan_Werner marked this conversation as resolved
@ -643,0 +807,4 @@
# endif
/* Devices without explicit support fall through to CPU backend. */
default:
oidn_device_ = oidnNewDevice(OIDN_DEVICE_TYPE_CPU);

Add a note about multi-device.

It is important that with a multi-GPU render denoising happens on the GPU, but locally from this code it is not immediately clear what happens in such a configuration.

Author
Member

This part of the code has been removed, the stricter separation between code paths now makes it clear that this runs on GPUs only.

Stefan_Werner marked this conversation as resolved
@ -643,0 +832,4 @@
oidnSetFilterInt(oidn_filter_, "cleanAux", true);
}
albedo_filter_ = oidnNewFilter(oidn_device_, "RT");

These either need to be allocated only if the corresponding guiding pass is enabled, or it needs to be explained that creation is cheap and does not cause 2 extra gigabytes of memory use.

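A sketch of the first option, creating the auxiliary filters only when the corresponding guiding passes are in use; the flag names follow `DenoiseContext` as used elsewhere in this PR:

```cpp
/* Create prefilter instances lazily, only for passes that are enabled. */
if (context.use_pass_albedo) {
  albedo_filter_ = oidnNewFilter(oidn_device_, "RT");
}
if (context.use_pass_normal) {
  normal_filter_ = oidnNewFilter(oidn_device_, "RT");
}
```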
Author
Member

Done.

Stefan_Werner marked this conversation as resolved
Author
Member

@Sergey Thanks for the quick review. I think it'll be best if I make the first pass to support SYCL devices only, and then add CUDA device support, if possible, in a second PR.

Member

~~git refuses to push to this diff, keeps insisting it has commits I don't have locally even after a fresh fetch... so... that's fun...~~

~~anyhow, here are the changes needed for Windows, please commit them yourself~~

edit: poor timing, we were working at the same time, all good now.

We're not AOT'ing for the Intel GPUs? When I previously scripted this, I also had an `OpenImageDenoise_device_sycl_xehpg.dll` and `OpenImageDenoise_device_sycl_xelp.dll`?


Please update the PR description following the [Commit Message Guidelines](https://wiki.blender.org/wiki/Style_Guide/Commit_Messages). It helps reviewers to know what they are looking at, instead of figuring out which parts of the code are supposed to be related.

Also, is it possible to split the library update into a separate PR with no functional changes yet? This will help the review process and follow-up maintenance a lot.


Something seems to have gone wrong with the latest update. It includes 400+ changes, which made the PR have 1.5k files changed.

Author
Member

Not again. I don't know why or how git does that, I've had this before.

Stefan Werner force-pushed oidn2 from dd23efc25b to fb67b2dabf 2023-09-08 13:34:27 +02:00 Compare
Author
Member

Should be fixed now. My apologies for the force push, I don't know of a better way to do this.

Stefan Werner added a new dependency 2023-09-11 17:38:23 +02:00
Stefan Werner changed title from Update to OpenImageDenoise 2.0 to Cycles: Add GPU support for OpenImageDenoise 2023-09-11 23:35:44 +02:00
Stefan Werner force-pushed oidn2 from fb67b2dabf to 06c14abef8 2023-09-12 22:55:02 +02:00 Compare
Stefan Werner force-pushed oidn2 from 06c14abef8 to c0fdf33c54 2023-09-12 22:56:48 +02:00 Compare
Stefan Werner added 1 commit 2023-09-12 23:06:21 +02:00
Stefan Werner added 1 commit 2023-09-13 08:43:15 +02:00
Xavier Hallade changed title from Cycles: Add GPU support for OpenImageDenoise to Cycles: Add Intel GPU support for OpenImageDenoise 2023-09-13 08:57:28 +02:00
Author
Member

This PR still contains the changes from #112143. Once that has been merged, this one will be smaller.

Stefan Werner added 1 commit 2023-09-13 10:28:53 +02:00
e990b84190 Cleanup: Formatting and build fixes
Making sure it builds with WITH_OPENIMAGEDENOISE=OFF
`make format` for new OIDN2 code
Stefan Werner added 1 commit 2023-09-13 13:03:34 +02:00
6712ea7fc4 Cycles: Better OIDN2 GPU device support detection
OIDN2 now checks for device availability more thoroughly.
Stefan Werner added 1 commit 2023-09-13 13:21:35 +02:00
c3b06abde1 Cleanup: Removed build env/dependency changes from this Cycles PR
Since this branch can build with both V1 and V2 of OIDN, build/deps
changes can be in a completely separate PR. To test the new V2
functionality, that PR needs to be merged first.
Author
Member

This branch can now build against OIDN V1 and V2. The changes for build/deps are now entirely contained in #112143 and no longer included in this patch.

To test the new functionality in this branch, #112143 needs to be merged or applied locally first.

Stefan Werner requested review from Sergey Sharybin 2023-09-13 13:23:57 +02:00
Stefan Werner reviewed 2023-09-13 13:31:27 +02:00
@ -30,6 +32,7 @@ const NodeEnum *DenoiseParams::get_type_enum()
if (type_enum.empty()) {
type_enum.insert("optix", DENOISER_OPTIX);
type_enum.insert("openimageio", DENOISER_OPENIMAGEDENOISE);
Author
Member

This has mistakenly been named "openimageio". We should fix this, but not as part of this PR, I think.

Stefan Werner added 1 commit 2023-09-13 13:40:09 +02:00
Stefan Werner added 1 commit 2023-09-13 14:15:45 +02:00

It's good to see an OIDN denoiser which follows the typical flow of the GPU denoiser so closely! It makes the differences clear to see.

The small comment I have is the naming. I can't say `OIDN2Denoiser` is the clearest: the existing/CPU denoiser uses the same library. This is why I was proposing `OIDNDenoiserGPU`, to put it into the scope of flow-based naming. As in: hey, this denoiser implements the GPU denoiser flow. Maybe there are better names. Or, maybe, if we can unify the CPU and GPU sides it will simply be `OIDNDenoiser`, as it is now.

On a user level I don't think we should have separate options for CPU and GPU OIDN: "just" choose the best fitting device. This is similar to the OptiX denoiser, which uses the same device for denoising as for path tracing whenever possible. We could even decide to choose an available GPU device for denoising even when path tracing happens on a CPU. If more granular control is needed, it is better implemented as an explicit device selection for denoising, which can then be used for both OIDN and OptiX. But such an option is not something I'd start with.

When I mentioned separation of the CPU and GPU OIDN denoisers, I mainly meant the code-side separation, as it is quite tricky to follow code which has a lot of top-level if-else statements. `Denoiser::create` can choose which implementation to use based on the device selection.

One thing I did not have time to fully understand yet is the peak memory comparison. It seems that now, when the pre-filter is used, memory for albedo and normal is always allocated, while previously it was allowed to modify data in place. For example, when rendering happens on multiple devices and the denoiser is used, a temporary RenderBuffers is created with the result gathered from all the devices, denoised, and copied back to the individual devices. Only the denoised passes are copied, which makes it possible to modify the guiding passes in place, without an extra full-frame buffer allocated. This is something that should be supported by the new denoiser implementation, to keep memory usage under control as much as possible.

The CPU/GPU unification is a very interesting story! It sounds like there are ways to avoid separate CPU and GPU implementations for OIDN, so that would need to happen. The thing I am not sure about is whether it's better to do such unification from the get-go, or to allow the two implementations to co-exist for some time for cross-referencing purposes. I feel that allowing some time to co-exist is a better and more achievable step. But I am open to being convinced otherwise :)

For the CPU queue, I am not really sure it is the best approach. If the queue is only needed to schedule kernels, we can take care of that by abstracting the call/synchronization. We can make functions like `denoise_filter_color_preprocess` virtual and let the OIDN denoiser implementation take care of whether to use a queue or a direct kernel call. Somehow that seems like a better idea than opening the topic of introducing a CPU queue (a minimal sketch of this idea follows below).

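A sketch of that last idea: one `OIDNDenoiser` with virtual kernel dispatch. The class split and method choice here are illustrative, not the PR's actual design:

```cpp
/* Hypothetical sketch: the base implementation enqueues the kernel on
 * denoiser_queue_; a CPU specialization calls the kernel directly. */
class OIDNDenoiser : public Denoiser {
 protected:
  /* GPU path: enqueue DEVICE_KERNEL_FILTER_GUIDING_PREPROCESS. */
  virtual bool denoise_filter_guiding_preprocess(DenoiseContext &context);
};

class OIDNDenoiserCPU : public OIDNDenoiser {
 protected:
  /* CPU path: run the filter kernel as a plain function call, no queue. */
  bool denoise_filter_guiding_preprocess(DenoiseContext &context) override;
};
```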

I think some user control over using CPU or GPU denoising would be helpful.

The safe default would be to use GPU denoising only for GPU rendering; maybe it's also fine for CPU viewport rendering. But final and especially background CPU renders should probably not use the GPU by default; that should be something a user has to enable. You wouldn't want to automatically use whatever GPU may exist on a render farm node, when it may be a simple integrated or low-memory GPU not meant for that purpose.

Perhaps this could be configured in the preferences rather than the scene?

Even better would be a way to configure in the preferences which GPU (if any) to use for denoising.


Those are all valid points, Brecht. I do agree such control is needed. It's just that, to me, implementing it as a user preference or as a denoiser option in the scene seems more flexible, more future-proof, and reusable for OptiX than implementing such functionality via the denoiser type.

I am not sure how OIDN implements device selection. Implementing it for the OptiX denoiser seems straightforward: add code to `find_best_denoiser_device_info` which returns the `DeviceInfo` for the device selected in the user preferences.

One thing which would be nice to agree on is the breakdown in terms of what gets implemented when. The way I see it is:

  • The initial OIDN2 implementation follows a simple rule: when rendering on oneAPI use the oneAPI denoiser, otherwise use the CPU denoiser. I think all OIDN-capable devices are powerful enough to run the denoiser.
  • Add a device selection.
  • Unify the code paths for the CPU and GPU OIDN implementations.

How does that sound to you?


Sounds fine, it's good to keep this PR simple and then add CPU render + GPU denoising support as a second step with a proper UI.

Author
Member

All currently supported (and planned) oneAPI GPUs that can run Cycles can also run OIDN2. However, a user may want to render on the GPU and denoise on the CPU if memory is tight - I've seen a few scenes where that made the difference between rendering and out-of-memory errors. Likewise, someone with a scene too big for the GPU may still want to use GPU viewport denoising for performance reasons.

As a side note, denoising on a oneAPI GPU while rendering on an Nvidia GPU worked for me too. Since the reverse scenario, rendering on oneAPI and denoising with OptiX, was already working, this was not unexpected, but still something I was happy to see.

@Sergey I'll take a look at memory consumption, to see if there are any excess allocations that can be skipped.


If someone has multiple GPUs, especially ones not equal in performance, a plain CPU/GPU separation does not feel like enough, and more granular control seems like a good idea. Having it could also benefit the OptiX denoiser.

First-time contributor

Is it OK if I make a suggestion as a user?

To keep it as simple as possible, in the preferences have checkboxes for:

  • denoising devices (same as the Cycles render devices). Allow the user to exclude slow devices. If any GPU is selected, then assume the user always wants to favour GPU denoising.

If CPU and GPU are ticked in the denoising devices list, always favour the GPU unless there are multiple compositor denoise nodes, in which case assign one GPU to each node, and one node to the CPU. When no nodes remain, have the GPU restart the CPU's node if the time remaining on it exceeds the GPU's average denoise time per completed node.

Beyond that I think everything should be automatic, based on a quick calculation prior to execution to ensure only eligible GPUs are used, based on them having sufficient VRAM to complete the task (for example, VRAM may not be emptiable before denoising/moving on to the next frame due to persistent data being enabled, unless it could be briefly moved to system RAM during denoising?).

If no eligible GPUs are found, then revert to the CPU (even if it is not chosen as a denoise device).

I think those options and that logic will keep it as simple as possible for the user, whilst ensuring the best possible performance, without the user having to mess about trying to calculate whether their GPU has sufficient resources for the resolution they wish to denoise (which is tricky, because the user often doesn't have a clue how much VRAM is in use/available when Cycles finishes sampling and denoising/compositor execution begins).

Xavier Hallade modified the milestone from 4.0 to 4.1 2023-09-27 13:18:19 +02:00

> All currently supported (and planned) oneAPI GPUs that can run Cycles can also run OIDN2. However, a user may want to render on the GPU and denoise on the CPU if memory is tight - I've seen a few scenes where that made the difference between rendering and out-of-memory errors.

If we can detect device out-of-memory and then fall back to the CPU, that would be a nice way to solve this problem. I don't know how practical it is to either estimate this or get good error messages from OIDN?

> Likewise, someone with a scene too big for the GPU may still want to use GPU viewport denoising for performance reasons.

Agreed, I think for the viewport we could just use the GPU by default if it is supported, since it's already using the GPU for drawing the rest of the UI and viewport. It's mainly in final and background renders, with potentially very high resolutions or on render farms, where we don't want to automatically use the GPU for CPU renders.

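For what it's worth, OIDN does report out-of-memory as a distinct error code, so a detection point exists. A minimal sketch of checking for it after running a filter; how the fallback itself would be wired into Cycles is the open question:

```cpp
/* Sketch: detect GPU out-of-memory after executing an OIDN filter. */
oidnExecuteFilter(oidn_filter_);

const char *error_message = nullptr;
if (oidnGetDeviceError(oidn_device_, &error_message) == OIDN_ERROR_OUT_OF_MEMORY) {
  /* Device ran out of memory: re-create a CPU OIDN device and retry there
   * (the retry path is hypothetical, not part of this PR). */
}
```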
Stefan Werner added 1 commit 2023-10-10 14:15:42 +02:00
Stefan Werner added 1 commit 2023-11-08 10:52:51 +01:00
Stefan Werner added 2 commits 2023-11-17 10:59:25 +01:00
4233d6f1c3 Cycles: Automatic selection of OIDN on GPU or CPU
The manual selection of OIDN on GPU or CPU has been removed,
denoising now happens on the same device that was used for
rendering.
Stefan Werner added 1 commit 2023-11-17 11:07:18 +01:00
Stefan Werner added 1 commit 2023-11-17 12:13:34 +01:00
fc23fd0e07 Cycles: Removed redundant buffers in OIDN GPU
The albedo/normal buffers were prefiltered in separate memory, but this can easily be done in place without side effects.
Author
Member

@Sergey @brecht When you get the time, can you take another look please?

Device selection now follows the selected render device, with one exception: when "Automatic" is selected on a machine that can use both the OptiX and the OIDN GPU denoiser, OptiX will be preferred regardless of the selected render device. This is because the automatic selection looks only at available devices, not selected devices:
https://projects.blender.org/blender/blender/src/branch/main/intern/cycles/blender/sync.cpp#L997

Sergey, you were right about the duplicated buffers for normal and albedo prefiltering. This is now fixed.

Stefan Werner added 1 commit 2023-11-17 13:05:47 +01:00
Brecht Van Lommel requested changes 2023-11-17 14:51:05 +01:00
Brecht Van Lommel left a comment
Owner

This is looking good as a first step now. Just one minor comment.

I created #115045 to track the remaining tasks.

OIDN 2.1 libraries are committed for Linux and macOS; for Windows they are still missing.

@ -13,3 +13,3 @@
return "OptiX";
case DENOISER_OPENIMAGEDENOISE:
return "OpenImageDenoise";
return "OpenImageDenoise (CPU)";

Remove ` (CPU)` here

Stefan_Werner marked this conversation as resolved
Stefan Werner added 1 commit 2023-11-17 15:20:08 +01:00
Author
Member

Ok, I'll wait for the Windows binaries then.

Brecht Van Lommel approved these changes 2023-11-17 15:33:53 +01:00
Member

To prevent having to update it twice, I was planning on updating the OIDN lib with the rest of the 4.1 lib update, which is probably weeks if not months out. Given I'm seemingly holding something up, I'll prioritize it.


Thanks!

Member

Done in r63572, sorry for the delay

Stefan Werner added 1 commit 2023-11-20 09:25:43 +01:00
Stefan Werner added 1 commit 2023-11-20 11:05:23 +01:00
Stefan Werner merged commit 02b5e27f89 into main 2023-11-20 11:12:51 +01:00
Stefan Werner deleted branch oidn2 2023-11-20 11:12:53 +01:00