OpenImageDenoise GPU acceleration #115045
Reference: blender/blender#115045
Tasks for 4.1
- d16d2bbd3a
- fd8bb41224
- 31d55e87f9
- bc886857f3
- Use `cuDevicePrimaryCtxRetain` for improved CUDA and HIP performance

Future Tasks
- Denoising device selection in preferences (not needed, just using the same devices seems OK)
- Add option to force CPU denoising in preferences (not needed, already on the scene) (#117734)

When to use the GPU
We want to use the GPU as much as possible for performance, but there are two situations where we do not.

1. When using a desktop or render farm designed for CPU-only usage. This should be controlled by the preferences: if no GPU is selected for rendering, denoising simply would not use a GPU either. It's unclear whether it's worth adding an option to use the GPU for denoising only; keeping the preferences simple seems preferable.
2. When there is not sufficient GPU memory available, we want to use CPU denoising instead. This varies from scene to scene. Ideally we can catch OIDN out-of-memory errors and automatically fall back to CPU denoising. Still, we may also want an option to manually force it to use the CPU. This option could be placed in the "Performance > Memory" panel along with the tiling settings.
Before I start working on CUDA support, I wanted to check with regards to licensing. OIDN and one of its dependencies, cutlass (https://github.com/NVIDIA/cutlass), both are written against the CUDA runtime API, not the driver API (cuew). That requires linking against one of the binaries included in the CUDA SDK. Would the Blender Foundation consider the libraries in the CUDA SDK as incompatible with the GPL or are they viewed as "system libraries"?
I will check, it's not obvious to me. Ideally we could somehow use the driver API.
I agree that ideally we should use the driver API but it's not really up to us. CUTLASS uses the runtime, so we would have to fork it and modify it somehow, which could be a lot of extra work.
What about the HIP runtime? That is shipped with the AMD drivers. Is it OK to link against that DLL/SO statically?
Does the fact that OIDN loads its CUDA and HIP backends with dlopen() make any difference from a licensing perspective?
The potential issue is with distributing GPL and proprietary cudart binaries in a single package. Not so much static vs. dynamic library linking.
So this means that HIP support should be fine since it doesn't require shipping any runtime/non-free dependency?
If the required HIP runtime is shipped with the AMD drivers, it's fine. But to be clear, I would not call that linking "statically", but rather dynamic linking without a manual `dlopen()`.

A possibly more precise term would be "implicit linking", but there seems to be little consensus on this, and `dlopen()` is platform-specific. The bottom line is that it seems we could move forward with HIP support. Please let us know whether distributing CUDART would be acceptable.
What's the deadline for enabling HIP, CUDA (if possible) and Metal support (will ship soon in OIDN 2.2) in Blender 4.1? Do these need to be added in Bcon1 or Bcon2 is fine too?
Bcon1 is preferable, but Bcon2 is possible if needed.
Also, great to hear Metal is coming!
For the cudart license, it's not looking good. We most likely won't be able to distribute it.
What could the alternative be?
Not seeing any straightforward solutions so far. CC @pmoursnv.
A minimal open source CUDART shim seems like the most promising alternative to me. There are only a few functions that need to be implemented for OIDN, and this doesn't seem too complicated. I would much prefer this over modifying CUTLASS and maintaining a fork of it or switching to some less performant library. I'll look into this approach in more detail to see whether it's indeed a viable solution.
I managed to implement a minimal CUDART on top of the driver API, and so far it works great. We could replace the proprietary CUDART with this shim starting with the next OIDN release. So this means CUDA support could be also enabled in Blender 4.1, right?
There's a minor issue in Cycles though, regarding how a CUDA context is created:
fd3629b80a/intern/cycles/device/cuda/device_impl.cpp (L107)
It's not typically recommended to create custom CUDA contexts this way because other CUDA components like OIDN will use a separate context for the same device, causing performance issues. The CUDA documentation recommends to use the primary context instead, using cuDevicePrimaryCtxRetain. This way both Cycles and OIDN would use the primary context. Could you switch to this?
@aafra many thanks for solving CUDART problem.
I will look into using `cuDevicePrimaryCtxRetain`. There can be multiple render instances running at the same time, like a 3D viewport render and a small material preview render. So I will need to check what the effect on that is, and whether we can do everything safely in one primary context.

@aafra @Stefan_Werner it was pointed out that the GTX 1650 and 1660 cards have compute capability 7.5 but no tensor cores. As far as I can tell, OIDN only checks compute capability. So I'm wondering if that is correct, or if perhaps these GPUs should be blacklisted?
https://devtalk.blender.org/t/2024-01-23-render-cycles-meeting/33059/4
I'm aware of these GPUs, and I tried to look into this a while ago but I couldn't find any mention of checking for tensor core support in the CUDA documentation. I also couldn't find any mention in the docs that some CC 7.5 or later GPUs may not support tensor instructions. We're using tensor cores through CUTLASS, and CUTLASS reports that it's supported on Volta and later, so it's not really up to OIDN to properly handle these special cases. We claim support for whatever CUTLASS claims to support. I didn't find any special checks in CUTLASS either. So there seems to be some disconnect between marketing specs and CUDA docs. My best guess is that maybe these cards do support tensor instructions (i.e. do not crash, etc.) but are emulated. This seems to be confirmed by the following comment from an NVIDIA employee: https://stackoverflow.com/questions/66621753/how-am-i-able-to-run-tensor-core-instructions-without-actually-having-tensor-cor
I have a GTX 1660 to test on, I'll test it when we actually do the lib update.
Thanks both, sounds good.
If it fails, we can always blacklist it on the Cycles side too.
@LazyDodo There's an easy way to test whether OIDN works on the GTX 1660. Could you please download the latest OIDN binaries (https://github.com/OpenImageDenoise/oidn/releases/tag/v2.1.0) and run `oidnBenchmark -d cuda -v 1`?

Thanks! It looks good except for that one failed test, but that's an out-of-memory error (probably because the GPU has less memory than what we typically use for testing), so I think it's fine.
OIDN 2.2 release candidate is out:
9db3b38d50
This includes support for Meteor Lake, Metal, and ARM64 on Windows/Linux too, and it switched to the CUDA driver API. Please let me know if you encounter any issues.
These two tasks have been marked as completed. But I can't find commits for either of them. And looking at the code, the OptiX denoiser is still being used by default. I haven't checked on the automatic fallback. Am I missing something here?
Indeed, they should not have been marked resolved yet.
I will take care of the NVIDIA default denoiser.
Probably too late for automatic fallback to CPU, though at least the code is trying to reduce GPU memory usage when allocation fails.
@aafra hey, just wanted to let you know that I've tested OIDN (GPU) on my GTX 1660 (CUDA) and it works great. It's even a little faster than OptiX while providing much better quality (especially in scenes with a lot of emissive materials). Viewport performance is also great: the same as OptiX, or even better in some scenes.
PS. To clarify: In the viewport I set passes to 'Albedo and Normal', prefilter to 'Fast' or 'None' and start sample to 1.
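For reference, those viewport settings can also be applied via Blender's Python API. This is a settings fragment to run inside Blender, not standalone; the property names below are assumed from the Blender 4.x Python API and should be verified against your build.

```python
import bpy  # only available inside Blender

cycles = bpy.context.scene.cycles
cycles.use_preview_denoising = True
cycles.preview_denoiser = 'OPENIMAGEDENOISE'
cycles.preview_denoising_input_passes = 'RGB_ALBEDO_NORMAL'  # "Albedo and Normal"
cycles.preview_denoising_prefilter = 'FAST'  # or 'NONE'
cycles.preview_denoising_start_sample = 1
```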
I see no performance improvements in the viewport on a 1650, maybe even a slight regression, and #118020.
Docs say the 16XX series is supported, but maybe it's only the 166X models?
A GPU being supported doesn't necessarily mean that it would be faster than any CPU. In your case, the GPU doesn't have tensor cores, so it's emulated, which means that performance could be actually worse than for some CPUs.
GPU-accelerated OIDN has been disabled in the 4.1 release for AMD GPUs, despite it being available in the beta. Is it possible for the feature to be enabled via the Experimental section in Blender's settings?

Not in 4.1. The way OIDN currently works, making it available at all risks crashes even when not using it.
@brecht Couldn't you perhaps add an environment variable (or use OIDN's already existing `OIDN_DEVICE_HIP`) which could enable OIDN on HIP? In most cases there shouldn't be a crash (it has been fixed for all known cases so far), and this way interested users could give it a try.

Has all code bringing support for GPU OIDN on AMD GPUs been removed from Blender until 4.2? If someone wishes to use that feature, should they use the latest 4.1 beta instead?
I've made it possible now to set the `OIDN_DEVICE_HIP` environment variable for 4.1. To use this, you either have to use 4.1 with this environment variable set, or 4.2 alpha without it.
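A minimal usage sketch, assuming the variable name from this thread and a Unix-like shell (on Windows, set the variable in the environment before launching Blender):

```shell
# Opt in to HIP-accelerated OIDN denoising in Blender 4.1.
export OIDN_DEVICE_HIP=1
blender
```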