I'm confused now. This crash with the integrated AMD GPUs is specific to OIDN, right? So because of this you don't even want to load the HIP runtime at all for older driver versions, even though…
@brecht Why not check the driver or runtime version with `hipDriverGetVersion` or `hipRuntimeGetVersion`? These are available on both Windows and Linux.
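A minimal sketch of such a version check, written in Python with `ctypes` purely for illustration. The library file names, the helper names `query_hip_versions` and `hip_is_new_enough`, and the threshold passed to the latter are assumptions, not anything taken from OIDN or Blender. Only `hipDriverGetVersion` and `hipRuntimeGetVersion` themselves come from the HIP API, and both take a pointer to an `int`.

```python
import ctypes
import sys

# Assumed library names; the actual file name varies by platform and HIP release.
HIP_LIBRARY_CANDIDATES = (
    ["amdhip64.dll"] if sys.platform == "win32" else ["libamdhip64.so"]
)


def query_hip_versions(candidates=HIP_LIBRARY_CANDIDATES):
    """Return (driver_version, runtime_version), or None if HIP is unavailable."""
    for name in candidates:
        try:
            hip = ctypes.CDLL(name)
        except OSError:
            continue  # library not found, try the next candidate

        versions = []
        for func_name in ("hipDriverGetVersion", "hipRuntimeGetVersion"):
            try:
                func = getattr(hip, func_name)
            except AttributeError:
                return None  # symbol missing in this library
            func.argtypes = [ctypes.POINTER(ctypes.c_int)]
            func.restype = ctypes.c_int
            value = ctypes.c_int(0)
            if func(ctypes.byref(value)) != 0:  # 0 is hipSuccess
                return None
            versions.append(value.value)
        return tuple(versions)
    return None


def hip_is_new_enough(versions, min_driver):
    """Hypothetical gate: enable HIP only if the reported driver version is recent enough."""
    return versions is not None and versions[0] >= min_driver
```

The sketch still has to load the HIP library in order to ask for its version, so it only helps if loading alone is safe on the affected drivers. The meaning of the returned integers is left to the caller, since the encoding is not discussed here.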
@brecht The main drawback of your solution is that if there is an AMD integrated GPU in the system, OIDN HIP support will be disabled for discrete AMD GPUs too, even if the driver version is up to…
@brecht We cannot miss any architecture because the list of possible HIP targets is well known, but I get your point. Still, this is the best I can do on the OIDN side at the moment, especially at such…
@brecht Well, I could send you a patch which would partially upgrade 2.2.1 to 2.2.2, but I'm not sure whether it would really be safer than just upgrading to 2.2.2 directly. I could…
Using an internal OIDN 2.2.2 build and the latest Blender 4.1 build I don't see any leaks with CUDA anymore. The GPU memory usage doesn't increase at all even after more than 2000 frames.
@brech…
Thanks! So it seems that the new driver fixed the issue, and the workaround in Blender is needed only for older drivers.
Thanks. Could you please try the previous Blender binaries which crashed with this new driver? It would be useful to know whether the driver fixed the issue, which Blender needs to work around in…
@MarcFreeDev Did you use the same version of Adrenalin when Blender crashed too?
@brecht I managed to fix the leak. I'm now preparing OIDN 2.2.2 with the fix, which should be out in one or two days. I'll double check with Blender as well to make sure that there are no…
I've identified a leak for CUDA, SYCL and HIP devices, but not for CPU and Metal. I'm working on a patch release to fix this in time for Blender 4.1.
But so far I have no idea what's happening…
@Alaska Did you see the Metal memory leak with the latest OIDN 2.2.1? The Xcode `leaks` tool doesn't detect any leaks, so I'm very surprised about this.
I'm still investigating the source of the…
@MarcFreeDev Did you try to update the AMD iGPU driver? It would be important to know whether this issue in HIP is still present.
It is helpful to some extent because without this it would crash on even more machines. I’ll try to find some other workaround but this really should be fixed in HIP instead.
@brecht I'm aware of this issue, and I hoped that this got fixed (I also let Brian know about this a while ago) but unfortunately it seems it didn't. There's not much we can do about this on the…
@brecht OIDN is holding the context because it implements the CUDA runtime. So the context will be destroyed only when the application exits.
I could reproduce this second memory leak with OIDN…
@Alaska With the module leak fix, do you actually run out of memory at some point? So far I couldn't reproduce this because the memory usage always 'resets' after a while, so there's always plenty…
I also noticed the behavior you described but I'm not entirely sure that it's indeed a real leak. In my testing I'm also seeing a small progressive increase in memory usage reported by nvidia-smi
…
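One way to tell a real leak apart from usage that grows and then 'resets' is to compare low-water marks over time instead of individual readings. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=memory.used --format=csv,noheader,nounits` output. The function names and the window-based heuristic are illustrative assumptions, not the method used in the testing described above.

```python
import subprocess


def parse_memory_used(output):
    """Parse nvidia-smi CSV output into a list of MiB values, one per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_memory_used():
    """Query current memory usage (MiB) for every GPU via nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)


def looks_like_leak(samples, window, tolerance_mib=0):
    """Heuristic: flag a leak only if the floor of memory usage has risen.

    Compares the minimum of the first `window` samples with the minimum of
    the last `window` samples, so usage that periodically drops back to its
    starting level is not reported.
    """
    if len(samples) < 2 * window:
        return False  # not enough data to compare two windows
    return min(samples[-window:]) > min(samples[:window]) + tolerance_mib
```

In use, one would call `read_memory_used()` once per rendered frame, append the value for the GPU of interest to a list, and run `looks_like_leak` on that list after a few thousand frames. The window should span at least one full 'reset' cycle for the comparison to be meaningful.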
Thanks, this is indeed important. So it seems there was not just one but two memory leaks with HIP: the module leak, which has now been fixed, and an unknown one which is triggered only when Open…