I'm aware of these GPUs, and I tried to look into this a while ago but I couldn't find any mention of checking for tensor core support in the CUDA documentation. I also couldn't find any mention…
I managed to implement a minimal CUDART on top of the driver API, and so far it works great. We could replace the proprietary CUDART with this shim starting with the next OIDN release. So this…
A minimal open source CUDART shim seems like the most promising alternative to me. There are only a few functions that need to be implemented for OIDN, and this doesn't seem too complicated. I…
What's the deadline for enabling HIP, CUDA (if possible), and Metal support (will ship soon in OIDN 2.2) in Blender 4.1? Do these need to be added in Bcon1, or is Bcon2 fine too?
Sure! I agree that such a comment should be added.
A possibly more precise term would be "implicit linking", but there seems to be little consensus on this. dlopen() is platform-specific.
The bottom line is that it seems we could…
So this means that HIP support should be fine since it doesn't require shipping any runtime/non-free dependency?
What about the HIP runtime? That is shipped with the AMD drivers. Is it OK to link against that DLL/SO statically?
Does the fact that OIDN loads its CUDA and HIP backends with dlopen() make any…
I agree that ideally we should use the driver API but it's not really up to us. CUTLASS uses the runtime, so we would have to fork it and modify it somehow, which could be a lot of extra work.
I don't think so. "Host memory" in a GPU compute context often means host pinned memory, which is always supported. That's not what this flag indicates. CUDA calls it "system allocated memory",…