"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!" with nVidia GPU and AMD iGPU #119444

Deniil Ekimov · 2024-03-13T21:34:52+01:00

Deniil Ekimov commented

2024-03-13 21:34:52 +01:00

System Information
Operating system: Windows 10 Pro 22H2 19045.4046
Graphics card: nVidia RTX 3060, AMD Radeon Graphics in Ryzen 7 5700G

Blender Version
Broken: 4.1 Release candidate 3e8ed795cb
Worked: (newest version of Blender that worked as expected) 4.0.2

Short description of error

On a system with an nVidia GPU and an AMD iGPU, Blender crashes when you switch to Cycles, with the error:
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
HIP is not even selected in Preferences -> System. Selecting CUDA, OptiX or None does not help.

Exact steps for others to reproduce the error
Based on the default startup file

1. Select Cycles renderer, you can leave Render Device set as CPU
2. Switch to Rendered preview
3. Blender crashes with error "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

I wanted to attach the crash log, but AppData/Local/Temp is empty. I tried emptying Temp and crashing again, and there are still no files. It seems Blender is not creating the crash log.

system-info.txt

24 KiB

Alaska commented

2024-03-13 23:12:03 +01:00

@brecht committed a fix for an issue like this a few days ago (c388ed1e53), and this fix is in the version of Blender you're using. So it's odd that you're experiencing issues.

I will note that the previous user that experienced this issue was able to resolve the issue by updating their AMD iGPU drivers. This may help you too. Can you give it a try?

Alaska added the Status: Needs Information from User label and removed the Status: Needs Triage label 2024-03-13 23:12:13 +01:00

Brecht Van Lommel commented

2024-03-14 04:15:30 +01:00

@aafra, @salipour, I'll check if I can find some way to fix this tomorrow. I guess OIDN is initializing HIP when we query info about NVIDIA devices.

If we can't fix it this week, I will probably disable HIP support for OIDN in 4.1. It's quite bad to make Cycles GPU rendering crash entirely because of an old AMD driver for the integrated GPU, when we don't even want to use that but an NVIDIA card instead.

Brecht Van Lommel referenced a pull request that will close this issue

2024-03-14 05:03:20 +01:00

Fix #119444, #118709: Crash in OIDN GPU detection for unsupported HIP device #119448

Deniil Ekimov commented

2024-03-14 09:50:17 +01:00

> @brecht committed a fix for an issue like this a few days ago (c388ed1e53), and this fix is in the version of Blender you're using. So it's odd that you're experiencing issues.
>
> I will note that the previous user that experienced this issue was able to resolve the issue by updating their AMD iGPU drivers. This may help you too. Can you give it a try?

Thank you! Updating the driver fixed everything.

Deniil Ekimov closed this issue

2024-03-14 09:50:20 +01:00

Blender Bot added the Status: Archived label and removed the Status: Needs Information from User label 2024-03-14 09:51:37 +01:00

Attila Áfra commented

2024-03-14 12:51:02 +01:00

> @aafra, @salipour, I'll check if I can find some way to fix this tomorrow. I guess OIDN is initializing HIP when we query info about NVIDIA devices.
>
> If we can't fix it this week, I will probably disable HIP support for OIDN in 4.1. It's quite bad to make Cycles GPU rendering crash entirely because of an old AMD driver for the integrated GPU, when we don't even want to use that but an NVIDIA card instead.

@brecht I will add these old APU targets to the upcoming OIDN patch release coming out today. It’s an ugly solution but I think it’s still good enough for now. We now have confirmation that the issue happens only with older drivers. If I add all these old targets, OIDN shouldn’t crash anymore (we fixed such a crash for newer APUs before this way). We don’t need to worry about any newer targets which don’t exist in HIP yet because those GPUs would require a new driver to work anyway. So this should be a definitive fix. Would this be good enough for Blender?

I could look into a more elegant solution as well but the problem is that AFAIK this issue was reproduced only with different kinds of APUs, and I don’t have access to such a machine. I cannot reproduce this issue with dGPU (by removing the targets for it). But I think adding the APU targets should also work reliably.

Alaska commented

2024-03-14 13:06:02 +01:00

@aafra and @brecht I have access to a Ryzen 5 5600G, along with an AMD, an Intel, and an NVIDIA dGPU.

If you need any testing done, I can do it on that computer. However, that computer is not properly set up for testing at the moment, and I'd need to do some deconstructing and rebuilding of computers to get it into a testable state. I estimate it would take me about an hour or more to set it up.

@lichtwerk has a laptop equipped with a Ryzen 9 5900HX and an NVIDIA RTX 3080. They may also be able to help out in testing here.

Brecht Van Lommel commented

2024-03-14 13:24:30 +01:00

> @brecht I will add these old APU targets to the upcoming OIDN patch release coming out today. It’s an ugly solution but I think it’s still good enough for now. We now have confirmation that the issue happens only with older drivers. If I add all these old targets, OIDN shouldn’t crash anymore (we fixed such a crash for newer APUs before this way). We don’t need to worry about any newer targets which don’t exist in HIP yet because those GPUs would require a new driver to work anyway. So this should be a definitive fix. Would this be good enough for Blender?

I don't trust this enough, especially not a few days (or even weeks) before the release. If we miss some architecture or other factor, it means using Cycles in Blender 4.1 could crash for a large number of users. I don't want to take that risk.

> I could look into a more elegant solution as well but the problem is that AFAIK this issue was reproduced only with different kinds of APUs, and I don’t have access to such a machine. I cannot reproduce this issue with dGPU (by removing the targets for it). But I think adding the APU targets should also work reliably.

In #119448 I'm trying to patch OIDN to not load the HIP module until we explicitly ask it to. I think that's a more reliable solution, but I haven't tested it yet.

Attila Áfra commented

2024-03-14 13:29:10 +01:00

@brecht We cannot miss any architecture because the list of possible HIP targets is well-known but I get your point. But this is the best I can do on OIDN side at the moment, especially at such extremely short notice.

Attila Áfra commented

2024-03-14 14:29:57 +01:00

@brecht The main drawback of your solution is that if there is an AMD integrated GPU in the system, OIDN HIP support will be disabled for discrete AMD GPUs too, even if the driver version is up to date. I think this is too limiting.

A potentially better and safer solution would be to simply check the HIP driver/runtime version and load the OIDN HIP module only if it's recent enough to include the fix. This way, we wouldn't need to care about unsupported HIP devices at all. With recent drivers, OIDN would work with discrete AMD GPUs too, even if there is an unsupported iGPU. I think this should be very easy to add to Cycles but I'm also trying to add this directly to OIDN. It's a lot trickier in OIDN because I don't know when exactly the crash happens. I just managed to get indirect access to a machine, so hopefully I can figure this out today.
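The version gate described above could look roughly like the following C++ sketch. It is only an illustration: the `DriverVersion` type, the helper names, and especially the minimum version are hypothetical, since the thread does not state which driver release contains the fix. How the installed version is obtained is deliberately left out.

```cpp
#include <tuple>

// Hypothetical representation of an installed HIP driver/runtime version.
struct DriverVersion {
  int v_major;
  int v_minor;
  int v_patch;
};

// Lexicographic comparison: true if `have` is the same as or newer than `need`.
inline bool is_at_least(const DriverVersion &have, const DriverVersion &need)
{
  return std::tie(have.v_major, have.v_minor, have.v_patch) >=
         std::tie(need.v_major, need.v_minor, need.v_patch);
}

// Placeholder threshold, NOT taken from the thread: the real minimum
// version that contains the driver fix would have to be looked up.
constexpr DriverVersion kHypotheticalMinFixedVersion{5, 5, 0};

// Decide whether the HIP device module should be loaded at all.
inline bool should_load_hip_module(const DriverVersion &installed)
{
  return is_at_least(installed, kHypotheticalMinFixedVersion);
}
```

With this shape, an old driver simply leaves the HIP module unloaded, while a recent driver keeps denoising available on discrete AMD GPUs even when an unsupported iGPU is present.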

Brecht Van Lommel commented

2024-03-14 14:50:11 +01:00

After discussing with Sergey here, I think we should do the following:

* Disable OIDN for HIP in 4.1.0, to avoid taking too much risk.
* Enable OIDN for HIP in 4.2.0, and wait to see if issues are reported. This would be with the OIDN side fixes to add architectures, and potentially checking the driver version in Cycles or OIDN.
* If it's stable, we can include it in 4.2.0 and potentially a 4.1.1 release (if there is one it's typically 3-4 weeks after 4.1.0).

Brecht Van Lommel commented

2024-03-14 15:04:31 +01:00

To check the driver version on the Cycles side, we have `hipewHasOldDriver`, though this code is Windows-only. It checks the version of the DLL file, which should be quite safe compared to actually initializing HIP, something we want to avoid unless a user has actually enabled it in the preferences.

This bug also existed in Linux drivers, and I'm not sure if there is an equivalent safe way of checking the version there, or how likely it is for a user to have an older driver like this.

Either way, I still want to get more testing for such a fix and not include it in 4.1.0 immediately.

Attila Áfra commented

2024-03-14 15:11:02 +01:00

@brecht Why not check the driver or runtime version with `hipDriverGetVersion` or `hipRuntimeGetVersion`? These are available on both Windows and Linux.

Brecht Van Lommel commented

2024-03-14 15:23:02 +01:00

Because we then have to load the HIP shared library and call its functions, which we don't want to do until a user has explicitly enabled HIP in the preferences. We've had crashes doing that for CUDA, OpenCL and HIP in the past, and at least on Linux + HIP I know it's still possible for this to happen now.

Attila Áfra commented

2024-03-14 15:27:46 +01:00

I'm confused now. This crash with the integrated AMD GPUs is specific to OIDN, right? So because of this you don't even want to load the HIP runtime at all for older driver versions, even though you did before without OIDN?

Brecht Van Lommel commented

2024-03-14 15:35:23 +01:00

Without OIDN, Cycles does not load the HIP runtime until a user explicitly enables HIP in the preferences.

OIDN always initializes all the device types. Ideally Cycles could tell it to load just the ones that we want. We can do that with environment variables, but this OIDN initialization only happens once, so it can't be updated when a user is editing the preferences.

So really adding OIDN is also adding some risk for CUDA too, since ideally we also do not initialize that until a user asks for it. But it's been a long time since I saw issues with that, so it's probably ok. And there isn't really an equivalent situation with an integrated NVIDIA GPU.

Attila Áfra commented

2024-03-14 15:38:13 +01:00

@Alaska I'd like to ask for your help with debugging this on the machine you have. Could you please do the following?

- Install an AMD driver which reproduces the crash in Cycles. Please confirm that the crash happens in Blender.
- Download the latest OIDN binaries from the OIDN website and extract them anywhere you like.
- Run from the `bin` directory of the OIDN package: `oidnBenchmark -ld`

What is the output of this command, or does an error or crash happen?

Thanks!

Attila Áfra commented

2024-03-14 15:50:18 +01:00

@brecht What you describe seems to be a somewhat different matter: a change in behavior in OIDN. You can disable loading some device modules with environment variables, but once OIDN gets initialized, you can't load any more modules. In Blender the user can change the device at runtime, so this doesn't seem like a feasible solution. So what exactly do you need from OIDN? To lazily load device modules? That is possible only if Cycles never queries the available devices in OIDN, and would just directly create the kind of device it needs (e.g. CUDA). Also, it would not be possible to unload any already loaded device modules. In any case, such a feature cannot be added to OIDN before the next major release.

Edit: Blender currently does query all available OIDN devices to find a device by PCI ID. While this logic is used in Blender, lazily loading select devices isn't possible, unless OIDN would introduce some new API functions. In any case, this would be a major change in OIDN.

Alaska commented

2024-03-14 15:52:02 +01:00

@aafra I'll test this tomorrow unless someone else tests it first.

Brecht Van Lommel commented

2024-03-14 17:08:15 +01:00

> So what exactly do you need from OIDN? To lazily load device modules? That is possible only if Cycles never queries the available devices in OIDN, and would just directly create the kind of device it needs (e.g. CUDA). Also, it would not be possible to unload any already loaded device modules. In any case, such a feature cannot be added to OIDN before the next major release.

Indeed we'd want to lazily load device modules. It's not obvious what the right API for that would be. For Cycles something simple like this would work, if it prevents other functions from automatically initializing all device types:

void oidnInitSingleDeviceType(OIDNDeviceType type);

In general you may need a bigger change to avoid race conditions. Like passing an optional OIDNDeviceType to all the functions that query device information, to limit them to a single device type and only initialize that one on demand.

Of course something more sneaky with environment variables is possible too, as I was doing in #119448.

Unloading devices is not important, we don't do that in Cycles either. It's just to avoid crashing.
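The lazy loading idea discussed above can be sketched in isolation as follows. This is not OIDN code: `DeviceType`, `LazyModuleRegistry`, and the loader callbacks are all hypothetical stand-ins, and the sketch ignores the race conditions mentioned above (it is not thread-safe). It only shows the intended behaviour, where a module is loaded the first time its device type is requested and other modules are never touched.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for the device types a denoiser might support.
enum class DeviceType { CPU, SYCL, CUDA, HIP };

// Loads each device module on first request only, and remembers the result.
// Modules are never unloaded, matching the comment above.
class LazyModuleRegistry {
 public:
  // Register a callback that performs the actual (possibly risky) load.
  void register_loader(DeviceType type, std::function<bool()> loader)
  {
    loaders_[type] = std::move(loader);
  }

  // Load only the requested module, at most once; returns whether it is usable.
  bool ensure_loaded(DeviceType type)
  {
    auto cached = results_.find(type);
    if (cached != results_.end()) {
      return cached->second;
    }
    auto it = loaders_.find(type);
    const bool ok = (it != loaders_.end()) && it->second();
    results_[type] = ok;
    return ok;
  }

  // True if a load was ever attempted for this device type.
  bool was_attempted(DeviceType type) const
  {
    return results_.count(type) != 0;
  }

 private:
  std::map<DeviceType, std::function<bool()>> loaders_;
  std::map<DeviceType, bool> results_;
};
```

In this model, a user who only enables CUDA would trigger `ensure_loaded(DeviceType::CUDA)` and the HIP loader would never run, so a faulty HIP driver could not crash the process.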

Brecht Van Lommel commented

2024-03-14 17:09:24 +01:00

The PR to disable OIDN HIP is #119476.

@Alaska it would be convenient if you could test that this does indeed disable HIP support for OIDN. But I will check it here too.

Attila Áfra commented

2024-03-14 17:58:56 +01:00

@brecht I will consider lazy device module loading but I really don’t want to make major API changes because of this, especially not adding a device type parameter to all device query functions. The only reason why this would be needed is because Cycles iterates over all OIDN devices. Why is this necessary? Cycles itself decides what devices to use, so it should be able to create an OIDN device on the specific device it wants. If that device isn’t supported by OIDN, device creation fails. It’s unclear to me why matching by PCI address is needed for this.

Brecht Van Lommel commented

2024-03-14 18:16:35 +01:00

This was added by @Stefan_Werner in #115854. I think it was done because we want a way to check if a given device is supported by OIDN, without actually creating it.

We want to communicate in the user interface if OIDN is supported on the device, but without the risk and overhead of actually creating an OIDN device on Blender startup. Maybe an API function to check that could be added? Or is it already possible?

I guess if that existed, it could do lazy module loading behind the scenes.

Attila Áfra commented

2024-03-14 18:23:06 +01:00

The overhead of just checking whether a device is supported wouldn’t be much lower than actually trying to create the device, and it wouldn’t be any less risky. You’re already taking a bigger risk because when you iterate over the devices, the same checks are done by OIDN for all of them, not just the ones that are selected in Blender. There’s no other way to get that list which Cycles iterates over.

Device creation doesn’t do much more than checking for support. So unless there is solid proof that trying to create a device is too costly, I don’t think adding new API functions would be justified.

Brecht Van Lommel commented

2024-03-14 18:32:45 +01:00

Ok, looking at the implementation it does look cheap enough to run on startup. I can make the change for Blender 4.2.

Attila Áfra commented

2024-03-14 18:43:09 +01:00

Great! For the next OIDN version I'll try to switch to lazy device module loading, when using only API functions specific to a particular device type.

Perhaps I could still add some kind of device support check to OIDN but in an easier way than I initially thought. Instead of adding several new API functions for each device type, I could add just one: `oidnIsDeviceSupported(OIDNDevice)`. You would still need to create the device object but that doesn't really do anything yet. The actual work, including checking for support, happens only in `oidnCommitDevice`. If you don't want to have an initialized device, you could call `oidnIsDeviceSupported` instead of `oidnCommitDevice`, and then just release this uninitialized device object. This would strictly do only the necessary checks, minimizing the risk as much as possible. This would be easy enough to add to be worthwhile.

Alaska commented

2024-03-14 22:38:59 +01:00

> The PR to disable OIDN HIP is #119476.
>
> @Alaska it would be convenient if you could test that this does indeed disable HIP support for OIDN. But I will check it here too.

@brecht with Blender 4.1 and an RX 7800XT, I can confirm that the GPU is not being used for denoising, and the `Use GPU` denoising button is greyed out.

> @Alaska I'd like to ask for your help with debugging this on the machine you have. Could you please do the following...

@aafra should I still do these tests?

Attila Áfra commented

2024-03-14 22:39:47 +01:00

@Alaska Yes, please do the tests.

Alaska commented

2024-03-15 01:26:03 +01:00

Using an AMD Ryzen 5 5600G (early 2023 GPU drivers) with an Intel Arc A750, I can reproduce the crash, with the same error message in the logs, in Blender 4.1 `3e8ed795cb` (what the original reporter of this bug was using).

Using Blender 4.1 `1640121a6313` (includes Brecht's initial attempt at fixing this issue), there is still crashing (currently expected).

Using 4.1 `335ff6efab67` (Brecht disabled HIP OIDN) there are no crashes. And when I select my Intel GPU, it can and will be used for denoising when enabled. This was just to reconfirm everything was working with one of the broken setups.

@aafra running `oidnBenchmark -ld` with the v2.2.1 binaries from https://github.com/OpenImageDenoise/oidn/releases/tag/v2.2.1 just gives the error "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!".

If I run the benchmark on my Intel GPU or my CPU (`oidnBenchmark -d sycl` or `oidnBenchmark -d cpu`), I get the same error and OIDN crashes (this is expected based on the current code?).

If there are any more tests you'd like me to run in the next few days, please let me know.

Attila Áfra commented

2024-03-15 01:29:59 +01:00

@Alaska Thanks a lot! Could you perhaps try to find out where exactly OIDN crashes, specifically in which HIP API call?

Alaska commented

2024-03-15 01:40:04 +01:00

@aafra I assume this just means compiling a debug build of OIDN and stepping through the code to find the issue. If not, could you help guide me through the process? If it's easier, we can talk on Blender chat or devtalk.

Blender chat: https://blender.chat/channel/render-cycles-module/members-list/Alaska
Devtalk: https://devtalk.blender.org/u/alaska/summary

Alaska commented

2024-03-15 12:07:16 +01:00

@aafra Throughout the day I've been trying to compile OIDN, and I haven't managed to produce a build with GPU support, or even one that works properly with the CPU. I'd need some assistance with this if you want me to test this for you.

I have asked for help on Blender-chat in the meantime just in case someone can offer some quick help. https://blender.chat/channel/blender-builds?msg=dSLy4yceTSgRKfjMh

Attila Áfra commented

2024-03-15 14:42:28 +01:00

Thanks @Alaska ! I'm currently finishing up the new OIDN release. I could help with the compilation after the release. But before that let's try the new OIDN 2.2.2 binaries on this machine. Hopefully, the bug will be fixed.

Attila Áfra commented

2024-03-15 20:32:57 +01:00

@Alaska Could you please try to run oidnBenchmark -ld using the fresh OIDN 2.2.2 binaries? https://github.com/OpenImageDenoise/oidn/releases/tag/v2.2.2

Alaska commented

2024-03-15 22:08:16 +01:00

> @Alaska Could you please try to run `oidnBenchmark -ld` using the fresh OIDN 2.2.2 binaries? https://github.com/OpenImageDenoise/oidn/releases/tag/v2.2.2

With the new 2.2.2 binaries, the crash doesn't occur.

With help from Attila Áfra in Blender chat, we managed to figure out which HIP calls were causing issues. In OIDN it was `hipGetDeviceCount`. Attila wanted to find out whether we could detect the broken drivers with `hipRuntimeGetVersion`, but that call also crashed.
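
Since both calls crashed the process instead of returning an error code, an in-process check cannot catch the failure. As a rough illustration only (this is not the workaround used in OIDN 2.2.2, which isn't described here), the crashing command can be run in a child process so the caller survives. The helper name `probe_survives` and the 30-second timeout are hypothetical; the default command is the `oidnBenchmark -ld` invocation from this thread and is assumed to be on `PATH`.

```python
import subprocess
import sys


def probe_survives(argv, timeout=30.0):
    """Run argv in a child process and report whether it exited cleanly.

    Returns False if the child crashed (non-zero or signal exit code),
    hung past the timeout, or could not be started at all.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing executable; a timeout is treated as failure.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Assumption: oidnBenchmark is on PATH; pass another command to override.
    command = sys.argv[1:] or ["oidnBenchmark", "-ld"]
    print("probe ok" if probe_survives(command) else "probe failed")
```

Note that a `False` result also covers the case where the executable is missing, so it only says the probe did not complete cleanly, not why.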

Attila Áfra commented

2024-03-15 23:44:23 +01:00

@brecht With the help of @Alaska and @MarkFreeDev , we can conclude that the HIP error/crash doesn't happen using the following minimum driver versions (older versions may also work but this is what was tested so far):

Windows: Adrenalin Edition 24.1.1
Linux: ROCm 5.7.0

I also confirmed this Windows driver version on a different machine with a newer AMD APU.

So I think checking the driver version would be a robust solution. I'll look into how it would be best to implement this in OIDN. Meanwhile, the workaround in OIDN 2.2.2 also seems to work but it may be more risky.
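
To make the version-check idea concrete, here is a minimal sketch of comparing a driver version string against the minimum versions listed above. It is not OIDN's implementation: the names `MIN_TESTED`, `parse_version`, and `is_tested_version` are hypothetical, and how the version string is obtained from the driver is left to the caller.

```python
# Minimum versions reported as working in this thread.
MIN_TESTED = {
    "windows": (24, 1, 1),  # Adrenalin Edition 24.1.1
    "linux": (5, 7, 0),     # ROCm 5.7.0
}


def parse_version(text):
    """Turn a dotted version such as '24.1.1' into a 3-part integer tuple."""
    parts = [int(p) for p in text.strip().split(".")]
    # Pad short versions ('5.7' -> (5, 7, 0)) so tuple comparison is fair.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_tested_version(platform, version_text):
    """Return True if the version is at or above the tested minimum.

    False means "not known to work", not "known to be broken":
    older versions were not ruled out in this thread.
    """
    minimum = MIN_TESTED.get(platform)
    if minimum is None:
        return False
    return parse_version(version_text) >= minimum
```

For example, `is_tested_version("windows", "23.12.1")` returns `False`, while `is_tested_version("linux", "5.7")` returns `True`.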
