"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!" with nVidia GPU and AMD iGPU #119444

Closed
opened 2024-03-13 21:34:52 +01:00 by Deniil Ekimov · 34 comments

System Information
Operating system: Windows 10 Pro 22H2 19045.4046
Graphics card: nVidia RTX 3060, AMD Radeon Graphics in Ryzen 7 5700G

Blender Version
Broken: 4.1 Release candidate 3e8ed795cb
Worked: 4.0.2

Short description of error

On a system with an nVidia GPU and an AMD iGPU, Blender crashes when you switch to Cycles, with the error
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
HIP is not even selected in Preferences -> System. Selecting CUDA, OptiX or None does not help.

Exact steps for others to reproduce the error
Based on the default startup file

  1. Select Cycles renderer, you can leave Render Device set as CPU
  2. Switch to Rendered preview
  3. Blender crashes with error "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

I wanted to attach the crash log, but AppData/Local/Temp is empty. I tried emptying Temp and crashing again, and there are still no files. It seems Blender is not creating the crash log.

Deniil Ekimov added the labels Status: Needs Triage, Type: Report, Priority: Normal on 2024-03-13 21:34:52 +01:00
Member

@brecht committed a fix for an issue like this a few days ago (c388ed1e53), and this fix is in the version of Blender you're using. So it's odd that you're experiencing issues.

I will note that the previous user who experienced this issue was able to resolve it by updating their AMD iGPU drivers. This may help you too. Can you give it a try?

Alaska added the label Status: Needs Information from User and removed Status: Needs Triage on 2024-03-13 23:12:13 +01:00

@aafra, @salipour, I'll check if I can find some way to fix this tomorrow. I guess OIDN is initializing HIP when we query info about NVIDIA devices.

If we can't fix it this week, I will probably disable HIP support for OIDN in 4.1. It's quite bad to make Cycles GPU rendering crash entirely because of an old AMD driver for the integrated GPU, when we don't even want to use that but an NVIDIA card instead.

Author

> @brecht committed a fix for an issue like this a few days ago (c388ed1e53), and this fix is in the version of Blender you're using. So it's odd that you're experiencing issues.
>
> I will note that the previous user who experienced this issue was able to resolve it by updating their AMD iGPU drivers. This may help you too. Can you give it a try?

Thank you! Updating the driver fixed everything.

Blender Bot added the label Status: Archived and removed Status: Needs Information from User on 2024-03-14 09:51:37 +01:00
Contributor

> @aafra, @salipour, I'll check if I can find some way to fix this tomorrow. I guess OIDN is initializing HIP when we query info about NVIDIA devices.
>
> If we can't fix it this week, I will probably disable HIP support for OIDN in 4.1. It's quite bad to make Cycles GPU rendering crash entirely because of an old AMD driver for the integrated GPU, when we don't even want to use that but an NVIDIA card instead.

@brecht I will add these old APU targets to the upcoming OIDN patch release coming out today. It’s an ugly solution but I think it’s still good enough for now. We now have confirmation that the issue happens only with older drivers. If I add all these old targets, OIDN shouldn’t crash anymore (we fixed such a crash for newer APUs before this way). We don’t need to worry about any newer targets which don’t exist in HIP yet because those GPUs would require a new driver to work anyway. So this should be a definitive fix. Would this be good enough for Blender?

I could look into a more elegant solution as well but the problem is that AFAIK this issue was reproduced only with different kinds of APUs, and I don’t have access to such a machine. I cannot reproduce this issue with dGPU (by removing the targets for it). But I think adding the APU targets should also work reliably.

Member

@aafra and @brecht I have access to a Ryzen 5 5600G, along with AMD, Intel, and NVIDIA dGPUs.

If you need any testing done, I can do it on that computer. However, that computer is not properly set up for testing at the moment, and I'd need to do some deconstructing and rebuilding of computers to get it into a testable state. I estimate it would take me about an hour or more to set it up.

@lichtwerk has a laptop equipped with a Ryzen 9 5900HX and an NVIDIA RTX 3080. They may also be able to help out in testing here.


> @brecht I will add these old APU targets to the upcoming OIDN patch release coming out today. It’s an ugly solution but I think it’s still good enough for now. We now have confirmation that the issue happens only with older drivers. If I add all these old targets, OIDN shouldn’t crash anymore (we fixed such a crash for newer APUs before this way). We don’t need to worry about any newer targets which don’t exist in HIP yet because those GPUs would require a new driver to work anyway. So this should be a definitive fix. Would this be good enough for Blender?

I don't trust this enough, especially not a few days (or even weeks) before the release. If we miss some architecture or other factor, it means using Cycles in Blender 4.1 could crash for a large number of users. I don't want to take that risk.

> I could look into a more elegant solution as well but the problem is that AFAIK this issue was reproduced only with different kinds of APUs, and I don’t have access to such a machine. I cannot reproduce this issue with dGPU (by removing the targets for it). But I think adding the APU targets should also work reliably.

In #119448 I'm trying to patch OIDN to not load the HIP module until we explicitly ask it to. I think that's a more reliable solution, but I haven't tested it yet.

Contributor

@brecht We cannot miss any architecture because the list of possible HIP targets is well known, but I get your point. This is the best I can do on the OIDN side at the moment, especially at such extremely short notice.

Contributor

@brecht The main drawback of your solution is that if there is an AMD integrated GPU in the system, OIDN HIP support will be disabled for discrete AMD GPUs too, even if the driver version is up to date. I think this is too limiting.

A potentially better and safer solution would be to simply check the HIP driver/runtime version and load the OIDN HIP module only if it's recent enough to include the fix. This way, we wouldn't need to care about unsupported HIP devices at all. With recent drivers, OIDN would work with discrete AMD GPUs too, even if there is an unsupported iGPU. I think this should be very easy to add to Cycles, but I'm also trying to add this directly to OIDN. It's a lot trickier in OIDN because I don't know when exactly the crash happens. I just managed to get indirect access to a machine, so hopefully I can figure this out today.


After discussing with Sergey here, I think we should do the following:

  • Disable OIDN for HIP in 4.1.0, to avoid taking too much risk.
  • Enable OIDN for HIP in 4.2.0, and wait to see if issues are reported. This would be with the OIDN side fixes to add architectures, and potentially checking the driver version in Cycles or OIDN.
  • If it's stable, we can include it in 4.2.0 and potentially a 4.1.1 release (if there is one it's typically 3-4 weeks after 4.1.0).

To check the driver version on the Cycles side, we have `hipewHasOldDriver`, though this code is for Windows only. It checks the version of the DLL file. This should be quite safe compared to actually initializing HIP, which we want to avoid doing unless a user actually enabled it in the preferences.

This bug also existed in Linux drivers, and I'm not sure if there is an equivalent safe way of checking the version there, or if it's likely for a user to have an older driver like this.

Either way, I still want to get more testing for such a fix and not include it in 4.1.0 immediately.
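
For readers following along, the DLL version check described above boils down to comparing a four-part file version against a minimum, without ever loading the HIP runtime. The sketch below shows only that comparison step. It is not the actual `hipewHasOldDriver` implementation: the function names are made up for illustration, reading the version out of the DLL is left out, and the thread does not state the real minimum version, so it is passed in as a parameter.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Four-part file version as found in a Windows DLL's version resource,
// for example "31.0.12027.9001" -> {31, 0, 12027, 9001}.
using FileVersion = std::array<unsigned, 4>;

// Hypothetical helper: parse "a.b.c.d" into its numeric components.
// Parsing stops at the first character that is neither a digit nor a dot;
// components that were not reached stay 0.
FileVersion parse_file_version(const std::string &text)
{
  FileVersion version{0, 0, 0, 0};
  std::size_t index = 0;
  for (char c : text) {
    if (c == '.') {
      if (++index == version.size()) {
        break;  // ignore anything after the fourth component
      }
    }
    else if (c >= '0' && c <= '9') {
      version[index] = version[index] * 10 + static_cast<unsigned>(c - '0');
    }
    else {
      break;
    }
  }
  return version;
}

// Hypothetical helper: true if the installed driver is older than the minimum.
// std::array compares lexicographically, which matches version ordering.
bool driver_is_too_old(const FileVersion &installed, const FileVersion &minimum)
{
  return installed < minimum;
}
```

A caller would skip loading the OIDN HIP module whenever `driver_is_too_old` returns true. Because only a file's metadata is inspected, this keeps the property discussed above: no HIP code runs until the user asks for it.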

Contributor

@brecht Why not check the driver or runtime version with `hipDriverGetVersion` or `hipRuntimeGetVersion`? These are available on both Windows and Linux.


Because we then have to load the HIP shared library and call its functions, which we don't want to do until a user has explicitly enabled HIP in the preferences. We've had crashes doing that for CUDA, OpenCL and HIP in the past, and at least on Linux + HIP I know it's still possible for this to happen now.

Contributor

I'm confused now. This crash with the integrated AMD GPUs is specific to OIDN, right? So because of this you don't even want to load the HIP runtime at all for older driver versions, even though you did before without OIDN?


Without OIDN, Cycles does not load the HIP runtime until a user explicitly enables HIP in the preferences.

OIDN always initializes all the device types. Ideally Cycles could tell it to load just the ones that we want. We can do that with the environment variables, but this OIDN initialization only happens once, so it can't be updated when a user is editing the preferences.

So really adding OIDN is also adding some risk for CUDA too, since ideally we also do not initialize that until a user asks for it. But it's been a long time since I saw issues with that, so it's probably ok. And there isn't really an equivalent situation with an integrated NVIDIA GPU.

Contributor

@Alaska I'd like to ask for your help with debugging this on the machine you have. Could you please do the following?

  • Install an AMD driver which reproduces the crash in Cycles. Please confirm that the crash happens in Blender.
  • Download latest OIDN binaries from OIDN website, extract it anywhere you like
  • Run from `bin` of the OIDN package: `oidnBenchmark -ld`

What's the output of this command or does any error/crash happen?

Thanks!

Contributor

@brecht What you describe seems to be a somewhat different matter: a change in behavior in OIDN. You can disable loading some device modules with environment variables, but once OIDN gets initialized, you can't load any more modules. In Blender the user can change the device at runtime, so this doesn't seem like a feasible solution. So what exactly do you need from OIDN? To lazily load device modules? That is possible only if Cycles never queries the available devices in OIDN and just directly creates the kind of device it needs (e.g. CUDA). Also, it would not be possible to unload any already loaded device modules. In any case, such a feature cannot be added to OIDN before the next major release.

Edit: Blender currently does query all available OIDN devices to find a device by PCI ID. While this logic is used in Blender, lazily loading select devices isn't possible, unless OIDN would introduce some new API functions. In any case, this would be a major change in OIDN.

Member

@aafra I'll test this tomorrow unless someone else tests it first.


> So what exactly do you need from OIDN? To lazily load device modules? That is possible only if Cycles never queries the available devices in OIDN and just directly creates the kind of device it needs (e.g. CUDA). Also, it would not be possible to unload any already loaded device modules. In any case, such a feature cannot be added to OIDN before the next major release.

Indeed we'd want to lazily load device modules. It's not obvious what the right API for that would be. For Cycles something simple like this would work, if it prevents other functions from automatically initializing all device types:

void oidnInitSingleDeviceType(OIDNDeviceType type);

In general you may need a bigger change to avoid race conditions. Like passing an optional OIDNDeviceType to all the functions that query device information, to limit them to a single device type and only initialize that one on demand.

Of course something more sneaky with environment variables is possible too, as I was doing in #119448.

Unloading devices is not important, we don't do that in Cycles either. It's just to avoid crashing.
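
To make the lazy-loading idea in this comment concrete, here is a small self-contained model of a registry that initializes a device module only when that device type is first requested. Everything in it is hypothetical: `DeviceType` and `ModuleRegistry` are illustrative stand-ins rather than OIDN types, and "loading" is reduced to recording the type, where a real implementation would load the shared library for that module. A mutex guards the state because of the race-condition concern raised above.

```cpp
#include <mutex>
#include <set>

// Illustrative stand-in for the device types discussed in the thread.
enum class DeviceType { CPU, SYCL, CUDA, HIP };

// Hypothetical registry: a module is loaded only when its type is requested.
class ModuleRegistry {
 public:
  // Load the module for one device type on first request.
  // Returns true if this call performed the load, false if it was already loaded.
  bool ensure_loaded(DeviceType type)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    // A real implementation would load the module's shared library here.
    return loaded_.insert(type).second;
  }

  // Query without side effects: never triggers a load.
  bool is_loaded(DeviceType type) const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return loaded_.count(type) != 0;
  }

 private:
  mutable std::mutex mutex_;
  std::set<DeviceType> loaded_;
};
```

In this model, requesting CUDA never touches HIP, which is the behavior wanted for the NVIDIA plus AMD iGPU setup from the report. There is deliberately no unload operation, matching the note that unloading is not needed.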


The PR to disable OIDN HIP is #119476.

@Alaska it would be convenient if you could test that this does indeed disable HIP support for OIDN. But I will check it here too.

Contributor

@brecht I will consider lazy device module loading but I really don’t want to make major API changes because of this, especially not adding a device type parameter to all device query functions. The only reason why this would be needed is because Cycles iterates over all OIDN devices. Why is this necessary? Cycles itself decides what devices to use, so it should be able to create an OIDN device on the specific device it wants. If that device isn’t supported by OIDN, device creation fails. It’s unclear to me why matching by PCI address is needed for this.


This was added by @Stefan_Werner in #115854. I think it was done because we want a way to check if a given device is supported by OIDN, without actually creating it.

We want to communicate in the user interface if OIDN is supported on the device, but without the risk and overhead of actually creating an OIDN device on Blender startup. Maybe an API function to check that could be added? Or is it already possible?

I guess if that existed, it could do lazy module loading behind the scenes.

Contributor

The overhead of just checking whether a device is supported wouldn’t be much lower than actually trying to create the device, and it wouldn’t be any less risky. You’re already taking a bigger risk because when you iterate over the devices, the same checks are done by OIDN for all of them, not just the ones that are selected in Blender. There’s no other way to get the list that Cycles iterates over.

Device creation doesn’t do much more than checking for support. So unless there is solid proof that trying to create a device is too costly, I don’t think adding new API functions would be justified.


Ok, looking at the implementation it does look cheap enough to run on startup. I can make the change for Blender 4.2.

Contributor

Great! For the next OIDN version I'll try to switch to lazy device module loading, when using only API functions specific to a particular device type.

Perhaps I could still add some kind of device support check to OIDN, but in an easier way than I initially thought. Instead of adding several new API functions for each device type, I could add just one: `oidnIsDeviceSupported(OIDNDevice)`. You would still need to create the device object, but that doesn't really do anything yet. The actual work, including checking for support, happens only in `oidnCommitDevice`. If you don't want to have an initialized device, you could call `oidnIsDeviceSupported` instead of `oidnCommitDevice`, and then just release this uninitialized device object. This would strictly do only the necessary checks, minimizing the risk as much as possible. This would be easy enough to add to be worthwhile.

Member

> The PR to disable OIDN HIP is #119476.
>
> @Alaska it would be convenient if you could test that this does indeed disable HIP support for OIDN. But I will check it here too.

@brecht with Blender 4.1 and an RX 7800XT, I can confirm that the GPU is not being used for denoising, and the `Use GPU` denoising button is greyed out.


> @Alaska I'd like to ask for your help with debugging this on the machine you have. Could you please do the following...

@aafra should I still do these tests?

Contributor

@Alaska Yes, please do the tests.

Member

Using an AMD Ryzen 5 5600G (early 2023 GPU drivers) with an Intel Arc A750, I can reproduce the crash, with the same error message in the logs, in Blender 4.1 `3e8ed795cb` (what the original reporter of this bug was using).

Using Blender 4.1 `1640121a6313` (includes Brecht's initial attempt at fixing this issue), there is still crashing (currently expected).

Using 4.1 `335ff6efab67` (Brecht disabled HIP OIDN), there are no crashes. And when I select my Intel GPU, it can and will be used for denoising when enabled. This was just to reconfirm everything was working with one of the broken setups.


@aafra running `oidnBenchmark -ld` with the v2.2.1 binaries from https://github.com/OpenImageDenoise/oidn/releases/tag/v2.2.1 just gives the error "hipErrorNoBinaryForGPU: Unable to find code object for all current devices!".

If I run the benchmark on my Intel GPU or my CPU (`oidnBenchmark -d sycl` or `oidnBenchmark -d cpu`), I get the same error and OIDN crashes (is this expected based on the current code?).

If there are any more tests you'd like me to run in the next few days, please let me know.

Contributor

@Alaska Thanks a lot! Could you perhaps try to find out where exactly OIDN crashes, specifically in which HIP API call?

Member

@aafra I assume this just means compiling a debug build of OIDN and stepping through the code to find the issue. If not, could you help guide me through the process of doing this? If it's easier, we can talk on Blender chat or devtalk.

Blender chat: https://blender.chat/channel/render-cycles-module/members-list/Alaska
Devtalk: https://devtalk.blender.org/u/alaska/summary

Member

@aafra Throughout the day I've been trying to compile OIDN, and I haven't been able to produce a build with GPU support, or even one that works properly on the CPU. I'd need some assistance with this if you want me to test this for you.

I have asked for help on Blender-chat in the meantime just in case someone can offer some quick help. https://blender.chat/channel/blender-builds?msg=dSLy4yceTSgRKfjMh

Contributor

Thanks @Alaska ! I'm currently finishing up the new OIDN release. I could help with the compilation after the release. But before that let's try the new OIDN 2.2.2 binaries on this machine. Hopefully, the bug will be fixed.

Contributor

@Alaska Could you please try to run oidnBenchmark -ld using the fresh OIDN 2.2.2 binaries? https://github.com/OpenImageDenoise/oidn/releases/tag/v2.2.2

Member

> @Alaska Could you please try to run oidnBenchmark -ld using the fresh OIDN 2.2.2 binaries? https://github.com/OpenImageDenoise/oidn/releases/tag/v2.2.2

With the new 2.2.2 binaries, the crash doesn't occur.

With help from Attila Áfra in Blender chat, we managed to figure out which HIP calls were causing issues. In OIDN it was hipGetDeviceCount. Attila wanted to find out whether we could detect the broken drivers with hipRuntimeGetVersion, but that call also crashed.

Contributor

@brecht With the help of @Alaska and @MarkFreeDev, we can conclude that the HIP error/crash doesn't happen with the following minimum driver versions (older versions may also work, but these are the ones tested so far):

Windows: Adrenalin Edition 24.1.1
Linux: ROCm 5.7.0

I also confirmed this Windows driver version on a different machine with a newer AMD APU.

So I think checking the driver version would be a robust solution. I'll look into how it would be best to implement this in OIDN. Meanwhile, the workaround in OIDN 2.2.2 also seems to work but it may be more risky.
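To make the idea concrete, here is a rough sketch of such a version gate. It is only an illustration, not how OIDN implements or will implement the check. The names `parse_version`, `MIN_TESTED` and `is_known_good` are made up, and obtaining the installed driver version is left out entirely, since that part has not been worked out here. Only the two minimum versions come from the list above.

```python
def parse_version(text, width=3):
    """Turn a dotted version such as '24.1.1' into a comparable tuple.

    Short versions are padded with zeros, so '5.7' compares equal to '5.7.0'.
    """
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


# Minimum versions tested so far in this thread. Adrenalin and ROCm use
# unrelated numbering schemes, so each platform has its own entry.
MIN_TESTED = {
    "windows": parse_version("24.1.1"),  # Adrenalin Edition
    "linux": parse_version("5.7.0"),     # ROCm
}


def is_known_good(platform, installed):
    """Return True if the installed version is at least the tested minimum.

    False means 'not tested', not 'known broken': older versions may also work.
    """
    return parse_version(installed) >= MIN_TESTED[platform]
```

For example, `is_known_good("windows", "23.12.1")` returns False, and HIP would then be skipped before any HIP call is made. That ordering matters because both hipGetDeviceCount and hipRuntimeGetVersion crashed on the affected drivers.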

Reference: blender/blender#119444