Blender freeze when opening system settings #116697

Open
opened 2024-01-01 23:18:43 +01:00 by MoreOwO · 24 comments

System Information
Operating system: Linux-6.6.8-arch1-1-x86_64-with-glibc2.38 64 Bits, X11 UI
Graphics card: AMD Radeon RX 6800 XT (radeonsi, navi21, LLVM 16.0.6, DRM 3.54, 6.6.8-arch1-1) AMD 4.6 (Core Profile) Mesa 23.3.2-arch1.2

Blender Version
Broken: version: 4.0.3 Release Candidate, branch: blender-v4.0-release, commit date: 2023-12-18 15:44, hash: 63e9cead5ff0
Worked: Unknown

Short description of error
When opening the System settings, Blender freezes. It might unfreeze itself at some point, but it doesn't seem like it.
Please find attached a log file from running Blender with ./blender-launcher --log "*" --log-level -1 --log-file latest.log. Note that I needed to kill Blender with SIGKILL in order to close it.

Exact steps for others to reproduce the error
Open Blender
Open Settings
Click on System

MoreOwO added the Status: Needs Triage, Priority: Normal, and Type: Report labels 2024-01-01 23:18:44 +01:00

@Svenstaro @BrianSavery Would it be possible to help investigate this issue on Arch? There are several other bugs [1] [2] [3], and perhaps additional ones, which seem to indicate a problem with Arch and AMD HIP. Symptoms typically include hangs or crashes. This is also mentioned in the Arch tracker [4].

[1] #116283 - Blender Freezes with rocm update
[2] #115670 - Selecting HIP causes a complete hanging of Blender 4.0
[3] #115315 - Arch Linux - Any attempt to select HIP GPU within Blender freezes PC
[4] https://gitlab.archlinux.org/archlinux/packaging/packages/blender/-/issues/2#note_141972

Jesse Yurkovich added the Status: Needs Info from Developers label and removed the Status: Needs Triage label 2024-01-02 01:40:18 +01:00

I passed this on to my colleague Torsten Keßler who maintains the HIP stuff as I don't have the hardware myself.

I can reproduce the behavior with Linux 6.6.9. However, Blender does not crash with Linux LTS 6.1.70. See also this bug report for amdgpu: https://gitlab.freedesktop.org/drm/amd/-/issues/2991
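For anyone comparing machines, here is a minimal Python sketch that checks whether a kernel release string falls in the range where the problem was reproduced. The 6.6 threshold is an assumption taken from the reports in this thread (reproduced on 6.6.x, not on 6.1.70), and the function names are hypothetical.

```python
import re

# Assumption: 6.6 is the first affected kernel series, based only on
# the reports in this thread (6.6.8 / 6.6.9 affected, 6.1.70 not).
AFFECTED_FROM = (6, 6)


def parse_kernel_release(release):
    """Extract (major, minor) from a string such as '6.6.8-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_affected_range(release):
    """True if the release is at or above the assumed threshold."""
    return parse_kernel_release(release) >= AFFECTED_FROM
```

On a Linux machine this could be called as `in_affected_range(platform.release())` after `import platform`.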

Hi. Which ROCm version is this? If it's ROCm 6 based, you need to try new builds after #116713 is merged.

Blender Bot added the Status: Archived label and removed the Status: Needs Info from Developers label 2024-01-03 18:31:54 +01:00
Author

After trying the January 4th daily build (0b0e0601a14d), Blender now crashes my whole system/X server. It appears my system has ROCm 5.7.1 installed.

Edit: I forgot to mention this in my initial report, but Blender also crashes "in the same way" when switching Cycles from CPU to GPU.

MoreOwO reopened this issue 2024-01-04 14:19:13 +01:00
Blender Bot added the Status: Needs Triage label and removed the Status: Archived label 2024-01-04 14:19:14 +01:00
Jesse Yurkovich added the Status: Needs Info from Developers label and removed the Status: Needs Triage label 2024-01-04 19:57:29 +01:00

I am having the same problem. Before the Jan. 4th updates for Manjaro, I had rocm 5.6.1-1 (AUR version), and Blender with Cycles HIP enabled for GPU compute was working totally fine. No crashes.

After the updates, Blender crashes even with rocm 5.6.1-1: I get a "Memory access fault by GPU node-1" error. Higher versions such as 5.7.1 and 6.0.0 both freeze Blender and can even freeze my entire PC, requiring a hard reset. In the --debug-all output, I can see that it stops at "found precompiled kernels" before Blender or my PC freezes.

I have the same GPU as the thread owner: AMD RX 6800XT.
Manjaro (Kernel 6.6.8-2)

I also tried older kernels (6.5 and 6.1); with those I get "Memory access fault by GPU node-1" instead, and then Blender crashes.

Hi guys. We need to change one variable at a time to track down the issue. Was it an OS change or a ROCm update on your side? Or a new Blender build?

Basically, when did it last work, and what changed after that? (It may be obvious to mention, but we don't test every Linux distro; hopefully we can still track this down.)

Author

Switching to hip-runtime-amd 6.0.0-1 (is this the same as ROCm?) didn't fix the issue. I'm still using the same January 4th daily build, 0b0e0601a14d.
This time, however, only Blender crashed, not my entire system, so I guess the full system crash must be linked to some particular cases.

Yeah, sorry, let me be clear. The latest Blender daily build should have the changes that are necessary IF you have the ROCm 6 runtime. If you don't have the ROCm 6 runtime, any build of Blender should be working.

If something caused older builds of Blender to stop working with the ROCm 5.7 runtime, I would like to know. Either way, it would help if you could run Blender from the command line with --debug-cycles, set the environment variable AMD_LOG_LEVEL=3, and copy the output here. Thanks!
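A minimal Python sketch of that request, for anyone who prefers a script over typing the command by hand. Only `--debug-cycles` and `AMD_LOG_LEVEL=3` come from the comment above; `blender_path`, `log_path`, and both function names are hypothetical placeholders.

```python
import os
import subprocess


def build_debug_run(blender_path, log_level=3, base_env=None):
    """Return (argv, env) for a Cycles debug run with HIP runtime logging."""
    env = dict(os.environ if base_env is None else base_env)
    env["AMD_LOG_LEVEL"] = str(log_level)  # variable named in the comment above
    argv = [blender_path, "--debug-cycles"]
    return argv, env


def run_and_capture(blender_path, log_path):
    """Run Blender and write stdout and stderr to log_path (hypothetical path)."""
    argv, env = build_debug_run(blender_path)
    with open(log_path, "wb") as log:
        result = subprocess.run(argv, env=env, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `run_and_capture("./blender", "blender-hip.log")` would leave a single file that can be attached to the report.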

Here's --debug-cycles with AMD_LOG_LEVEL=3 for me. It might be a combination of dependency updates, since Arch Linux and Manjaro are rolling releases and I ran an update yesterday that included Blender.

However, I didn't update hip-runtime-amd or any ROCm-related packages at all, nor did I update my Linux kernel. I used rocm 5.6.1-1 for the most part. It was working before I updated Blender and the other dependencies.

Running:
Manjaro 6.6.8.2
Blender 4.0.2-6
AMD RX 6800XT

EDIT: My bad. I didn't know you could attach files. Attached is the log.

Author

My debug output is quite small. I don't know if that's normal or not, but here it is:

I0106 11:33:38.743681  4361 device.cpp:39] HIPEW initialization succeeded
I0106 11:33:38.743716  4361 device.cpp:41] Found precompiled kernels
:3:rocdevice.cpp            :445 : 0205186203 us: [pid:4361  tid:0x7f3f8d978580] Initializing HSA stack.

This is what's printed when crashing, but then nothing more.
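When a log stops abruptly like this, the last recognised line is a useful hint about where the process stalled. Below is a small Python sketch that extracts it; the two regular expressions are assumptions fitted to the line shapes shown in this thread (Cycles glog-style lines and ROCm runtime lines), not an official format definition.

```python
import re

# Assumed shape of Cycles lines: "I0106 11:33:38.743681  4361 device.cpp:39] message"
GLOG = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ (?P<src>[^:\s]+):\d+\] (?P<msg>.*)$"
)
# Assumed shape of ROCm lines: ":3:rocdevice.cpp   :445 : 0205186203 us: [pid:...] message"
ROCM = re.compile(
    r"^:\d+:(?P<src>\S+)\s*:\d+\s*:\s*\d+ us: (?:\[[^\]]*\]\s*)?(?P<msg>.*)$"
)


def last_logged_step(lines):
    """Return (source_file, message) of the last recognised log line, or None."""
    last = None
    for line in lines:
        match = GLOG.match(line) or ROCM.match(line)
        if match:
            last = (match.group("src"), match.group("msg"))
    return last
```

Applied to the three lines above, this points at the HSA stack initialisation in `rocdevice.cpp` as the last step that was logged.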

Fedora Linux 40 (Prerelease)
Blender 4.0.3 Release Candidate f753cf14fc
rocm 6.0.0 (the same issue occurs with rocm 5.7)

Blender freezes for me rather than crashing

debug output:

Read prefs: "/home/hopelesssoap/.config/blender/4.0/config/userpref.blend"
I0106 15:29:29.952919 4295 device.cpp:37] HIPEW initialization succeeded
I0106 15:29:29.953011 4295 device.cpp:39] Found precompiled kernels

system logs attached

Update: It turns out this is a problem with the Manjaro/Arch Linux repository's version of Blender.

Just tried today's release candidate in the build, and it's working on my end. No crashes.

I was working on the donut tutorial from BlenderGuru.

Arch Linux Blender maintainer here. I wonder what the issue is with our build, though. We try to follow upstream as best we can.

I'm using today's release candidate, and I am still getting the issue.

Solus ROCm maintainer here. I'm puzzled by this issue too. I'm on ROCm 6.0 with an RX6600M (gfx1032, tried emulating as gfx1030 as well). I built commit f753cf14fc on Solus. BMW27 renders fine, but when rendering the classroom demo I get the error below:

Full logs attached.
Fra:0 Mem:401.74M (Peak 401.74M) | Time:00:00.84 | Mem:270.51M, Peak:270.51M | _mainScene, interior | Updating Lights
I0107 19:52:30.505620 26784 light.cpp:1408] Total 10 lights.
I0107 19:52:30.505630 26784 light.cpp:1388] Number of lights sent to the device: 10
Fra:0 Mem:401.74M (Peak 401.74M) | Time:00:00.84 | Mem:270.51M, Peak:270.51M | _mainScene, interior | Updating Lights | Computing distribution
I0107 19:52:30.506178 26784 light.cpp:331] Use light distribution with 80658 emitters.
Fra:0 Mem:402.97M (Peak 402.97M) | Time:00:00.84 | Mem:271.74M, Peak:271.74M | _mainScene, interior | Updating Integrator
Fra:0 Mem:404.97M (Peak 404.97M) | Time:00:00.84 | Mem:273.74M, Peak:273.74M | _mainScene, interior | Updating Film
Fra:0 Mem:404.98M (Peak 405.06M) | Time:00:00.84 | Mem:273.66M, Peak:273.74M | _mainScene, interior | Updating Lookup Tables
I0107 19:52:30.512387 26784 tables.cpp:39] Total 10 lookup tables.
Fra:0 Mem:404.98M (Peak 405.06M) | Time:00:00.84 | Mem:273.75M, Peak:273.75M | _mainScene, interior | Updating Baking
Fra:0 Mem:404.98M (Peak 405.06M) | Time:00:00.84 | Mem:273.75M, Peak:273.75M | _mainScene, interior | Updating Device | Writing constant memory
I0107 19:52:30.512501 26784 scene.cpp:353] System memory statistics after full device sync:
  Usage: 340,548,072 (324.77M)
  Peak: 340,637,160 (324.86M)
Fra:0 Mem:406.75M (Peak 406.75M) | Time:00:00.85 | Mem:438.25M, Peak:438.25M | _mainScene, interior | Sample 0/300
I0107 19:52:30.754382 26784 path_trace.cpp:409] Rendered 1 samples in 0.0970092 seconds (0.0970092 seconds per sample), occupancy: 0.262978
Fra:0 Mem:738.98M (Peak 738.98M) | Time:00:01.09 | Remaining:01:11.50 | Mem:770.48M, Peak:770.48M | _mainScene, interior | Sample 1/300
Memory access fault by GPU node-1 (Agent handle: 0x7f3c8fa47a00) on address 0x7f3953d91000. Reason: Page not present or supervisor privilege.
fish: Job 1, 'HSA_OVERRIDE_GFX_VERSION=10.3.0…' terminated by signal SIGABRT (Abort)

gdb output:

Core was generated by `blender --debug --debug-cycles -b classroom/classroom.blend -f 0 -- --cycles-de'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at pthread_kill.c:44

warning: 44	pthread_kill.c: No such file or directory
[Current thread is 1 (Thread 0x7f5047bff680 (LWP 30126))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at pthread_kill.c:89
#3  0x00007f50ac645196 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f50ac6298bf in __GI_abort () at abort.c:100
#5  0x00007f50480c5302 in rocr::core::Runtime::VMFaultHandler (val=<optimized out>, arg=<optimized out>) at /home/build/YPKG/root/rocm-runtime/build/ROCR-Runtime-rocm-6.0.0/src/core/runtime/runtime.cpp:1429
#6  0x00007f50480c3ccb in rocr::core::Runtime::AsyncEventsLoop () at /home/build/YPKG/root/rocm-runtime/build/ROCR-Runtime-rocm-6.0.0/src/core/runtime/runtime.cpp:1148
#7  0x00007f504807a75a in rocr::os::ThreadTrampoline (arg=<optimized out>) at /home/build/YPKG/root/rocm-runtime/build/ROCR-Runtime-rocm-6.0.0/src/core/util/lnx/os_linux.cpp:80
#8  0x00007f50ac69a10a in start_thread (arg=<optimized out>) at pthread_create.c:444
#9  0x00007f50ac727a8c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

However, if I download the Blender daily build of f753cf14fc, it runs completely fine. Perhaps I have a different build flag, but I can't think of any I'm specifying that would cause this error. Here is the build recipe: https://github.com/GZGavinZhao/packages/blob/caafdff7eada562308e9cae41311e5dc0bdba6b9/packages/b/blender/package.yml

What ROCm version are the official binaries using? Edit: it seems to be ROCm 5.5, which would explain why the official binaries don't crash.

Also @Atticus-Finch and anyone else in a similar situation: if Blender freezes instead of crashing, you probably need https://github.com/ROCm/ROCm/issues/2596

There seems to be a wide variety of issues appearing with ROCm 5.5+, Blender, and the kernel, so here is what I've gathered. Also posted in https://github.com/ROCm/ROCm/issues/2596. Hopefully this will help guide people to some existing solutions.

  1. You do something and your entire computer freezes
    • Kernel >= 6.6: go to https://gitlab.freedesktop.org/drm/amd/-/issues/2991 and apply the revert patch mentioned there
    • Else: open an issue
  2. Blender cannot even find HIP (hipInit failed)
    • Kernel >= 6.6: go to https://gitlab.freedesktop.org/drm/amd/-/issues/2991 and apply the revert patch mentioned there
    • ROCm >= 6: if neither libamdhip64.so.5 nor libamdhip64.so exists on your computer (usually at $ROCM_PATH/lib64 or $ROCM_PATH/lib), try installing the package that provides that file or manually create a symlink to libamdhip64.so.6
    • Else: open an issue
  3. Your program (Blender, PyTorch, etc.) can find and select your GPU but crashes when running/rendering
    • ROCm >= 6:
      • SIGSEGV, assertion error, "ShouldNotReachHere", "HIP out of memory": try any Blender version that contains commit d2e91fb; as of 2024-01-08 the official daily builds provide 4.0.3 RC, which contains the commit
      • SIGABRT, GPU memory access fault, dmesg shows GCVM_L2_PROTECTION_FAULT_STATUS: the Blender daily builds mentioned above also fix this problem, but this issue is more severe than the one above because d2e91fb doesn't fix it with ROCm 6; from my investigation, it seems the only reason the official binaries work is that the fatbins are compiled with ROCm 5.5. If you're on a distribution that has ROCm 6 and compiles its Blender with its own ROCm 6, it's unlikely that its Blender will work.
    • ROCm < 6: try the 4.0.3 RC daily builds; if that doesn't work, open an issue
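The decision list above can be written down as a small lookup function, which may be handy for a packager's triage script. This is only a sketch of the list as posted: the symptom and detail labels are hypothetical names introduced here, and cases the list does not cover fall through to "open an issue".

```python
REVERT_PATCH = "apply the revert patch from https://gitlab.freedesktop.org/drm/amd/-/issues/2991"
OPEN_ISSUE = "open an issue"


def suggest(symptom, kernel, rocm_major, detail=None):
    """Return the suggestions from the list above for one report.

    symptom: "system_freeze", "hip_not_found" or "render_crash" (hypothetical labels)
    kernel: (major, minor) tuple, e.g. (6, 6)
    rocm_major: major ROCm version as an int
    detail: for render crashes, "segfault_or_assertion" or "gpu_memory_fault"
    """
    tips = []
    if symptom in ("system_freeze", "hip_not_found") and kernel >= (6, 6):
        tips.append(REVERT_PATCH)
    if symptom == "hip_not_found" and rocm_major >= 6:
        tips.append("make sure libamdhip64.so.5 or libamdhip64.so exists")
    if symptom == "render_crash":
        if rocm_major < 6:
            tips.append("try the 4.0.3 RC daily build")
        elif detail == "segfault_or_assertion":
            tips.append("use a Blender build that contains commit d2e91fb")
        elif detail == "gpu_memory_fault":
            tips.append("use the official daily build")
    # Anything not covered by the list falls back to filing a report.
    return tips or [OPEN_ISSUE]
```

For the original report in this issue (freeze, kernel 6.6.8), `suggest("system_freeze", (6, 6), 5)` returns the kernel revert patch entry.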

Thanks for the extensive comment. However, any ROCm 6 installation should have libamdhip64.so.5, or so I'm told. Is that not the case? And it certainly should have a libamdhip64.so installed as part of the ROCm 6 installation.

@BrianSavery For that part I'm mostly talking about downstream distributions. It is common for distributions to split out *.so files into a separate development package (e.g. if the package rocm-hip provides libamdhip64.so.6 and libamdhip64.so.6.0.32830, then it is common for the package rocm-hip-devel to provide the file libamdhip64.so). In addition, a libamdhip64.so.5 symlink seems to be manually created when packaged for the official AMD repositories; this symlink is not created if I build from source, where only the symlink libamdhip64.so.6 is created, so distribution packages are likely to miss it. I package ROCm for Solus and I didn't even know this symlink existed before your comment!
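To see which of these files a given installation actually ships, a short Python sketch like the following can help. The file names come from this discussion; the directories to scan are left to the caller, and the function name is hypothetical.

```python
import os

# File names discussed in this thread.
CANDIDATES = ("libamdhip64.so", "libamdhip64.so.5", "libamdhip64.so.6")


def report_hip_libs(lib_dirs, names=CANDIDATES):
    """Map each file name to the list of directories in which it exists."""
    found = {name: [] for name in names}
    for directory in lib_dirs:
        for name in names:
            if os.path.exists(os.path.join(directory, name)):
                found[name].append(directory)
    return found
```

Calling it with the library directories of the ROCm installation (for example the `lib` and `lib64` directories under `$ROCM_PATH`) shows at a glance whether only the `.so.6` file is present.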

Ooooof. That is a confusing circumstance and I feel for users. All I can say here from my side is that Blender is looking for libamdhip64.so or libamdhip64.so.5 in these locations (note that a libamdhip64.so.6 symlinked to libamdhip64.so is fine!):

https://projects.blender.org/blender/blender/src/commit/1229ffa85985aa68bd2e03ba3b32cc8f2c095dc8/extern/hipew/src/hipew.c#L244-L249

The official AMD distributions should put them in the right place, and I think any rocm-hip ones should too? (depending on which package repo?)

> The official AMD distributions should put them in the right place, and I think any rocm-hip ones should too? (depending on which package repo?)

Yes, I believe the official AMD distributions put them in the right place. My note on `libamdhip64.so.6` is mostly just me covering the very corner cases, where a distro packaged ROCm 6 but split out `libamdhip64.so` into a development package and didn't mark the development package as a runtime dependency of `blender`.

Author

After regularly testing Blender builds, Blender was finally able to show me the System settings and switch Cycles to GPU Compute on build `6cc80f1213d7`. I've also upgraded rocm-llvm and hip-runtime, so it might be related. However, Blender still crashes when trying to render anything, this time with an error.

```
blender: /usr/src/debug/hip-runtime-amd/clr-rocm-6.0.0/hipamd/src/hip_memory.cpp:2194: hipError_t ihipGetMemcpyParam3DCommand(amd::Command*&, const HIP_MEMCPY3D*, hip::Stream*): Assertion `false && "ShouldNotReachHere()"' failed.
zsh: IOT instruction (core dumped)  ./blender
```

I can create another issue if this one is now considered fixed.


For me, this is an ongoing issue. I've tried many of the above suggestions, including making symlinks to different libs, and I'll admit some are above my understanding. Here is my latest error log with `AMD_LOG_LEVEL=3` set. I've got all the latest ROCm libs installed from the various packages described above. Reverting to a 6.5.6 kernel didn't help, either. I hope this helps in finding a practical and lasting fix.


@HaigPetrus Your issue is caused by the HIP runtime complaining that the code to run on `gfx90c` hardware is not found (that is likely your iGPU), even though in reality you may not need it if you want to render with your dGPU. Per https://github.com/ROCm/clr/issues/51#issuecomment-1916938569, this is sort of an "intended behavior".

A manual/temporary fix is to set the environment variable `HIP_VISIBLE_DEVICES=<device-id-to-compile-with>`. You can find the device ID of the desired device by running `rocm-smi`.
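As a rough sketch of that workaround, the snippet below builds an environment with `HIP_VISIBLE_DEVICES` set and starts Blender with it. The helper names and the `./blender` path are only illustrative, and device ID `0` is a placeholder: use whichever ID `rocm-smi` reports for your dGPU.

```python
import os
import subprocess


def env_with_visible_devices(device_ids, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set.

    `device_ids` is an iterable of integer device IDs; several IDs are
    joined with commas. The base environment itself is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


def launch_blender(blender_path, device_ids):
    """Run Blender with only the given HIP devices visible (hypothetical helper)."""
    return subprocess.run(
        [blender_path],
        env=env_with_visible_devices(device_ids),
        check=False,
    )


if __name__ == "__main__":
    # Placeholder values: replace the path and the ID with your own.
    launch_blender("./blender", [0])
```

This only hides the other devices from the HIP runtime for that one process, so it leaves the rest of the system untouched.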

The solution on Blender's side is, when creating release binaries, to simply compile for more architectures to cover as many GPU models as possible. Or, if you're running Blender from your distro's repository, ask the maintainer of the Blender package to add `gfx90c` to the list of GPUs to compile for. On Solus, we set `CYCLES_HIP_BINARIES_ARCH` to `gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102;gfx90c;gfx902;gfx1011;gfx1012;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036;gfx1103`. In hindsight, this is probably not enough to support most "modern" AMD GPUs, so ideally it should contain everything that starts with `gfx9`, `gfx101`, `gfx103`, and `gfx11`, so something like `gfx803;gfx900;gfx902;gfx904;gfx906;gfx908;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx1010;gfx1011;gfx1012;gfx1013;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036;gfx1100;gfx1101;gfx1102;gfx1103`.
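To make the difference between the two lists easier to see, here is a small sketch that parses the semicolon-separated values and reports which architectures a configured list lacks. The function names are made up for illustration; the two strings are copied from the comment above.

```python
SOLUS_ARCHS = (
    "gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;"
    "gfx1102;gfx90c;gfx902;gfx1011;gfx1012;gfx1031;gfx1032;gfx1033;gfx1034;"
    "gfx1035;gfx1036;gfx1103"
)

PROPOSED_ARCHS = (
    "gfx803;gfx900;gfx902;gfx904;gfx906;gfx908;gfx90a;gfx90c;gfx940;gfx941;"
    "gfx942;gfx1010;gfx1011;gfx1012;gfx1013;gfx1030;gfx1031;gfx1032;gfx1033;"
    "gfx1034;gfx1035;gfx1036;gfx1100;gfx1101;gfx1102;gfx1103"
)


def parse_archs(value):
    """Split a semicolon-separated architecture list, dropping empty entries."""
    return [arch.strip() for arch in value.split(";") if arch.strip()]


def missing_archs(configured, wanted):
    """Return the architectures in `wanted` that `configured` lacks, in order."""
    have = set(parse_archs(configured))
    return [arch for arch in parse_archs(wanted) if arch not in have]


if __name__ == "__main__":
    print(missing_archs(SOLUS_ARCHS, PROPOSED_ARCHS))
    # A single GPU can be checked the same way, e.g. the iGPU from this thread.
    print("gfx90c" in parse_archs(SOLUS_ARCHS))
```

The same check works for any packager's list: pass their `CYCLES_HIP_BINARIES_ARCH` value as `configured` and the architecture of the affected GPU as `wanted`.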

Reference: blender/blender#116697