Cycles HIP issues on Debian #102018

Closed
opened 2022-10-23 16:13:49 +02:00 by Jakub Jaszewski · 37 comments

System Information
Operating system: Linux-6.0.0-1-rt-amd64-x86_64-with-glibc2.35
Devuan GNU/Linux 5 (daedalus/ceres) x86_64 (aside from init system differences this distribution can be treated as Debian sid)
Graphics card: Vega 20 [Radeon VII]
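system-info.txt: https://archive.blender.org/developer/F13753391/system-info.txt
dmesg: https://archive.blender.org/developer/F13753431/dmesg
lshw: https://archive.blender.org/developer/F13753434/lshw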

Blender Version
Broken: 3.4.0 Alpha, branch: master, commit date: 2022-10-21 11:11, hash: 26f181c6b7

Short description of error
Debian recently started packaging the ROCm drivers for AMD GPUs, and version 5.2.3-1 just landed in unstable (sid). According to AMD's official ROCm release notes, this version is equivalent to 22.20.1.
One difference between AMD's official packages and Debian's is the ROCm installation path: /opt/rocm/ versus /usr/lib/x86_64-linux-gnu/ respectively:

sudo find / -iname "libhsa*"

/usr/share/doc/libhsakmt1
/usr/share/doc/libhsa-runtime-dev
/usr/share/doc/libhsa-runtime64-1
/usr/lib/x86_64-linux-gnu/libhsa-runtime64.so
/usr/lib/x86_64-linux-gnu/libhsa-runtime64.so.1.5.0
/usr/lib/x86_64-linux-gnu/libhsa-runtime64.so.1
/usr/lib/x86_64-linux-gnu/libhsakmt.so.1
/usr/lib/x86_64-linux-gnu/libhsakmt.so.1.0.6

sudo find / -iname "libamdhip64*"

/usr/share/doc/libamdhip64-dev
/usr/share/doc/libamdhip64-5
/usr/lib/x86_64-linux-gnu/libamdhip64.so.5
/usr/lib/x86_64-linux-gnu/libamdhip64.so
/usr/lib/x86_64-linux-gnu/libamdhip64.so.5.2.21153-

When ROCm is installed from Debian's repositories, Blender does not detect the AMD GPU even though the ROCm platform is validated with AMD's tools.
https://docs.amd.com/bundle/ROCm_Installation_Guidev5.0/page/How_To_Install_ROCm.html#_Installation_Methods

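rocminfo and hipconfig outputs with both GPUs detected:
rocminfo: https://archive.blender.org/developer/F13753389/rocminfo
hipconfig: https://archive.blender.org/developer/F13753433/hipconfig
And how it looks in the package manager and Blender:

da064dd921b0172d0eeabd68052f030c27636a33.png: https://archive.blender.org/developer/F13753428/da064dd921b0172d0eeabd68052f030c27636a33.png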

I suspect this is caused by ROCm paths being different between Blender libraries and HIP installed from Debian's packages, but I could be very much wrong.
Please consider this report as void if this is a mistake on my side. I'm reporting this here only because this installation differs from the official one and ROCm is a very new package in Debian.

Exact steps for others to reproduce the error
On Debian unstable (sid) Linux distribution:

  • install rocm packages from apt
  • validate install with rocminfo and hipconfig
  • open Blender and check if there are any devices detected in Preferences -> System -> HIP

Additional info:
ROCm packaging is done by Debian AI team:
https://lists.debian.org/debian-ai/2022/10/
https://salsa.debian.org/rocm-team/community/team-project

Added subscriber: @silex

#102158 was marked as duplicate of this issue

Added subscribers: @brecht, @OmarEmaraDev

Changed status from 'Needs Triage' to: 'Needs Developer To Reproduce'

Looks like Blender only searches for /opt/rocm/hip/lib/libamdhip64.so in hipewHipInit? Not sure how this works.
@brecht Is this expected?

Added subscribers: @Sayak-Biswas, @BrianSavery

CC @Sayak-Biswas @bsavery.

I guess it should also look for libamdhip64.so anywhere in the library path instead of just a fixed location? Something like:

diff --git a/extern/hipew/src/hipew.c b/extern/hipew/src/hipew.c
index ecf952e..7cafe77 100644
--- a/extern/hipew/src/hipew.c
+++ b/extern/hipew/src/hipew.c
@@ -253,7 +253,7 @@ static int hipewHipInit(void) {
   /* Default installation path. */
   const char *hip_paths[] = {"", NULL};
 #else
-  const char *hip_paths[] = {"/opt/rocm/hip/lib/libamdhip64.so", NULL};
+  const char *hip_paths[] = {"libamdhip64.so", "/opt/rocm/hip/lib/libamdhip64.so", NULL};
 #endif
   static int initialized = 0;
   static int result = 0;
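
As a rough sketch of what the patch changes: a bare name such as `libamdhip64.so` is resolved by the dynamic loader through its normal library search path, while a name containing a slash is opened only at that exact location. The helper below walks a NULL-terminated candidate list in order and returns the first entry that loads. `first_loadable`, `load_fn` and `fake_debian_loader` are hypothetical names for illustration, not hipew's actual API; the loader is passed as a callback so the ordering logic can be exercised without ROCm installed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader callback: returns a non-NULL handle on success.
 * In real code this could be a thin wrapper around dlopen(path, RTLD_NOW). */
typedef void *(*load_fn)(const char *path);

/* Try each candidate in order; report which one loaded via *found. */
static void *first_loadable(const char *const *paths, load_fn load, const char **found)
{
  assert(paths != NULL && load != NULL);
  for (size_t i = 0; paths[i] != NULL; i++) {
    void *handle = load(paths[i]);
    if (handle != NULL) {
      if (found != NULL) {
        *found = paths[i];
      }
      return handle;
    }
  }
  if (found != NULL) {
    *found = NULL;
  }
  return NULL;
}

/* Stand-in loader for demonstration: pretends only the bare-name lookup
 * succeeds, as on a system with nothing installed under /opt/rocm. */
static void *fake_debian_loader(const char *path)
{
  static int dummy_handle;
  return strcmp(path, "libamdhip64.so") == 0 ? (void *)&dummy_handle : NULL;
}
```

With the patched list the bare name is tried first and succeeds on such a system; with the original single-entry list nothing loads, which matches the behaviour in the report.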

Changed status from 'Needs Developer To Reproduce' to: 'Confirmed'

Added subscribers: @jmcelroy, @deadpin

From the merged-in bug: openSUSE ROCm binaries are installed into both /opt/rocm-5.3.0/hip/lib/ and /opt/rocm-5.3.0/lib/. Hopefully, once this is fixed, those locations will work too.

This issue was referenced by af7dd995880dcc88f9e590e12cda20b385471910

This issue was referenced by f66236a827c82bffd9f31ca2a7919e865a0397e0

Changed status from 'Confirmed' to: 'Resolved'

Brecht Van Lommel self-assigned this 2022-11-01 18:37:05 +01:00

I'm not able to test this myself, so if it doesn't work please let me know.

I think the libraries should be found now in /usr/lib/x86_64-linux-gnu/ as in the original report.

@brecht Thanks for working on this.
With the patch Blender crashes in mesa after navigating to Edit -> Preferences -> System -> HIP.
There is no crash log file unfortunately, but debugging shows that HIP is initialized:

I1102 11:33:38.655006 91168 device.cpp:32] HIPEW initialization succeeded
I1102 11:33:38.655035 91168 device.cpp:34] Found precompiled kernels
mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted

Changed status from 'Resolved' to: 'Needs User Info'

I suspect that's a conflict between LLVM in HIP and LLVM in Mesa. We've had this issue before where there's a conflict between the LLVM in Blender with the one in Mesa, and for that we hide the LLVM symbols on the Blender side.

The way HIP is built on Debian might not hide those symbols. If that's the case it would be an issue for any application using HIP + OpenGL and would need to be solved by either Debian or ROCm developers. I don't see anything we can immediately do on the Blender side.

Thanks again. I'll report this on the ROCm GitHub or try to forward it to the Debian AI team.

I reported this problem on the Debian AI mailing list and got a very quick response with some findings that might help diagnose the problem: https://lists.debian.org/debian-ai/2022/11/msg00008.html

I1105 00:57:15.862457 879154 device.cpp:32] HIPEW initialization succeeded
I1105 00:57:15.862509 879154 device.cpp:34] Found precompiled kernels
[New Thread 0x7fff325ff6c0 (LWP 879325)]

Thread 1 "blender" received signal SIGSEGV, Segmentation fault.
0x00007ffff7c15d95 in ?? () from /lib/x86_64-linux-gnu/libjemalloc.so.2
(gdb) bt
#0  0x00007ffff7c15d95 in  () at /lib/x86_64-linux-gnu/libjemalloc.so.2
#1  0x00007fff32959154 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#2  0x00007fff32960fa8 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#3  0x00007fff3290f19e in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#4  0x00007fff32952dfe in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#5  0x00007fff326c676c in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#6  0x00007fff326c75ad in hipInit () at /lib/x86_64-linux-gnu/libamdhip64.so
#7  0x0000555557e38824 in ccl::device_hip_safe_init () at ./intern/cycles/device/hip/device.cpp:96
#8  ccl::device_hip_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) (devices=...) at ./intern/cycles/device/hip/device.cpp:104
#9  0x0000555557e20b7a in ccl::Device::available_devices(unsigned int) (mask=34) at ./intern/cycles/device/device.cpp:228
#10 0x0000555557bbbc3d in ccl::available_devices_func(PyObject*, PyObject*) (args=<optimized out>) at ./intern/cycles/blender/python.cpp:416
#11 0x00007fffeff28413 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#12 0x00007fffefedebce in _PyObject_MakeTpCall () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#13 0x00007fffefe79cb4 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#14 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#15 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#16 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#17 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#18 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#19 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#20 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#21 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#22 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#23 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#24 0x0000555556ac015f in bpy_class_call (C=0x7fffd967e2b8, ptr=<optimized out>, func=0x55555ac15da0 <rna_Panel_draw_func>, parms=0x7fffffffdca0) at ./source/blender/python/intern/bpy_rna.c:8690
#25 0x0000555556a5da5c in panel_draw (C=<optimized out>, panel=0x7fff439304b8) at ./source/blender/makesrna/intern/rna_ui.c:129
#26 0x0000555556adafab in ed_panel_draw (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff4bc55038, lb=lb@entry=0x7fff4bc55130, pt=pt@entry=0x7fff4b8ca938, panel=0x7fff439304b8, panel@entry=0x0, w=484, em=20, unique_panel_str=0x0, search_filter=0x0) at ./source/blender/editors/screen/area.c:2791
#27 0x0000555556adca43 in ED_region_panels_layout_ex (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff4bc55038, paneltypes=<optimized out>, contexts=contexts@entry=0x7fffffffdf60, category_override=category_override@entry=0x0) at ./source/blender/editors/screen/area.c:2989
#28 0x00005555584a3be5 in userpref_main_region_layout (C=0x7fffd967e2b8, region=0x7fff4bc55038) at ./source/blender/editors/space_userpref/space_userpref.c:128
#29 0x0000555556adbb9e in ED_region_do_layout (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff4bc55038) at ./source/blender/editors/screen/area.c:511
#30 0x00005555565543f5 in wm_draw_window_offscreen (stereo=false, win=0x7fff43dd7a78, C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:889
#31 wm_draw_window (win=0x7fff43dd7a78, C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:1111
#32 wm_draw_update (C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:1338
#33 0x0000555556550f40 in WM_main (C=C@entry=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm.c:640
#34 0x0000555555efa1ca in main (argc=2, argv=0x7fffffffe248) at ./source/creator/creator.c:547

That looks like a different issue with a custom Blender build, something related to jemalloc and HIP.

It's not clear to me what they did to verify that the LLVM symbols are properly hidden. I think Option 'h' registered more than once! clearly points to a conflict between multiple LLVM versions.

Note that many LLVM symbols do not have a prefix to easily identify them, see for example this symbol blacklist we used to use for Blender (we switched to a whitelist since):
https://developer.blender.org/diffusion/B/browse/master/source/creator/blender.map;v3.2.2$76

Added subscriber: @cgbloor

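I rebuilt the ROCm stack with debug symbols and got a better backtrace. Blender dies during the static initialization of comgr while creating the -h option for comgr-objdump (https://salsa.debian.org/rocm-team/rocm-compilersupport/-/blob/e318bc6a80d207ba109128a62701ce1b0fe3104b/lib/comgr/src/comgr-objdump.cpp#L178-180). It's not very clear to me why any command-line options are being registered once (let alone twice).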

I have so many questions. What's the other component that has registered -h? What protects against double-registration when using AMD's binary packages? How might double-registration behaviour relate to a conflict between LLVM versions?

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff22a9d2f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff225aef2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff2245472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007fffc308bc2b in llvm::report_fatal_error(llvm::Twine const&, bool) () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#5  0x00007fffc308ba76 in llvm::report_fatal_error(char const*, bool) () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#6  0x00007fffc307334e in ?? () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#7  0x00007fffc3064cbb in llvm::cl::Option::addArgument() () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#8  0x00007ffe993fb76e in llvm::cl::alias::done (this=<optimized out>) at /usr/lib/llvm-15/include/llvm/Support/CommandLine.h:1875
#9  0x00007ffe993fd831 in llvm::cl::alias::alias<char [2], llvm::cl::desc, llvm::cl::aliasopt> (this=0x7ffe9ca1fa60 <SectionHeadersShorter>)
    at /usr/lib/llvm-15/include/llvm/Support/CommandLine.h:1893
#10 0x00007ffe993ba294 in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1) at ./lib/comgr/src/comgr-objdump.cpp:180
#11 0x00007ffff7fcfabe in call_init (env=0x7fffffffe098, argv=0x7fffffffe088, argc=1, l=<optimized out>) at ./elf/dl-init.c:70
#12 call_init (l=<optimized out>, argc=1, argv=0x7fffffffe088, env=0x7fffffffe098) at ./elf/dl-init.c:26
#13 0x00007ffff7fcfba4 in _dl_init (main_map=0x7fffa5812f00, argc=1, argv=0x7fffffffe088, env=0x7fffffffe098) at ./elf/dl-init.c:117
#14 0x00007ffff236de94 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:182
#15 0x00007ffff7fd630e in dl_open_worker (a=a@entry=0x7fffffffc450) at ./elf/dl-open.c:808
#16 0x00007ffff236de3a in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#17 0x00007ffff7fd66a8 in _dl_open (file=0x7fff9d9651d1 "libamd_comgr.so.2", mode=<optimized out>, caller_dlopen=0x7fff9d8d7427 <amd::Os::loadLibrary(char const*)+167>, 
    nsid=<optimized out>, argc=1, argv=0x7fffffffe088, env=0x7fffffffe098) at ./elf/dl-open.c:884
#18 0x00007ffff22a42d8 in dlopen_doit (a=a@entry=0x7fffffffc6c0) at ./dlfcn/dlopen.c:56
#19 0x00007ffff236de3a in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffc620, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#20 0x00007ffff236deef in __GI__dl_catch_error (objname=0x7fffffffc678, errstring=0x7fffffffc680, mallocedp=0x7fffffffc677, operate=<optimized out>, args=<optimized out>)
    at ./elf/dl-error-skeleton.c:227
#21 0x00007ffff22a3dc7 in _dlerror_run (operate=operate@entry=0x7ffff22a4280 <dlopen_doit>, args=args@entry=0x7fffffffc6c0) at ./dlfcn/dlerror.c:138
#22 0x00007ffff22a4389 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at ./dlfcn/dlopen.c:71
#23 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
#24 0x00007fff9d8d7427 in amd::Os::loadLibrary (libraryname=libraryname@entry=0x7fff9d9651d1 "libamd_comgr.so.2") at ./clr/os/os.cpp:76
#25 0x00007fff9d8aeea1 in amd::Comgr::LoadLib () at ./clr/device/comgrctx.cpp:37
#26 0x00007ffff22ace37 in __pthread_once_slow (once_control=0x7fff9e347998 <amd::Comgr::initialized>, init_routine=0x7ffff20d3200 <__once_proxy>) at ./nptl/pthread_once.c:116
#27 0x00007fff9d8b2309 in __gthread_once (__func=<optimized out>, __once=<optimized out>) at /usr/include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:700
#28 std::call_once<bool (&)()> (__f=<optimized out>, __once=...) at /usr/include/c++/12/mutex:859
#29 amd::Device::ValidateComgr (this=this@entry=0x7fffba297000) at ./clr/device/device.cpp:493
#30 0x00007fff9d9021f5 in roc::Device::create (this=0x7fffba297000) at ./clr/device/rocm/rocdevice.cpp:628
#31 0x00007fff9d902d69 in roc::Device::init () at ./clr/device/rocm/rocdevice.cpp:479
#32 0x00007fff9d8b1e00 in amd::Device::init () at ./clr/device/device.cpp:414
#33 0x00007fff9d8f3bde in amd::Runtime::init () at ./clr/platform/runtime.cpp:75
#34 0x00007fff9d6ada2c in hip::init () at ./src/hip_context.cpp:47
#35 0x00007fff9d6ae545 in hipInit (flags=0) at ./src/hip_context.cpp:147

Registering those command-line options is part of the static initialization of LLVM. If there are multiple LLVM libraries in memory with visible symbols, then rather than each LLVM library initializing its own variables, the variables of one of the instances will be initialized twice. And then you get that error.

I don't know how the AMD binary package is built exactly, but presumably the LLVM symbols are hidden or there is some other mechanism to keep the LLVM symbols separate from the mesa LLVM symbols.
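
A toy model of the failure mode described above, not LLVM's actual code: each loaded copy of a library registers its options from a static initializer, which is harmless as long as every copy has its own private registry. If symbol resolution makes two copies share one registry, the second initializer registers the same name again and the duplicate check fires. `registry_t`, `register_option` and `library_static_init` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPTIONS 8

/* Minimal option registry standing in for a global option table. */
typedef struct {
  const char *names[MAX_OPTIONS];
  size_t count;
} registry_t;

/* Returns 0 on success, -1 if the name is already registered or the table
 * is full. In the backtrace above, the duplicate case ends in abort()
 * rather than an error return. */
static int register_option(registry_t *reg, const char *name)
{
  assert(reg != NULL && name != NULL);
  for (size_t i = 0; i < reg->count; i++) {
    if (strcmp(reg->names[i], name) == 0) {
      return -1;
    }
  }
  if (reg->count == MAX_OPTIONS) {
    return -1;
  }
  reg->names[reg->count++] = name;
  return 0;
}

/* What each loaded copy of the library does during static initialization. */
static int library_static_init(registry_t *reg)
{
  return register_option(reg, "h");
}
```

Running `library_static_init` once against each of two separate registries succeeds both times; running it twice against the same registry fails on the second call, which is the analogue of the "registered more than once" error.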


Added subscriber: @Lendo


Changed status from 'Needs User Info' to: 'Resolved'


Changed status from 'Resolved' to: 'Confirmed'

Brecht Van Lommel removed their assignment 2022-11-28 14:25:08 +01:00

Got accidentally closed by backporting to 3.3.

Still not something we can fix on the Blender side, I believe, but rather in Linux distribution packaging, so I will mark it as a known issue.


Added subscriber: @rherilier


Why use hypothetical paths to locate libamdhip64.so and hipcc?

Here is a minimal CMakeLists.txt that finds them at configure time:

```
# Declare the minimum CMake version before project().
cmake_minimum_required(VERSION 2.8.12)

project(detect_amdhip64_location)

find_package(HIP)

# Query where the imported HIP runtime library target lives on disk.
get_target_property(AMDHIP64_LOCATION hip::amdhip64 LOCATION)

message("HIP_HIPCC_EXECUTABLE = ${HIP_HIPCC_EXECUTABLE}")
message("AMDHIP64_LOCATION = ${AMDHIP64_LOCATION}")
```

The CMake macro `blender_add_lib(...)` does not seem to allow extra compilation flags like "-DXXX=YYY", so using a config.h.in should do the trick.

We build Blender binaries to work on multiple Linux distributions.

If some Linux distribution wants to build and package Blender with a specific path to the library I guess we could support that, but not sure why they would not have it in a system library path.

This has worked fine for CUDA so far, and I imagine distributions want to package HIP the same way.


You have a point.

brecht's solution should solve the problem: since a packaged hipcc depends on libamdhip64's development package (as is the case on Debian), there is no need to add extra names to search for, such as "libamdhip64.so.5".
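For illustration, loading a library by bare name from the system library path, with a fallback list of names, could look like the sketch below. It is written in Python with `ctypes` and is not Blender's actual loader; `load_first_available` and `HIP_CANDIDATES` are hypothetical names, and the two file names are the ones mentioned in this thread.

```python
import ctypes

# Bare names (no directory part) let the dynamic loader search the
# system library paths instead of a hard-coded location.
HIP_CANDIDATES = ["libamdhip64.so", "libamdhip64.so.5"]


def load_first_available(candidates, loader=ctypes.CDLL):
    """Return (name, handle) for the first candidate that loads.

    Returns (None, None) if none of the candidates can be loaded.
    The loader argument exists only so the logic can be exercised
    without the real library being installed.
    """
    for name in candidates:
        try:
            return name, loader(name)
        except OSError:
            # Not found or not loadable: try the next candidate.
            continue
    return None, None
```

Calling `load_first_available(HIP_CANDIDATES)` on a machine with the development package installed would be expected to succeed on the first name, which is why the versioned fallback becomes unnecessary in that setup.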

Brecht Van Lommel changed title from Blender does not detect Radeon VII GPUs with 5.2.3 ROCm drivers installed to Cycles HIP issues on Debian 2022-12-01 21:08:25 +01:00

I've renamed the task to reflect that this is actually no longer about detection, the issue remaining now is the Mesa LLVM vs. HIP LLVM conflict.


Added subscriber: @Grant520


The LLVM command line issue was temporarily fixed in a recent ROCm update in Debian.
So far Blender can detect both GPUs, and rendering also works. As far as I'm concerned this task can be closed.

The problem still awaits a permanent fix upstream:
https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/52


Did anyone else test this yet on Debian? I'm curious which library versions and hardware were used. I'm still seeing this in Blender 3.4.1 when navigating to the HIP config dialog.

> blender -d

Switching to fully guarded memory allocator.
Blender 3.4.1
Build: 2022-12-20 00:46:45 Linux release
argv[0] = blender
argv[1] = -d
Writing userprefs: '/home/grant/.config/blender/3.4/config/userpref.blend' ok
Info: Preferences saved

Warning: Agent creation failed.
The GPU node has an unrecognized id.

Writing: /tmp/blender.crash.txt
Segmentation fault

Here is the crash log:

# Blender 3.4.1, Commit date: 2022-12-19 17:00, Hash 55485cb379f7

# backtrace

# Python backtrace
  File "/home/grant/bin/blender-3.4.1-linux-x64/3.4/scripts/addons/cycles/properties.py", line 1545 in get_devices_for_type
  File "/home/grant/bin/blender-3.4.1-linux-x64/3.4/scripts/addons/cycles/properties.py", line 1670 in draw_impl
  File "/home/grant/bin/blender-3.4.1-linux-x64/3.4/scripts/startup/bl_ui/space_userpref.py", line 603 in draw_centered
  File "/home/grant/bin/blender-3.4.1-linux-x64/3.4/scripts/startup/bl_ui/space_userpref.py", line 178 in draw

I believe this might be due to the ROCm libraries that come with Debian testing still being too old for newer AMD chips:

libhsa-runtime64-1/testing,now 5.2.3-1 amd64 [installed,automatic]
  HSA Runtime API and runtime for ROCm

I have an AMD Ryzen 9 7950X which has onboard video that has a device ID that is not recognized by 5.2.x versions of ROCm. The "unrecognized id" part seems to crash Blender, even though the following devices are valid.

> sudo rocminfo

ROCk module is loaded
Warning: Agent creation failed.
The GPU node has an unrecognized id.

=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******Agent 1*******                  
  Name:                    AMD Ryzen 9 7950X 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 7950X 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5881                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    31991660(0x1e8276c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    31991660(0x1e8276c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    31991660(0x1e8276c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******Agent 2*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6650 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 29679(0x73ef)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2765                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1032         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
***Done***
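To compare reports like the one above across machines, the rocminfo text can be condensed into its warnings and a short list of agents. The following is a rough Python sketch that assumes the output layout shown in this comment; `summarize_rocminfo` is a hypothetical helper, not part of ROCm or Blender.

```python
import re


def summarize_rocminfo(text):
    """Return (warnings, agents) extracted from rocminfo's text output.

    warnings: lines starting with "Warning:".
    agents:   one dict per "***Agent N***" block, holding the first
              "Name", "Marketing Name" and "Device Type" values seen.
    """
    warnings = []
    agents = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Warning:"):
            warnings.append(stripped)
            continue
        if re.fullmatch(r"\*+Agent \d+\*+", stripped):
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        # Keep only the first occurrence: "Name" appears again under "ISA Info".
        if key in ("Name", "Marketing Name", "Device Type") and key not in current:
            current[key] = value
    return warnings, agents
```

Applied to the output above, this should list the CPU and the gfx1032 GPU as agents and report the "Agent creation failed" warning separately, which makes it easier to see that the unrecognized node is missing from the agent list rather than present in a broken state.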
Philipp Oeser removed the Interest: Render & Cycles label 2023-02-09 14:04:04 +01:00

LLVM Command Line issue was fixed in ROCm: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/52#issuecomment-1385754897

Since this is fixed in ROCm, I don't think there is anything we can do further on the Blender side besides hope all distros have updated or will do so soon, so closing.

Blender Bot added Status: Archived and removed Status: Confirmed labels 2023-05-02 18:13:08 +02:00
Reference: blender/blender#102018