Blender hangs forever when starting with particular userpref.blend config file (SDL and Pulse audio issue?) #126661

Closed
opened 2024-08-22 19:36:40 +02:00 by chconnor · 24 comments

Operating system: Linux-6.8.0-41-generic-x86_64-with-glibc2.39 64 Bits, X11 UI
Graphics card: NVIDIA GeForce GTX 1060 6GB/PCIe/SSE2 NVIDIA Corporation 4.6.0 NVIDIA 535.183.01
Broken: version: 4.2.1 LTS, branch: blender-v4.2-release, commit date: 2024-08-19 11:21, hash: 396f546c9d82
(as well as earlier versions)

Attached userpref.blend file causes blender to hang indefinitely on startup. This happens for 3.x versions of blender as well.

I don't know how/if my userpref.blend got "corrupted" (if that is accurate) but I can't start blender with it in place. (Regardless of the state/quality of the userpref.blend file, it seems like a bug that blender would hang on startup with a bad userpref.blend, so I am filing this bug.)

To reproduce:

  • fresh install of blender 4.2.1
  • start blender, do not import old settings
  • quit blender
  • move "corrupt" userpref.blend file to ~/.config/blender/4.2/config/userpref.blend
  • start blender:
$ ./blender-4.2.1-linux-x64/blender -d
Switching to fully guarded memory allocator.
Blender 4.2.1 LTS
Build: 2024-08-19 23:32:23 Linux Release
argv[0] = ./blender-4.2.1-linux-x64/blender
argv[1] = -d

...blender then hangs indefinitely*. No CPU usage, no windows open, etc.

*I have waited several minutes to see if it starts, but not longer than that.
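A minimal sketch of how the "hangs indefinitely" observation could be checked against a hard time limit instead of waiting by hand. The helper name `exits_within` and the 60-second limit are made up for illustration; only Python's standard `subprocess` module is used.

```python
import subprocess
import sys


def exits_within(cmd, timeout_s):
    """Return True if `cmd` exits on its own within `timeout_s` seconds.

    Return False if the time limit is hit; subprocess.run() then kills
    the child process before raising TimeoutExpired.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For this report it would
    # be something like ["./blender-4.2.1-linux-x64/blender", "--background"].
    print(exits_within([sys.executable, "-c", "pass"], 60))
```

A normal GUI launch never exits on its own, so the check only makes sense with a command that is expected to terminate, such as Blender's `--background` mode. Whether the hang also reproduces in that mode is not established in this report.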

chconnor added the
Severity
Normal
Status
Needs Triage
Type
Bug
labels 2024-08-22 19:36:41 +02:00
Member

Hi, thanks for the report. Attached userpref.blend works fine with 4.2 configs.
I think the freeze on your end is likely due to the add-ons (blenderkit, MACHIN3tools, etc.). Do you have them in the /scripts folder?

Pratik Borhade added
Status
Needs Information from User
and removed
Status
Needs Triage
labels 2024-08-23 06:42:48 +02:00
Author

Thanks -- the hang happens as described in the step-by-step above -- if I have a fresh config directory (e.g. no 4.2 directory exists, I start 4.2, I do not import anything, then exit 4.2, there is now a 4.2 config directory) and then only copy in the userpref.blend file (no addons, etc) and try to start, the hang happens.

me@myhost:~/.config/blender/4.2 $ find .
.
./config
./config/recent-searches.txt
./config/userpref.blend
me@myhost:~/.config/blender/4.2 $

If instead I start 4.2 without a config directory, and choose to "import Blender 4.0 Preferences", it hangs immediately (presumably because it is restarting in some sense). I kill it, and in that case all the add-ons are present in the scripts directory, and if I try to run it again, the hang is the same.

Member

Maybe due to the asset library? (path is /home/casey/Documents/Blender/Assets)

Author

Hmmm... no ~/Documents/Blender directory is present.

Pratik Borhade added
Status
Needs Triage
and removed
Status
Needs Information from User
labels 2024-08-23 08:26:11 +02:00
Member

@lichtwerk hi, can you replicate on linux?

Member

Yes can confirm (hangs here as well).

Also in 4.1.1
4.3 seems fine though

If I use the file in 4.3, I am seeing the following

Add-on not loaded: "io_import_dxf", cause: No module named 'io_import_dxf'
Add-on not loaded: "space_view3d_pie_menus", cause: No module named 'space_view3d_pie_menus'
Add-on not loaded: "animation_nodes", cause: No module named 'animation_nodes'
Add-on not loaded: "sketchfab-plugin-1-2-1", cause: No module named 'sketchfab-plugin-1-2-1'
Add-on not loaded: "MACHIN3tools", cause: No module named 'MACHIN3tools'
Add-on not loaded: "blenderkit", cause: No module named 'blenderkit'
Add-on not loaded: "io_export_dxf", cause: No module named 'io_export_dxf'

But I have disabled them all, resaved the prefs and it is still an issue.

Might have a look when that got "fixed"

@chconnor : can you confirm that using those prefs for 4.3 from https://builder.blender.org/download/daily/ is not an issue anymore?

Philipp Oeser added
Status
Confirmed
and removed
Status
Needs Triage
labels 2024-08-23 15:05:37 +02:00
Member

Hold on, it seems the buildbot build is also not working... but my local build does...

Member

Since I can only repro with buildbot builds, I'm not even sure how to debug this further. Maybe @mont29 or @ideasman42 have an idea?

Would this be Core module responsibility @mont29 ? (will set this as a placeholder module for now...)

Philipp Oeser added the
Module
Core
label 2024-08-23 16:08:07 +02:00

I... have absolutely no idea what to do with this one... On Linux, the blender process seems to enter some sort of deadlock, or maybe an infinite wait trying to read/access some remote non-existent data? It does not react to signals at least, and needs to be forcefully killed.

Here is a backtrace of all threads from gdb, in the locked situation (using the official 4.2.1 build), using gdb --args ./blender -t 1 to limit the number of active threads:

Thread 38 (Thread 0x7fffbe200680 (LWP 1879645) "PulseMainloop"):
#0  0x00007fffe9b10b5f in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffe19be955 in ?? () from /lib/x86_64-linux-gnu/libpulse.so.0
#2  0x00007fffe19b02ec in pa_mainloop_poll () from /lib/x86_64-linux-gnu/libpulse.so.0
#3  0x00007fffe19b095a in pa_mainloop_iterate () from /lib/x86_64-linux-gnu/libpulse.so.0
#4  0x00007fffe19b0a00 in pa_mainloop_run () from /lib/x86_64-linux-gnu/libpulse.so.0
#5  0x00007fffe19bea2d in ?? () from /lib/x86_64-linux-gnu/libpulse.so.0
#6  0x00007fffde309163 in ?? () from /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-16.1.so
#7  0x00007fffe9aa36c2 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00007fffe9b1e128 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 37 (Thread 0x7fffbec00680 (LWP 1879644) "blender"):
#0  0x00007fffe9aa01be in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffe9aab1a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff783cbe2 in IlmThread_3_2::Semaphore::wait() () from /home/guest/blender/blender-4.2.1-linux-x64/lib/libIlmThread.so.31
#3  0x00007ffff783ae30 in IlmThread_3_2::(anonymous namespace)::DefaultThreadPoolProvider::threadLoop(std::shared_ptr<IlmThread_3_2::(anonymous namespace)::DefaultThreadPoolData>) () from /home/guest/blender/blender-4.2.1-linux-x64/lib/libIlmThread.so.31
#4  0x00007ffff783b468 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (IlmThread_3_2::(anonymous namespace)::DefaultThreadPoolProvider::*)(std::shared_ptr<IlmThread_3_2::(anonymous namespace)::DefaultThreadPoolData>), IlmThread_3_2::(anonymous namespace)::DefaultThreadPoolProvider*, std::shared_ptr<IlmThread_3_2::(anonymous namespace)::DefaultThreadPoolData> > > >::_M_run() () from /home/guest/blender/blender-4.2.1-linux-x64/lib/libIlmThread.so.31
#5  0x00007fffddce0f24 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fffe9aa36c2 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007fffe9b1e128 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 10 (Thread 0x7fffb9a00680 (LWP 1879617) "jemalloc_bg_thd"):
#0  0x00007fffe9aa01be in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffe9aa2920 in pthread_cond_wait () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000000000fec402 in ?? ()
#3  0x00007fffe9aa36c2 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fffe9b1e128 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 7 (Thread 0x7fffbbc00680 (LWP 1879614) "jemalloc_bg_thd"):
#0  0x00007fffe9aa01be in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffe9aa2920 in pthread_cond_wait () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000000000fec402 in ?? ()
#3  0x00007fffe9aa36c2 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fffe9b1e128 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 5 (Thread 0x7fffbd400680 (LWP 1879612) "jemalloc_bg_thd"):
#0  0x00007fffe9aa01be in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffe9aa2920 in pthread_cond_wait () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000000000fec402 in ?? ()
#3  0x00007fffe9aa36c2 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fffe9b1e128 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7fffda400680 (LWP 1879609) "jemalloc_bg_thd"):
#0  0x00007fffe9aa01be in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffe9aa2920 in pthread_cond_wait () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000000000feccdf in ?? ()
#3  0x00007fffe9aa36c2 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fffe9b1e128 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7fffe487b580 (LWP 1879599) "blender"):
#0  0x00007fffe9aa01be in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffe9aab1a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fffda564962 in ?? () from /lib/x86_64-linux-gnu/libSDL2.so
#3  0x00007fffda523123 in ?? () from /lib/x86_64-linux-gnu/libSDL2.so
#4  0x00007fffda44534c in ?? () from /lib/x86_64-linux-gnu/libSDL2.so
#5  0x00007fffda440422 in ?? () from /lib/x86_64-linux-gnu/libSDL2.so
#6  0x00007fffda445af2 in ?? () from /lib/x86_64-linux-gnu/libSDL2.so
#7  0x0000000003a245f0 in ?? ()
#8  0x0000000003a248f9 in ?? ()
#9  0x00000000039f5cc6 in ?? ()
#10 0x0000000000a08092 in ?? ()
#11 0x0000000000f9c4e6 in ?? ()
#12 0x0000000000fa1d5f in ?? ()
#13 0x000000000072f171 in ?? ()
#14 0x00007fffe9a40c8a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x00007fffe9a40d45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x000000000083e9ae in ?? ()
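
To make a dump like the one above easier to scan, here is a small sketch that lists which shared libraries each thread's stack passes through. It assumes the gdb text layout shown above (`Thread N (... "name"):` headers and frames ending in `from <path>`); the function name `libraries_per_thread` is hypothetical.

```python
import re

# 'Thread 38 (Thread 0x... (LWP 1879645) "PulseMainloop"):'
THREAD_RE = re.compile(r'^Thread (\d+) \(.*"([^"]+)"\):')
# '#0  0x... in poll () from /lib/x86_64-linux-gnu/libc.so.6'
FRAME_LIB_RE = re.compile(r'^#\d+\s+.* from (\S+)$')


def libraries_per_thread(backtrace):
    """Map (thread number, thread name) to the libraries seen in its frames.

    Libraries are listed once each, in order of first appearance from the
    innermost frame outward. Frames without a "from <path>" part (for
    example stripped frames of the main executable) are skipped.
    """
    result = {}
    current = None
    for raw_line in backtrace.splitlines():
        line = raw_line.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = (int(thread.group(1)), thread.group(2))
            result[current] = []
            continue
        frame = FRAME_LIB_RE.match(line)
        if frame and current is not None:
            library = frame.group(1).rsplit("/", 1)[-1]
            if library not in result[current]:
                result[current].append(library)
    return result
```

Applied to the dump above, thread 1 ("blender") would list libc.so.6 and libSDL2.so, while thread 38 ("PulseMainloop") would list libc.so.6, libpulse.so.0 and libpulsecommon-16.1.so.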

Actually it seems to be the SDL audio backend. Once I switch to e.g. None, it seems that there are no more issues.

From the backtrace above it looks like both SDL and Pulse are trying to run?
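
A minimal sketch of switching the audio backend from a script rather than the Preferences UI. It assumes that Blender's Python API exposes the setting as `preferences.system.audio_device` and that `"None"` is the identifier for "no audio"; both should be checked against the API reference for the Blender version in use. The helper name `force_audio_device` is hypothetical, and it takes the preferences object as a parameter so the sketch can be run outside Blender.

```python
def force_audio_device(preferences, device="None"):
    """Set the audio device on a preferences-like object.

    Returns the previous value so the change can be undone later.
    `preferences` is expected to look like bpy.context.preferences,
    i.e. to have a `system.audio_device` attribute (assumption).
    """
    previous = preferences.system.audio_device
    preferences.system.audio_device = device
    return previous


# Inside Blender this would be used roughly as follows (not run here):
#   import bpy
#   force_audio_device(bpy.context.preferences)
#   bpy.ops.wm.save_userpref()
```

Blender's `-noaudio` command-line option is documented as forcing the sound system to None, so it may be a way to get Blender started with the affected preferences in the first place. Whether it sidesteps this particular hang was not tested in this thread.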

Philipp Oeser added the
Platform
Linux
label 2024-08-26 09:02:49 +02:00
Philipp Oeser changed title from Blender hangs forever when starting with particular userpref.blend config file to Blender hangs forever when starting with particular userpref.blend config file (SDL and Pulse audio issue?) 2024-08-26 09:03:26 +02:00
Author

@lichtwerk --

> @chconnor : can you confirm that using those prefs for 4.3 from https://builder.blender.org/download/daily/ is not an issue anymore?

Hmm, no: I downloaded from here: https://builder.blender.org/download/daily/ (reference 6bd515e0d2a2), started it, did not import configs, then copied the userpref.blend into the ~/.config/blender/4.3/config/ directory, and it hangs on startup.

Let me know if there is anything else I can do to help!


This issue happens on my system. I tried to get to the bottom of it, but only managed to narrow it down to a conflict between SDL & USD.

Here are some findings:

  • The problem doesn't happen when SDL is linked directly (WITH_SDL_DYNLOAD=OFF).
  • The problem doesn't happen with USD disabled (WITH_USD=OFF).
  • The problem occurs even when WITH_PULSEAUDIO=OFF (where pulse-audio is only activated via SDL).

Details:

  • Checking the library symbols, it doesn't seem as if there are conflicts between libusd_ms.so and libSDL.so.

  • Hanging occurs when linking with libusd_ms.so (bundled libraries and arch-linux's usd package).

  • Even with WITH_USD=OFF the hang occurs when manually linking libusd_ms.so:

diff --git a/source/creator/CMakeLists.txt b/source/creator/CMakeLists.txt
index 95a1aa2f73d..1055bb1a433 100644
--- a/source/creator/CMakeLists.txt
+++ b/source/creator/CMakeLists.txt
@@ -1707,7 +1707,7 @@ endif()
 # Setup link libraries
 
 add_dependencies(blender makesdna)
-target_link_libraries(blender PRIVATE ${LIB})
+target_link_libraries(blender PRIVATE ${LIB} "/usr/lib/libusd_ms.so")
 unset(LIB)
 
 setup_platform_linker_flags(blender)
  • The issue occurs with the libSDL.so from Arch Linux as well as a build (SDL's SDL2 branch 4eac44bed446f1ce9083b765dd9744b8ca81497e).

  • From adding break-points to pulse-audio's initialization functions, it doesn't seem that linking libusd_ms.so causes additional function calls; linking it nevertheless causes the hang for some reason.

  • When linking libusd, initializing SDL & pulseaudio hangs. The second thread may be related.

#0  0x00007ffff2cb0c11 in __futex_abstimed_wait_common64 (futex_word=0x83dc4a0, expected=0, op=393, abstime=0x0, private=<optimized out>, cancel=true)
    at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x83dc4a0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
    private=<optimized out>, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007ffff2cb0c83 in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x83dc4a0, expected=expected@entry=0, clockid=clockid@entry=0,
    abstime=abstime@entry=0x0, private=<optimized out>) at futex-internal.c:139
#3  0x00007ffff2cbb5cc in do_futex_wait (sem=sem@entry=0x83dc4a0, clockid=clockid@entry=0, abstime=abstime@entry=0x0) at /scratch/src/glibc/nptl/sem_waitcommon.c:111
#4  0x00007ffff2cbb657 in __new_sem_wait_slow64 (sem=sem@entry=0x83dc4a0, clockid=clockid@entry=0, abstime=abstime@entry=0x0)
    at /scratch/src/glibc/nptl/sem_waitcommon.c:183
#5  0x00007ffff2cbb6c4 in __new_sem_wait (sem=0x83dc4a0) at sem_wait.c:42
#6  0x00007fffd145e436 in SDL_SemWait_REAL () from /opt/sdl/lib/libSDL2.so
#7  0x00007fffd13ef621 in PULSEAUDIO_DetectDevices () from /opt/sdl/lib/libSDL2.so
#8  0x00007fffd123044e in SDL_AudioInit_REAL () from /opt/sdl/lib/libSDL2.so
#9  0x00007fffd122a367 in SDL_InitSubSystem_REAL () from /opt/sdl/lib/libSDL2.so
#10 0x00007fffd1231a7d in SDL_OpenAudio_REAL () from /opt/sdl/lib/libSDL2.so
#11 0x00007fffd124f03e in SDL_OpenAudio () from /opt/sdl/lib/libSDL2.so
#12 0x0000000004374d90 in aud::SDLDevice::SDLDevice (this=0x8416660, specs=..., buffersize=2048) at /src/blender/extern/audaspace/plugins/sdl/SDLDevice.cpp:88
#13 0x0000000004375249 in aud::SDLDeviceFactory::openDevice (this=0x7fd89c0) at /src/blender/extern/audaspace/plugins/sdl/SDLDevice.cpp:142
#14 0x0000000004354376 in AUD_init (device=<optimized out>, specs=..., buffersize=2048, name=0x187cbf1 "Blender")
    at /src/blender/extern/audaspace/bindings/C/AUD_Special.cpp:396
#15 0x0000000001f6eace in BKE_sound_init (bmain=0x847e8e8) at /src/blender/source/blender/blenkernel/intern/sound.cc:417
#16 0x00000000021e6ba8 in wm_init_userdef (bmain=0x847e8e8) at /src/blender/source/blender/windowmanager/intern/wm_files.cc:520
#17 wm_homefile_read_ex (C=C@entry=0x7b1d2b8, params_homefile=params_homefile@entry=0x7fffffffe250, reports=reports@entry=0x0,
    r_params_file_read_post=r_params_file_read_post@entry=0x7fffffffe248) at /src/blender/source/blender/windowmanager/intern/wm_files.cc:1511
#18 0x00000000021eeca8 in WM_init (C=C@entry=0x7b1d2b8, argc=argc@entry=1, argv=argv@entry=0x7fffffffe428)
    at /src/blender/source/blender/windowmanager/intern/wm_init_exit.cc:279
#19 0x0000000001b0ceb0 in main (argc=1, argv=0x7fffffffe428) at /src/blender/source/creator/creator.cc:535

#0  0x00007ffff2d1bc0d in __GI___poll (fds=0x7fff340071a0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fffd106b0ad in poll_func (ufds=0x7fff340071a0, nfds=3, timeout=-1, userdata=0x842ace0) at ../src/pulse/thread-mainloop.c:70
#2  0x00007fffd1056290 in pa_mainloop_poll (m=0x83b89f0) at ../src/pulse/mainloop.c:863
#3  0x00007fffd1056695 in pa_mainloop_iterate (m=0x83b89f0, block=1, retval=0x0) at ../src/pulse/mainloop.c:945
#4  0x00007fffd1056709 in pa_mainloop_run (m=0x83b89f0, retval=0x0) at ../src/pulse/mainloop.c:963
#5  0x00007fffd106b1b4 in thread (userdata=0x83c4720) at ../src/pulse/thread-mainloop.c:101
#6  0x00007fffcdad71df in internal_thread_func (userdata=0x83b8af0) at ../src/pulsecore/thread-posix.c:86
#7  0x00007ffff2cb3dba in start_thread (arg=<optimized out>) at pthread_create.c:447
#8  0x00007ffff2d28c08 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Update

Some additional tests (without success).

  • Building a "minimal" libusd_ms.so still has the problem:
 > python build_scripts/build_usd.py /opt/usd --no-python --build-monolithic --no-imaging --no-openvdb --no-embree --no-prman --no-draco --no-materialx

Building with settings:
  USD source directory          /scratch/src/OpenUSD
  USD install directory         /opt/usd
  3rd-party source directory    /opt/usd/src
  3rd-party install directory   /opt/usd
  Build directory               /opt/usd/build
  CMake generator               Default
  CMake toolset                 Default
  Downloader                    curl

  Building                      Monolithic shared library
    Variant                     Release
    Target                      
    Imaging                     Off
      Ptex support:             Off
      OpenVDB support:          Off
      OpenImageIO support:      Off 
      OpenColorIO support:      Off 
      PRMan support:            Off
    UsdImaging                  Off
      usdview:                  Off
    MaterialX support           Off
    Python support              Off
      Python Debug:             Off
      Python docs:              Off
    Documentation               Off
    Tests                       Off
      Mayapy Tests:             Off
      AnimX Tests:              Off
    Examples                    On
    Tutorials                   On
    Tools                       On
    Alembic Plugin              Off
      HDF5 support:             Off
    Draco Plugin                Off

  Dependencies                  None
STATUS: Installing USD...

Success! To use USD, please ensure that you have:

    The following in your PATH environment variable:
    /opt/usd/bin
  • Renaming all the non-C++ names libusd_ms.so defines doesn't resolve the issue:

Generate a list of non C++ names:

nm -D /usr/lib/libusd_ms.so | cut -d" " -f 3- | grep -e '^[[:space:]]' -v | grep -e "^_Z" -v

Run with attached patchelf.map.

#!/bin/sh
patchelf --output libusd_ms.so --rename-dynamic-symbols patchelf.map libusd_ms.so.orig

This also didn't solve the problem, so it seems likely the problem is caused by logic that executes when the library is loaded instead of being a symbol conflict (SDL and Pulse are C only so C++ symbols shouldn't conflict).
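
For reference, the symbol check described above can be sketched in Python as well. This assumes the usual `nm -D` text layout (`<address> <type> <name>` for defined symbols, `<type> <name>` for undefined ones) and, like the shell pipeline, treats anything not starting with `_Z` as a non-C++ name. The function names and the sample symbol names in the comments are made up for illustration.

```python
def exported_c_symbols(nm_output):
    """Return the defined, non-C++ symbol names from `nm -D` output.

    Undefined symbols have no address column, so their lines split into
    two fields instead of three and are skipped. Names starting with
    "_Z" are mangled C++ names and are skipped as well.
    """
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        _address, _symbol_type, name = fields
        if not name.startswith("_Z"):
            names.add(name)
    return names


def shared_c_symbols(nm_output_a, nm_output_b):
    """Names defined by both libraries: candidates for a symbol clash."""
    return exported_c_symbols(nm_output_a) & exported_c_symbols(nm_output_b)


# The inputs would come from commands such as:
#   nm -D /usr/lib/libusd_ms.so
#   nm -D /opt/sdl/lib/libSDL2.so
```

An empty result from `shared_c_symbols` would be consistent with the conclusion above that the hang is not a plain symbol conflict, although it cannot rule out other kinds of interference at load time.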


@deadpin @makowalski would you know if libusd_ms.so interacts with the sound systems (SDL and/or pulse) in any way?

Bastien Montagne added the Interest: USD label 2024-08-27 11:11:27 +02:00

> @deadpin @makowalski would you know if `libusd_ms.so` interacts with the sound systems (SDL and/or pulse) in any way?

Not that I'm aware, but I don't know for sure. For what it's worth, there is no obvious dependency that I can see in the USD library code base, but I haven't done an exhaustive search. I'll report back if anything occurs to me.

I also don't believe USD will use or touch (directly) SDL or Pulse. But USD does have quite a bit of code that executes on library load.

Everything that is defined as an `ARCH_CONSTRUCTOR` in their source is potentially executed. On inspection I don't see anything that would obviously impact external entities though.

If I had to guess at where to start, maybe try commenting out the code inside the following 2 locations to see if it can at least get past the deadlock (USD would not function correctly, but hopefully it gets past the deadlock so we know for sure):

- `<usd path>/pxr/base/arch/initConfig.cpp` -- the code inside `ARCH_CONSTRUCTOR(Arch_InitConfig, 2, void)` (need to keep at least the call to `Arch_InitTmpDir` though)
- `<usd path>/pxr/base/plug/initConfig.cpp` -- the code inside `ARCH_CONSTRUCTOR(Plug_InitConfig, 2, void)`
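For readers unfamiliar with code that runs on library load, here is a minimal sketch of the general mechanism on GCC/Clang: a function marked `__attribute__((constructor))` runs when the binary or shared library containing it is loaded, before `main`. This is a generic illustration and not USD's `ARCH_CONSTRUCTOR` macro; the names `on_load`, `load_hook_ran` and `load_hook_status` are made up for the example.

```c
/* Hypothetical flag; set by the load-time hook below. */
static int load_hook_ran = 0;

/* GCC/Clang extension: runs when the containing binary or shared
 * library is loaded, before main() is entered. */
__attribute__((constructor))
static void on_load(void)
{
    load_hook_ran = 1;
}

/* Returns 1 if the load-time hook has already run when called. */
int load_hook_status(void)
{
    return load_hook_ran;
}
```

Anything such a hook does (allocating, spawning threads, touching globals) happens whether or not the host application ever calls into the library, which is why these constructors were a reasonable first suspect.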

@deadpin

Tried early returning from every `ARCH_CONSTRUCTOR`, but the issue remains.

----

Tried building non-monolithic libraries, the issue remains with the following libraries.

- [ ] `libtbb_debug.so`
- [ ] `libtbbmalloc_debug.so`
- [ ] `libtbbmalloc_proxy_debug.so`
- [ ] `libtbbmalloc_proxy.so`
- [ ] `libtbbmalloc.so`
- [ ] `libtbb.so`
- [ ] `libusd_arch.so`
- [ ] `libusd_ar.so`
- [ ] `libusd_gf.so`
- [ ] `libusd_js.so`
- [ ] `libusd_kind.so`
- [X] `libusd_ndr.so`
- [X] `libusd_pcp.so`
- [ ] `libusd_pegtl.so`
- [ ] `libusd_plug.so`
- [X] `libusd_sdf.so`
- [X] `libusd_sdr.so`
- [ ] `libusd_tf.so`
- [ ] `libusd_trace.so`
- [ ] `libusd_ts.so`
- [X] `libusd_usdGeom.so`
- [X] `libusd_usdHydra.so`
- [X] `libusd_usdLux.so`
- [X] `libusd_usdMedia.so`
- [X] `libusd_usdPhysics.so`
- [X] `libusd_usdProc.so`
- [X] `libusd_usdRender.so`
- [X] `libusd_usdRi.so`
- [X] `libusd_usdShade.so`
- [X] `libusd_usdSkel.so`
- [X] `libusd_usd.so`
- [X] `libusd_usdUI.so`
- [X] `libusd_usdUtils.so`
- [X] `libusd_usdVol.so`
- [ ] `libusd_vt.so`
- [X] `libusd_work.so`

The issue seems to be that SDL tries and fails to create a thread with a stack size of `256 * 1024` bytes in `PULSEAUDIO_DetectDevices`; that thread is then supposed to signal the semaphore on which SDL hangs. The thread creation fails because the requested stack is too small to fit the thread descriptor. It doesn't fit because USD has a large amount of thread-local storage, which is stored on-stack in the thread descriptor for non-main threads.
The total required TLS space is 290,560 bytes, of which 278,968 (0x441b8) come from `libusd_ms.so`, as shown by `readelf -Wl libusd_ms.so`:

...
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
...
  TLS            0x2b56790 0x0000000002b57790 0x0000000002b57790 0x000000 >0x0441b8< R   0x10
...

Increasing the stack size in PULSEAUDIO_DetectDevices from 256 * 1024 to 512 * 1024 allows Blender to launch, although maybe USD shouldn't use so much TLS space.
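The arithmetic behind this can be sketched as a small C helper that has the same shape as the glibc size check. The function name `stack_request_fits` is hypothetical. The 4096-byte page and guard sizes and the 2048-byte minimal rest stack used in the note below are assumptions for x86_64, and the 290,560-byte TLS figure is simply taken from the comment above as a stand-in for glibc's internal value.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a requested stack of `size` bytes can hold the guard
 * area, the static TLS block and a minimal working stack (rounded up
 * to whole pages), 0 otherwise. */
int stack_request_fits(size_t size, size_t guardsize,
                       size_t tls_static_size, size_t minimal_rest_stack,
                       size_t pagesize)
{
    /* Page sizes are powers of two, which the mask below relies on. */
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

    size_t pagesize_m1 = pagesize - 1;
    size_t needed = (guardsize + tls_static_size + minimal_rest_stack
                     + pagesize_m1) & ~pagesize_m1;
    return size >= needed;
}
```

Under those assumptions the required size works out to 299,008 bytes, so `stack_request_fits(256 * 1024, 4096, 290560, 2048, 4096)` returns 0 while the same call with `512 * 1024` returns 1, which matches the observed behaviour. The TLS block alone (290,560 bytes) already exceeds the 262,144 bytes SDL asked for.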


Excellent find there Jorn. Thank you for investigating the root cause.

I suppose there's a few follow ups then:

  • Regardless of our decision to keep/remove SDL, I've filed an upstream issue on SDL to let them know that they need to check their thread creation call-sites and to be aware of the stack size that apps might see [1]
  • If Blender chooses to keep SDL, we can patch our build of the libraries locally
  • USD's stack space most likely comes from their Prim path cache which is like 256kb big. They are really adamant about using such things but I can ask if they're willing to heap alloc instead of using the stack [2]. I haven't filed the issue for them yet though.

[1] https://github.com/libsdl-org/SDL/issues/10806
[2] https://github.com/PixarAnimationStudios/OpenUSD/blob/release/pxr/usd/sdf/path.cpp#L819
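As a rough illustration of what "checking thread creation call-sites" means here, the following sketch uses plain POSIX threads and semaphores rather than SDL's own API. The names `worker`, `ready_sem` and `start_worker_and_wait` are hypothetical and this is not the actual SDL fix; it only shows the pattern of not waiting on a semaphore when the thread that should post it was never created.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready_sem;

/* Stand-in for a worker such as SDL's HotplugThread: signal readiness. */
static void *worker(void *arg)
{
    (void)arg;
    sem_post(&ready_sem);
    return NULL;
}

/* Starts the worker with the requested stack size and waits until it
 * signals. Returns 0 on success and a nonzero value on failure; on
 * failure it returns without waiting on the semaphore. */
int start_worker_and_wait(size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    if (sem_init(&ready_sem, 0, 0) != 0)
        return -1;

    pthread_attr_init(&attr);
    rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);

    if (rc != 0) {
        /* Nobody will ever post the semaphore, so bail out instead of
         * blocking forever in sem_wait(). */
        sem_destroy(&ready_sem);
        return rc;
    }

    sem_wait(&ready_sem);
    pthread_join(tid, NULL);
    sem_destroy(&ready_sem);
    return 0;
}
```

With this shape, a failed thread creation turns into an error the caller can report or work around, rather than an indefinite hang at startup.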


@jorn interesting, out of curiosity - how did you manage to find this was caused by an issue with the stack size?

@deadpin nice to see the issue has been resolved in SDL2 already.

Suggest to file an issue with USD to heap allocate Prim path cache since this seems like an issue that could bite us again in some other unrelated situations.


@ideasman42 The backtraces showed that the main thread was stuck on a semaphore in SDL. Putting a breakpoint at the start of `HotplugThread`, the function that's supposed to signal the semaphore, showed that it was not getting called. To find out why, I stepped into `SDL_CreateThreadInternal` in `PULSEAUDIO_DetectDevices`. There, the pthread implementation of [`SDL_SYS_CreateThread`](https://github.com/libsdl-org/SDL/blob/release-2.30.7/src/thread/pthread/SDL_systhread.c#L115-L118) returned an error after its call to `pthread_create` returned EINVAL:

```C
    /* Create the thread and go! */
    if (pthread_create(&thread->handle, &type, RunThread, thread) != 0) {
        return SDL_SetError("Not enough resources to create thread");
    }
```

After a few tries, stepping into `pthread_create` with some instruction stepping showed it returning EINVAL in [`allocate_stack`](https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/allocatestack.c;h=2cb562f8eac8af11393ebd9ae36b111c7bca4191;hb=c9154cad66aa0b11ede62cc9190d3485c5ef6941#l351) here:

```C
      if (__builtin_expect (size < ((guardsize + tls_static_size_for_stack
                                     + MINIMAL_REST_STACK + pagesize_m1)
                                    & ~pagesize_m1),
                            0))
        /* The stack is too small (or the guard too large).  */
        return EINVAL;
```

This check failed because `tls_static_size_for_stack` was very large, so the requested stack size, `size`, was too small to fit everything. I then found [A Deep dive into (implicit) Thread Local Storage](https://chao-tic.github.io/blog/2018/12/25/tls), which explained that 'static' TLS is allocated on the stack for non-main threads. Using readelf confirmed it was primarily USD that made `tls_static_size_for_stack` so big.

The reason that the hang doesn't happen with `WITH_SDL_DYNLOAD=OFF` is that the version of SDL2 in `lib` (2.28.2) is from before the semaphore in `PULSEAUDIO_DetectDevices` was added (2.30.0, [commit](https://github.com/libsdl-org/SDL/commit/b9d16dac4ec60a76099865bbb8ed7fe5909054de)), but `HotplugThread` still fails to run for the same reason as above. See [repology](https://repology.org/project/sdl2/versions) for which distros have this version or newer.

@jorn thanks for the detailed explanation, it may help when investigating similar issues in the future.


Dynamic SDL loading has been removed & SDL disabled for official releases making this particular bug no longer applicable: c6afb0e270.

See https://devtalk.blender.org/t/sdl-support-to-be-disabled-in-release-builds/36564

Closing.

Blender Bot added Status: Archived and removed Status: Confirmed labels 2024-09-13 15:25:37 +02:00

Blender is the best. :-) I take it the next official release should be free of this issue, then?

Thanks everyone!


@clepsydrae yes, however this was done by removing SDL support as we didn't have a compelling reason to keep it - given the alternatives we support.

Reference: blender/blender#126661