Blender 2.90 dailies crash on startup since Thursday 4 June #77549

Closed
opened 2020-06-07 01:51:19 +02:00 by Richard J Walker · 22 comments

System Information
Operating system: Linux-5.6.14-desktop-2.mga7-x86_64-with-mageia-7-Official 64 Bits
Graphics card: AMD KAVERI (DRM 3.36.0, 5.6.14-desktop-2.mga7, LLVM 8.0.0) X.Org 4.6 (Core Profile) Mesa 20.0.7
GPU rendering on GP107 [GeForce GTX 1050 Ti], NVIDIA GeForce 635 series and later, driver v 430.64 and CUDA toolkit 10.1.168

Blender Version
Broken: version: 2.90.0 Alpha, branch: master, commit date: 2020-06-06 14:18, hash 'aed11c673efe'
Broken: version: 2.90.0 Alpha, branch: master, commit date: 2020-06-04 22:58, hash 'c7329da14b22'
Worked: 2.90.0 Alpha, branch: master, commit date: 2020-06-03 18:45, hash: b94ab93dfb

Short description of error
Blender 2.90 crashes before splash screen since 4 June daily build

Exact steps for others to reproduce the error
Fetch and extract Blender 2.90 daily build after 3 June
Remove or rename ~/.config/blender/2.90 directory to hide add-on and other user-configured features and to enable a factory-default startup.
Execute extracted blender. Blender draws a blank window and crashes to desktop.
Repeat execution with --debug-all switch and note that the crash happens during the sequence of shader file writes to disc in the penultimate menu draw; UI_menutype_draw: opening menu "VIEW3D_MT_editor_menus".

It gets to /tmp/blender_c42QeB/0060.vert and /tmp/blender_c42QeB/0060.frag but fails to write the last three elements for 0061.vert, 0061.frag and 0061.geom and thus does not reach the "WM_MT_splash" menu section.

This problem shares some characteristics with #77505 and #77374 but is concerned only with 2.90. All previous builds of 2.83 up to and including 2.83 LTS are fully functional.

Attached blender.crash.txt files:
blender-c7329da14b22.crash.txt

blender-aed11c673efe.crash.txt

There are two crash files identified by the release hash in the filename.

Attached console debug files:
blender-c7329da14b22.run.txt

blender-b94ab93dfb82.run.txt

blender-aed11c673efe.run.txt

There are three console files, two corresponding with the two broken 2.90 dailies and one for the last successful 2.90 daily, also identified by the release hash in the filename.

[Based on the default startup]


Added subscriber: @jaywalker


Added subscriber: @ankitm


Changed status from 'Needs Triage' to: 'Needs User Info'


I don't have the system so I cannot redo it. But since there's a narrow range of working and broken builds, and enough logs, I can mark it confirmed if you could retry after updating the gpu drivers.


Hi Ankit, thank you for your response. At first glance it looks like it could take quite some time to install updates for my system's display driver as there is no more recent kernel available for my Mageia 7 installations and I am, of course, using the kernel's amdgpu module. All the OpenGL stuff comes from Mesa and that is currently at 20.1.0 (as of 27th May). The Mageia release for Mageia 7 is at 20.0.7 so it may not be long before we get an update, but I am not sure if I can quickly "jump the gun" and try building it locally.

The other possibility is to upgrade one system to the current Mageia 8 development branch (Cauldron) but it is not yet entirely trouble-free for upgrading. It would also mean replacing every system on the PC which may blur the picture of which change made which alteration of behaviour (if any).

I will be happy enough to have a go if you think there are any relevant updates in the Kaveri drivers which would accommodate the recent change made to Blender's system display requirements.


Changed status from 'Needs User Info' to: 'Confirmed'


Marking high since it's a recent change


Added subscribers: @fclem, @LazyDodo


Given the stack trace, I'd be surprised if it is not b168c255aae848f730a12b9cb16e88681a0c6809. It's probably best if @fclem takes a look here.

Hey, Ray, good guess! I followed Pablo Vazquez's video on building Blender and reverted the minor changes in source/blender/gpu/intern/gpu_texture.c.

Blender 2.90 now starts up as normal. Using the blend file I had been working on, Blender continues to function as expected using viewport, look-dev and Cycles shading in the 3D viewport. Image rendering was also as expected, though I did have to wait a minute or so for the first-time compile of the CUDA kernel. The test machine uses a GTX 960 for rendering, so I anticipate no issues on the other machine with the 1050 Ti. Both machines use the same motherboard with the same AMD A10 Kaveri APU.


@jaywalker can you test this simple patch?

@@ -1770,16 +1770,16 @@ void GPU_texture_unbind(GPUTexture *tex)
 }
 
 void GPU_texture_unbind_all(void)
 {
   if (GLEW_ARB_multi_bind) {
-    glBindTextures(0, GPU_max_textures(), NULL);
-    glBindSamplers(0, GPU_max_textures(), NULL);
+    glBindTextures(0, GPU_max_textures_frag(), NULL);
+    glBindSamplers(0, GPU_max_textures_frag(), NULL);
     return;
   }
 
-  for (int i = 0; i < GPU_max_textures(); i++) {
+  for (int i = 0; i < GPU_max_textures_frag(); i++) {
     glActiveTexture(GL_TEXTURE0 + i);
     glBindTexture(GL_TEXTURE_2D, 0);
     glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
     glBindTexture(GL_TEXTURE_1D, 0);
     glBindTexture(GL_TEXTURE_1D_ARRAY, 0);

Sorry for the delay Clément, I've just got in from work, but I've had my coffee and rebuilt with your changes. The result, sadly, is little different. The crash file is attached: blender.crash.txt


Clément, I was intrigued by a brief report from my brother who had tried out various Blender versions I installed on his PC on Sunday. He reported no issues with any version of Blender, specifically 2.81a, 2.82a, 2.83 LTS and Thursday's and Friday's dailies for 2.90 alpha (my "good" and "bad" examples respectively).

His hardware is similar to mine, though with an older generation AMD A10 APU (it might even be A8) and no CUDA render acceleration.

I also had a closer look at the change you asked me to try, and that took me to the GPUGlobal struct in gpu_extensions.c. Reading through that I was reminded of an OpenGL problem I had when trying to run Blender "remote" using VirtualGL. The VGL author explained that not all graphics drivers are completely compliant with OpenGL standards. AMD GPUs were (and still are) problematic when used with VGL and he has had to find many workarounds for shortcomings in the drivers.

The point is, seeing the GPUGlobal struct, it made me wonder how, and where the code detected which screen driver is in use and thus which dodges and workarounds to apply. My brother's installation poses no problems to Blender's screen driver detection as he has only one GPU device (the integrated A10/A8). Mine has both the AMD APU and a headless Nvidia GTX1050 Ti (or GTX 960 on the other test machine). I thought that using the factory-default startup for the tests with 2.90 was enough to "hide" the Nvidia card, but brother Stephen's results prompted me to remove the 1050 altogether and retry with Friday, Saturday, and now Sunday daily builds.

Success.

All the problems went away, except for one. All "problematic" 2.90 versions run perfectly well and as I would expect. The one remaining problem, of course, is that I can no longer use my Nvidia card to speed up Blender renders.

Steve has another PC with an A8 and a GTX 760. I can get him to transfer the problem blender installation to that machine and confirm my results. He may also be able, on that PC, to test the Windows 2.90 to see if the headless Nvidia card is an issue on Windows too. I will hold off on that as he is 90 miles away and I would have to match up some free time with him to talk him through the process.
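On the question above of how a program decides which display driver it is talking to: an OpenGL application can only ask the driver behind its current context, typically through the strings returned by glGetString(GL_VENDOR) and glGetString(GL_RENDERER). The following is a minimal sketch of that kind of string-based classification. It is an assumption for illustration, not Blender's actual code: `DeviceKind` and `classify_gpu` are hypothetical names, and the matching rules are simplified.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device categories for this sketch. */
typedef enum {
  DEVICE_UNKNOWN = 0,
  DEVICE_AMD,
  DEVICE_NVIDIA,
  DEVICE_INTEL
} DeviceKind;

/* Classify the GPU behind the current OpenGL context from the strings a
 * program would obtain with glGetString(GL_VENDOR) and
 * glGetString(GL_RENDERER). The strings are passed in as parameters so the
 * function can be exercised without a GL context. */
DeviceKind classify_gpu(const char *vendor, const char *renderer)
{
  assert(vendor != NULL && renderer != NULL);

  if (strstr(vendor, "NVIDIA") != NULL) {
    return DEVICE_NVIDIA;
  }
  /* Mesa reports the vendor as "X.Org" for this report's Kaveri APU, so the
   * renderer string is checked as well. */
  if (strstr(vendor, "ATI") != NULL || strstr(vendor, "AMD") != NULL ||
      strstr(renderer, "AMD") != NULL) {
    return DEVICE_AMD;
  }
  if (strstr(vendor, "Intel") != NULL) {
    return DEVICE_INTEL;
  }
  return DEVICE_UNKNOWN;
}
```

With the strings from the system information in this report (vendor "X.Org", renderer starting "AMD KAVERI"), the sketch returns DEVICE_AMD. A second card that is not driving the display would not normally show up in these strings at all, so a check of this kind says nothing about what part, if any, the headless Nvidia card played here.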


Added subscriber: @Arken


I've also had Blender 2.9 crash on open recently. I have to open 2-3 times for it to stay open.


Update. The success in running 2.90 dailies (after 4th June) which resulted from removing the Nvidia GTX1050 was sustained on replacing it and using it in Blender for CUDA operations.

For comparison I tried the same approach on the secondary PC with the GTX960. First I tried removing the kernel module (rmmod), then I tried removing the module from the initrd, and finally I removed the card from the machine. None of these operations enabled that PC to run the problematic dailies, or the local builds made on that machine.

For no particular reason I connected to the secondary PC with ssh and executed yesterday evening's local build. It ran on the forwarded X11 display on my primary machine. I was surprised. I have since replaced the GTX960 in the secondary machine and can now use current 2.90 alpha builds, with CUDA acceleration, on the secondary machine if its X display is forwarded to another machine. It is a little sluggish, but it works.

Tonight I updated the Blender sources on the secondary machine and rebuilt Blender with debug info to see if gdb would tell me anything useful. First it told me I needed a boat-load of debug packages to be installed. These are all installed and verified as present, but still gdb complains. The debug run up to the crash and a backtrace can be found in the attached file. It doesn't mean much to me, but is there a way I can improve the usefulness of the result?

blender-2.90-gdb.txt


@jaywalker What about this patch P1460 ?


Wow! That fixed it! I updated my local sources and applied your patch. It reached the splash screen and I quit the program. I restored my ~/.config/blender/2.90 directory and tried again. It starts correctly and loads the blend file I am working on. Finally I initialised the nvidia kernel module and re-started Blender. CUDA is working for Cycles, and all 3D viewport modes look fine. I think that is a result!

Do we now understand what the problem is? I find it hard to blame Blender when two essentially identical PCs will crash and not crash with your original code.


Just for completeness, I re-ran my local build without your patch, and having dealt with the missing debuginfo warnings I got this:

Thread 1 "blender" received signal SIGSEGV, Segmentation fault.
__memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:104
104 ../sysdeps/x86_64/multiarch/memcpy-ssse3.S: No such file or directory.
(gdb) bt
#0  __memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:104
#1  0x000000000565bc8f in GPU_texture_unbind_all ()
#2  0x00000000010839b3 in DRW_state_reset ()
#3  0x000000000107b76e in DRW_draw_render_loop_ex ()
#4  0x0000000001691e0f in view3d_main_region_draw ()
#5  0x00000000012a1571 in ED_region_do_draw ()
#6  0x0000000000f414e2 in wm_draw_update ()
#7  0x0000000000f3f490 in WM_main ()
#8  0x0000000000be6e38 in main ()
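
The backtrace ends in a memcpy reached from GPU_texture_unbind_all, and the patch earlier in the thread shows that function passing NULL as the array argument to glBindTextures and glBindSamplers. OpenGL defines a NULL array there as "reset every unit in the range", so the call itself is legal. One possible reading, which the thread does not confirm, is that something in the GL stack copied from that pointer without checking it first. The sketch below only models that class of problem with a mock: `mock_bind_textures`, `unbind_all_without_null`, `bound_units` and `MAX_UNITS` are hypothetical names, and nothing here is taken from P1460 or the final commit, whose contents are not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNITS 32

/* Stand-in for a driver's table of bound texture names, one per unit. */
static unsigned int bound_units[MAX_UNITS];

/* Mock multi-bind entry point that follows the OpenGL rule: a NULL `names`
 * pointer means "bind 0 (nothing) to every unit in the range". An
 * implementation that skipped this NULL check and copied unconditionally
 * would read through a NULL pointer. */
void mock_bind_textures(unsigned int first, size_t count, const unsigned int *names)
{
  assert(first <= MAX_UNITS && count <= MAX_UNITS - first);

  if (names == NULL) {
    memset(&bound_units[first], 0, count * sizeof(bound_units[0]));
    return;
  }
  memcpy(&bound_units[first], names, count * sizeof(bound_units[0]));
}

/* Caller-side alternative that never hands NULL to the entry point: pass an
 * explicit array of zeros, which has the same effect under the rule above. */
void unbind_all_without_null(size_t count)
{
  static const unsigned int zeros[MAX_UNITS] = {0};

  assert(count <= MAX_UNITS);
  mock_bind_textures(0, count, zeros);
}
```

Both paths leave every unit in the range unbound. The difference is only that the second one does not depend on the callee handling NULL correctly.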

This issue was referenced by df8847de6d


Changed status from 'Confirmed' to: 'Resolved'

Clément Foucault self-assigned this 2020-06-28 01:44:32 +02:00

Thank you Clément for the fix. I have just updated and re-built my local copy and Blender is now starting normally on both machines.

I still don't understand why two essentially identical systems which are expected to use the same kernels, drivers, libraries and so on, should have behaved differently. I may have to consult with Mageia devs to see if they have any ideas on that.

Richard

Thomas Dinges added this to the 2.90 milestone 2023-02-08 16:27:16 +01:00
Reference: blender/blender#77549