Blender lockup/stops responding on Pop!_OS 22.04 #112880

Open
opened 2023-09-26 00:01:42 +02:00 by R-Davidson · 6 comments

System Information
Operating system: Linux-6.4.6-76060406-generic-x86_64-with-glibc2.35 64 Bits, X11 UI
Graphics cards:

  • Mesa Intel(R) HD Graphics 630 (KBL GT2) Intel 4.6 (Core Profile) Mesa 23.1.3-1pop0~1689084530~22.04~0618746 (this is the graphics card reported by Help > Report a Bug...; it worries me because I have two GPUs selected under "Cycles Render Devices").
  • GeForce GTX 1080
  • GeForce GTX Titan X

Blender Version
Broken: version: 3.6.1, branch: blender-v3.6-release, commit date: 2023-07-17 12:50, hash: 8bda729ef4dc
Worked: 3.6.x back in mid-August. I don't recall if I had updated blender since then or not. Maybe this is an OS issue?
Only non-default add-on being used is Molecular Nodes v2.7.4.

Short description of error
After a seemingly random amount of time, the whole Blender interface completely locks up. Watching the active programs with top shows that Blender keeps using one thread until the program is force-killed from the command line.

Exact steps for others to reproduce the error
Whether running the default startup or loading a previously working .blend file, both processes result in Blender locking up after an indeterminate amount of time. There do not seem to be exact steps to recreate the bug. I have had the lockup happen while I was messing with the geometry node map or moving objects in the viewer. There is no lagging happening; just a single lockup event that persisted for ~30 minutes (I ended the program at this point).

R-Davidson added the Type: Report, Priority: Normal, Status: Needs Triage labels 2023-09-26 00:01:42 +02:00
Author

I posted to /r/blender and /r/pop_os to see if I got any insightful responses there. (https://www.reddit.com/r/blender/comments/16s4h98/blender_issues_on_pop_os/)

A commenter suggested that I recreate the error and check for any errors/logging thrown to journalctl via journalctl | grep -i blender. Here's the output associated with a single lockup event:

Sep 25 21:00:16 comput kernel: i915 0000:00:02.0: [drm] blender[12045] context reset due to GPU hang
Sep 25 21:00:16 comput kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85df9e9f, in blender [12045]

I'm not sure what these messages mean but maybe they are helpful for you all!
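
Editor's note: the two kernel messages above are emitted by the i915 (Intel graphics) kernel driver, so they only match a search for "blender" because the process name happens to appear in the text. A minimal sketch for pulling out just the hang reports, assuming a bash-like shell; the function name gpu_hang_lines is made up for illustration:

```shell
# Keep only i915 lines that report a GPU hang or a context reset.
# Reads log text on stdin, so it works with live output or a saved log file.
gpu_hang_lines() {
    grep -iE 'i915.*(gpu hang|context reset)'
}

# Usage (assumes journalctl is available and the kernel log is readable):
#   journalctl -k --no-pager | gpu_hang_lines
```

Searching the kernel log (journalctl -k) rather than the whole journal should also catch hangs attributed to a different process name.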

Member

UHD 620/630 is indeed known to have this GPU HANG error on Linux.

Looks like you have hybrid graphics. You could either run prime-select nvidia or check whether you can disable integrated graphics in your BIOS completely. I'm not sure how a machine with two different NVIDIA cards is handled; you probably need to look up some Linux NVIDIA documentation to configure them.

Author

@ChengduLittleA Thanks for the quick response!

So, the logic behind the three different graphics cards is that the two NVIDIA cards are often used to run molecular dynamics simulations for scientific research. Having the integrated graphics was intended to prevent those NVIDIA cards from having to manage simulations and day-to-day usage of the computer. I haven't had any issues with this setup for the original purpose. For these simulations, I just use the CUDA_VISIBLE_DEVICES environment variable to hard-set which GPU is used for the given task. Then, while the simulations run, I can still do analyses/read papers without affecting computational efficiency too much.

From what I've read, prime-select is an NVIDIA-provided command to control the visible GPUs that the whole computer can utilize, whether the cards are Intel or NVIDIA. I'd rather just tell Blender to only use one or both of my NVIDIA cards and let the rest of the computer still run on the integrated GPU. Is that possible with prime-select? Alternatively, I am launching Blender from a terminal, so is there a prime-select (or equivalent) command that only affects the single terminal instance?

Once again, I greatly appreciate your help!

Member

@R-Davidson I see, so you prefer to keep this setup running mostly on integrated graphics, and only use Blender and other scientific calculation programs on your two GPUs. I'm actually not sure how to do that on Linux, since on my computer I just disabled integrated graphics in the BIOS and then Blender naturally starts on the NVIDIA card. Typically Blender should start with the card that your monitor is plugged into, so if you plug your monitor into the NVIDIA card instead of the motherboard, it should work out of the box.

There's an option to use DRI_PRIME to force a specific program to run on the NVIDIA cards; maybe try this solution (https://askubuntu.com/a/1306140) to see if it helps?
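
Editor's note: DRI_PRIME is, as far as I know, honored by Mesa drivers. With the proprietary NVIDIA driver, the per-process equivalent is usually PRIME render offload, selected through two environment variables described in the NVIDIA driver README. A minimal sketch, assuming the proprietary driver is installed and the X server supports render offload (not verified for the three-GPU machine in this report); run_on_nvidia is a hypothetical helper name:

```shell
# Run a single command with NVIDIA PRIME render offload requested.
# Only the launched process sees these variables; the rest of the
# desktop session keeps using whatever GPU it already uses.
run_on_nvidia() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        "$@"
}

# Usage (assumes blender is on PATH):
#   run_on_nvidia blender
```

Note that CUDA_VISIBLE_DEVICES restricts CUDA compute devices only; it does not choose which GPU creates the OpenGL context used for the viewport.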

YimingWu added Status: Needs Information from User and removed Status: Needs Triage labels 2023-09-27 04:35:38 +02:00
Author

Sorry for the week-long delay in response.

Based on your comment,

> Typically Blender should start with the card that your monitor is plugged into, so if you plug your monitor into the NVIDIA card instead of the motherboard, it should work out of the box.

I started to test this out. I had to pare down my setup from multiple monitors and a video capture card to a single monitor. Since there does not seem to be a single command to get status updates from all GPUs (integrated Intel and NVIDIA cards), I set up sudo intel_gpu_top and watch -n 1 nvidia-smi side by side to see which hardware is tasked with the Blender run. I've attached three videos of me working with this setup, with the monitor's HDMI cable plugged in turn into the motherboard, GTX 1080, and Titan X HDMI ports. In each video I run the Blender install described above and just mess around in the visualization window, and each ends with Blender experiencing the lockup. Note that no matter which GPU drives the monitor, the integrated GPU is always active, with varying max clock speeds and utilization.

I'm not super familiar with the nitty-gritty of these hardware elements and how tasks are handed off to which hardware, when not specified by the user. But maybe these videos are useful to highlight the issue being experienced.
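
Editor's note: utilization monitors show that a GPU is busy, but not which GPU owns a program's OpenGL context. One way to check is the renderer string that glxinfo reports in the same terminal Blender is launched from. This is the same kind of string as the "Mesa Intel(R) HD Graphics 630" entry in the system information above. A minimal sketch, assuming glxinfo (from the mesa-utils package on Ubuntu-based systems) is installed; renderer_string is a hypothetical helper name:

```shell
# Extract the OpenGL renderer name from glxinfo-style output on stdin.
renderer_string() {
    sed -n 's/^OpenGL renderer string: //p'
}

# Usage (assumes glxinfo is installed and an X11 session is running):
#   glxinfo -B | renderer_string
```

If the printed name is the Intel GPU, programs started from that terminal would be expected to render their viewport on the integrated graphics regardless of which devices are selected for Cycles.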

Author

I've continued to test other setups. Comments on my reddit post about this issue (https://www.reddit.com/r/blender/comments/16s4h98/blender_issues_on_pop_os/; cross-posted to https://lemmy.ca/post/6044731 by the commenter) suggest that I change the maximum frequency of the integrated intel GPU to overcome the lockup issue based on insights posted at https://gitlab.freedesktop.org/drm/intel/-/issues/6154 and https://gitlab.freedesktop.org/drm/intel/-/issues/4858.

To do so, I used the sudo intel_gpu_frequency -m command to lock the frequency to the max value. Attached is the video of this setup resulting in a lockup.

Reference: blender/blender#112880