Cycles HIP rendered viewport crashes system/GPU on Linux with RDNA2 GPU #100353
System Information
Operating system: Fedora Linux Release 36 - Linux-5.18.16-200.fc36.x86_64-x86_64-with-glibc2.35 64 Bits
Graphics card: AMD Radeon RX 6900 XT (sienna_cichlid, LLVM 14.0.0, DRM 3.46, 5.18.16-200.fc36.x86_64) AMD 4.6 (Core Profile) Mesa 22.1.5
ROCm Version: rocm-5.2.0
Blender Version
Broken: version: 3.2.2, branch: master, commit date: 2022-08-02 18:15, hash: bcfdb14560
Worked: never
Short description of error
When using HIP on an AMD GPU with Cycles GPU rendering, my entire system freezes when using the rendered viewport, and I have to hard-reset the PC. I think it's the GPU or GPU driver that crashes.
Kernel messages from journalctl suggest that amdgpu (the kernel module?) is crashing (see attached journalctl.txt).
The crash is not always immediate and might instead only happen after using the rendered viewport for a while or switching back and forth between textured and rendered a few times.
Rendering with F12 works fine and I have observed no other crashes as long as I don't use the rendered viewport.
I only started using Blender a few days ago, creating procedural textures in the node editor and using the rendered viewport all the time, and the crash didn't happen once.
Now that I started working with image-texture-based assets like in the attached .blend file this problem started happening.
Maybe it's related with this issue: https://developer.blender.org/T97591
Can I somehow get more detailed debug output from blender? The "-d" switch didn't generate much more relevant output.
Exact steps for others to reproduce the error
I could reliably reproduce the issue by following these steps test.blend
--
let me know what info I can provide to help.
Thank you!
#103316 was marked as duplicate of this issue
#102925 was marked as duplicate of this issue
#102267 was marked as duplicate of this issue
#101780 was marked as duplicate of this issue
Issue you're experiencing is very likely same as #97591 (Cycles HIP error with image textures on Linux and RDNA1)
That issue is not known to happen with RDNA2 cards.
But it would be good to verify whether it happens under the same conditions, with image textures whose resolution is not a multiple of 128. The attached .blend does not contain the image textures, so we can't tell.
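The "multiple of 128" condition mentioned above can be checked mechanically. The following is a minimal sketch, not something from the report: the helper names are hypothetical, and the way the sizes are gathered inside Blender (shown only in a comment) assumes the standard `bpy.data.images` collection and each image's `size` property.

```python
def resolution_is_multiple_of(width: int, height: int, step: int = 128) -> bool:
    """Return True when both dimensions are exact multiples of `step`."""
    return width % step == 0 and height % step == 0


def images_off_grid(sizes: dict, step: int = 128) -> list:
    """Given {image_name: (width, height)}, list names whose resolution
    is NOT a multiple of `step` in at least one dimension."""
    return sorted(
        name
        for name, (width, height) in sizes.items()
        if not resolution_is_multiple_of(width, height, step)
    )


# Inside Blender's Python console the input could be built like this
# (assumption: run where the `bpy` module is available):
#   import bpy
#   sizes = {img.name: tuple(img.size) for img in bpy.data.images}
#   print(images_off_grid(sizes))
```

Note that an image reporting a size of (0, 0) passes the check, so such entries would need separate handling.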
I assume Blender doesn't "crash" when my system locks up, because I can't find a /tmp/blender-crash.txt file.
I started it with the parameters listed in the manual (thanks for the links!)
blender --factory-startup --debug-all
and attached logfiles for both blender sessions (blender1.log and blender2.log)
I hope this attached blend file contains the textures now - (I used the "Automatically Pack Resources" button)
blender2.log
blender1.log
test_v2.blend
Thank you for your help.
Thanks. The file contains some images with a resolution that is not a multiple of 128, but those are not used by any shader. So I think this is likely a different issue than #97591.
There is a bug with a similar backtrace that may affect your kernel version 5.18.16.
https://gitlab.freedesktop.org/drm/amd/-/issues/2050
https://bugzilla.kernel.org/show_bug.cgi?id=216173
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016548
I don't know if it's the same issue, but sounds similar.
Attempting to render test.blend by pressing F12 with "GPU Compute" enabled resulted in Blender crashing on my system.
System Information
Operating system: Fedora Linux Release 36 - Linux-5.18.16-200.fc36.x86_64-x86_64-with-glibc2.35 64 Bits
Graphics card: AMD Radeon RX 5700 XT (navi10, LLVM 14.0.0, DRM 3.46, 5.18.16-200.fc36.x86_64) AMD 4.6 (Core Profile) Mesa 22.1.6
ROCm Version: rocm-5.2.0
Blender version
version: 3.2.1, branch: unknown, commit date: 1970-01-01 00:00, hash: unknown, type: Release (installed from the Fedora 36 updates-testing repository)
The crash log looks a lot like #97591.
test.crash.txt
@flavonol, that's almost certainly the same bug as #97591 and different than this report.
I looked a bit more into the kernel bug with similar backtrace, and it's not exactly the same issue. That's a bug introduced in a release candidate of 5.19 that was fixed in the final release. So it wouldn't affect 5.18.16. Still this area seems to be under active development so maybe 5.19 or newer kernel versions have a fix.
In any case, if the system freezes that's a bug in the kernel. Maybe there is also a bug in Blender that triggers it, but it doesn't seem all that likely to me.
CC @BrianSavery @Sayak-Biswas.
I guess the next step would be to test with a newer kernel version, or get some input from AMD.
I am unable to reproduce this issue on an Ubuntu 20.04.4, it is running kernel 5.15 though. I will upgrade to 5.19.1 via a ppa and see if it repros.
Okay, with 5.19 kernel I am able to kind of reproduce the issue. It doesn't cause a system freeze for me, but both instances of blender freeze.
I don't have access to the 5.19 Kernel right now but I tried with the 6.0-rc1 from the fedora-rawhide repository and it's still the same.
I think I'm going to open a bug report on bugzilla.redhat.com for the system crash (I hope that's the correct place).
Same problem with an RX 6800 XT,
any version of Blender,
Arch Linux 5.19.3,
HIP version 22.20.3.50203-1 (the previous version gives the same error).
The problem also occurs only with viewport rendering, especially after enabling wireframe.
After the "freeze", on a second TTY I can see the message "failed to initialize parser -125!" repeated multiple times.
I also tried the same software configuration (same versions of Arch, Blender, HIP, etc.) on a laptop with a 6700M, and everything works perfectly.
Also, to be sure the problem is not the video card itself,
I tested the RX 6800 XT with Windows 11 and there was no problem.
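For anyone collecting logs after a freeze, a small script can count how often the quoted message appears. This is only a sketch: the function name is hypothetical, the message string is copied from the comment above, and the surrounding log format is not assumed. On Linux, errno 125 is ECANCELED, which would be consistent with work being cancelled after a GPU reset, though nothing in this thread confirms that reading.

```python
# Message text as quoted in the comment above.
PARSER_ERROR = "failed to initialize parser -125!"


def count_parser_errors(log_text: str) -> int:
    """Count the lines of a saved kernel log that contain the parser error."""
    return sum(1 for line in log_text.splitlines() if PARSER_ERROR in line)


# Example input: text saved from the previous boot, e.g. with
#   journalctl -k -b -1 > journalctl.txt
# and then read with open("journalctl.txt").read().
```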
Thanks for the additional info!
It's interesting that your 6700M laptop doesn't have this issue. Maybe mobile and desktop GPUs are somehow treated differently?
Does your Laptop maybe have a different CPU than your Desktop (Intel/AMD?)
My CPU and GPU are both AMD: RX 6900XT with Ryzen 7 5800x
I created a bug report on RedHat Bugzilla for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=2119986
CPU and GPU on both PCs are AMD:
laptop: Ryzen 7 5800H, RX 6700M
desktop: Ryzen 9 3950X, RX 6800 XT
If I'm right, the 6600-6700 XT and the 6800-6900 XT use technologically different GPU chips,
so I guess part of the HIP driver may be different too.
One user on the fedora forum mentioned he has the same issue on a 6700XT:
https://discussion.fedoraproject.org/t/blender-amd-and-opencl/40199/12?u=joni999
Well, I have no idea then...
Same system/kernel/software; even all the settings were simply moved to the laptop by copying and mounting the home folder.
I also tried kernel linux-lts-5.15.63-1 on the desktop today and it did not change the situation; the same problem still happened.
But I found a workaround (two rules) to avoid the freezes. First, you should have only ONE area with a 3D VIEWPORT. Mine looked like this; to avoid the freeze it should look like this.
The second rule is that OVERLAYS should be OFF. Then the freezes do not happen for me; tested on the 5.15 kernel and on 5.19 too, with the same result.
If multiple 3D viewports or overlays are the problem, I guess there is some issue with the GPU driver handling HIP and OpenGL work simultaneously. Or some kind of race conditions or other problem while the driver is under heavy load. If it was an issue in the Cycles kernel code or HIP compiler that leads to the kernel e.g. going into an infinite loop, it would most likely hang also when rendering just with Cycles.
I believe the most relevant bug tracker is this one, where some similar-sounding hangs/freezes were reported.
https://gitlab.freedesktop.org/drm/amd/-/issues
For example this:
https://gitlab.freedesktop.org/drm/amd/-/issues/2083
Thanks for the link to the bug tracker - I created a ticket:
https://gitlab.freedesktop.org/drm/amd/-/issues/2145
Thank you viktor for the workaround - I'll try that out (although losing the overlays is a high price in my opinion)
Update:
I'm pretty sure it's an issue with the amd driver setup.
On the drm/amd bugtracker I was asked to reproduce the issue on RedHat Enterprise Linux 8.6 because fedora is not officially supported.
So I did install RHEL and installed the amdgpu-install package via the official RPM
Then running
> sudo amdgpu-install --usecase=hip
worked without issues and I could not reproduce the error on RHEL 8.6 - it works like it's supposed to.
So I tried the same thing on Fedora 36 but the package amdgpu-dkms which seems to be a kernel module that will be built against the current kernel, does not build/install and instead throws errors leaving me with functional ROCm but I still experience the issue.
I'm trying to get the kernel module to compile on Fedora but it seems there are some checks for certain kernel settings which I honestly have little clue of.
here's the build error:
TL;DR: It's not a blender issue. I'll close this ticket tomorrow and continue on here:
https://discussion.fedoraproject.org/t/blender-amd-and-opencl/40199
Thanks to everyone!
Changed status from 'Needs User Info' to: 'Archived'
This comment was removed by @viktor3d
If you are experiencing the same issue and you are on one of the ROCm supported Distros please share your experience at this amd ticket:
https://gitlab.freedesktop.org/drm/amd/-/issues/2145
I hope this can be fixed soon, so Cycles will be usable again!
Changed status from 'Archived' to: 'Needs Triage'
Think we should keep this report open until the crash is fixed
Changed status from 'Needs Triage' to: 'Confirmed'
Any chance of an update of this issue? AMD haven't commented since August, and the drm/amd issue has not had developer feedback in a while.
I got my hands on an RX7900XTX and guess what... same issue on Ubuntu 22.04.1
Has anyone tested this? It can't be. If it really was - what distro and driver version was used?!
Same issue on blender 3.4.0 ubuntu 22.04.1 rocm 5.4.1 on a 6800XT installed via amdgpu-install --usecase=hip. To get the bug I just open blender, subdivide the default cube to 64, and go to viewport shading and rotate the cube.
But there is a difference from my previous tries on Ubuntu 20.04.5. On that former distro I had a complete system freeze. Now on Ubuntu 22.04.1 with ROCm 5.4.1, Blender freezes for a few seconds, then I am thrown directly to the Ubuntu login screen and can log in again. So obviously something changed. I've seen (but I'm really not a specialist) that Blender 3.4.0 had some changes regarding RDNA1 and earlier. Could those changes have an impact on the bug encountered with RDNA2?
@jean-martin-barbut I tried with the same setup like you mentioned. But I didn't get a freeze. Is this happening with only one blender instance or are you running multiple instances?
@Sayak-Biswas No, only one instance of Blender opened, just the default new file with the cube subdivided by 64 (I even think that 50 is enough). Subdivided, not a subdivision modifier.
In order to have the crash, in viewport overlay, "outline selected" must be on. If it is off, no crash.
You have many testimonies of this bug in this thread:
https://gitlab.freedesktop.org/drm/amd/-/issues/2145
We are able to reproduce this and debugging this issue currently, will post updates here.
at last some good news :)
Any news? Is the issue fixed now? It has already been 6 months...
has any progress been made
it seems to have something to do with the gpu not being able to handle all the commands being sent to it
https://gitlab.freedesktop.org/mesa/drm/-/blob/main/amdgpu/amdgpu_cs.c#L467
I'm told this is fixed in ROCm 5.5 on the driver side. It may take a while for Linux distributions to upgrade their packages. If anyone is able to confirm whether it's fixed, please let us know, since I don't have this issue on the AMD GPU I use for testing.
no hard tests but for 30min work with ROCm 5.5, no problems. No crash, no lags, seems problem is fixed.
ROCm 5.5
Mesa 23.0.3
Blender 3.5
Radeon RX 6800 XT
I installed it, and it is really strange. If I open a demo file like the classroom, which has around 150,000 faces, there is no crash (I did a 5 min test), but if I take a simple default cube and subdivide it 50 times (around 15,000 faces, 10x fewer than the classroom), then it crashes again.
Really weird! And I don't understand why.
@raygam could you run this test: create a new blank .blend file, add a default cube, subdivide by 50 (right click in edit mode, Subdivide, 50), then go to viewport shading / Cycles / GPU and try to rotate (MMB) a few times? Does it crash or not on your system?
The classroom has 150k faces across all objects; you described one object with 15k faces, and those are two different things. Yes, I can confirm that one object with probably 10k+ faces can crash it. I got a crash with a subdivided box at 12k faces.
So the problem is with a single object with 10k+ faces, not an entire scene whose objects add up to 10k faces.
Anyway, it's not very often that I use a single object with that face count, at least for me :)
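The face counts quoted in this exchange can be sanity-checked with a little arithmetic. The sketch below assumes that Subdivide with n cuts applied to every face of the default cube splits each of its 6 quads into (n + 1) x (n + 1) quads; that is an assumption about the operator, not something stated in the thread, and the function names are hypothetical.

```python
def faces_after_subdivide(cuts: int, base_faces: int = 6) -> int:
    """Quad count after one Subdivide with `cuts` cuts on every face,
    assuming each quad becomes a (cuts + 1) x (cuts + 1) grid."""
    return base_faces * (cuts + 1) ** 2


def min_cuts_exceeding(face_limit: int, base_faces: int = 6) -> int:
    """Smallest number of cuts whose face count is above `face_limit`."""
    cuts = 0
    while faces_after_subdivide(cuts, base_faces) <= face_limit:
        cuts += 1
    return cuts


print(faces_after_subdivide(50))   # 15606, matching "around 15,000 faces"
print(faces_after_subdivide(64))   # 25350
print(min_cuts_exceeding(10_000))  # 40
```

Under that assumption, the roughly 10k-face threshold reported above corresponds to about 40 cuts on the default cube.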
The thing is that I can run mac, windows and Linux on the same computer (so exactly the same hardware but not the same drivers) and I don't have this issue on windows and mac. Did you have the crash before in viewport shading with files like the classroom ? Having 15k+ faces on a single object is not so rare.
Yes, I had the crash problem. Actually, ROCm before 5.5 was just useless; I had crashes even with 2 faces or just a mouse move.
At the moment I have no problems with ROCm 5.5 with millions of faces, as long as they are not in a single object.
So I am having the same issue as you with a single object of 10k+ faces.
So the crash generally seems to be fixed, but a new bug appears with high-polygon objects, so 5.6 is needed.
Additionally, large multi-megapixel textures or environment backgrounds are still not working.
I still get crashes with two view-ports rendering the default cube in cycles at the same time. Do I need a new HIP Cycles kernels or something?
blender: 3.5.1-2 (packaged by ARCH) or
blender: 3.5.1 (from https://www.blender.org/download/)
rocm: 5.5.0-1 (opencl-amd on AUR)
Mesa: 23.0.3-1
GPU: RX6800
@raygam Did you try 2 view-ports rendering cycles at the same time?
I think that it depends on many things. On previous amdgpu-install versions I didn't have the issues you have, but I had the high poly issue... So for me there is no change.
Ubuntu 22.04.2
I've tested it just now and it crashes.
I hope the devs are reading this, but I'll report it on ROCm anyway if it's not yet reported.
We've tested with 3.6. Still see the issue on 3.5, but should be fixed in 3.6.
@brecht have we enabled 3.6 builds with the new rocm 5.5 compiler?
HIP was re-enabled in 3.6 Linux builds just now, with the ROCm 5.5 compiler. It's included in the latest build:
https://builder.blender.org/download/daily/
If it still crashes, is it the same crash as before? A driver/system crash, or just Blender crashing? And if so is there a backtrace?
Yes it is different...
For me its the same... 3.6 daily still appears to crash in the same manner as 3.5 did. That is with two view-ports rendering the default cube in cycles at the same time it crashed Sway back to TTL.
Is it possible to confirm that the kernels in the
'/blender-3.6.0-alpha+main.d1bfda16bb90-linux.x86_64-release/3.6/scripts/addons/cycles/lib'
folder are being used and not something else?
Journalctl logs:
blender: blender-3.6.0-alpha+main.d1bfda16bb90-linux.x86_64-release
rocm: 5.5.0-1 (opencl-amd on AUR)
Mesa: 23.0.3-1
GPU: RX6800
@jean-martin-barbut I think I got that same error at first.... but all I did was reopen blender and it worked (rendered and crashed)...
@BrianSavery I guess the conclusion is that this was not fully fixed, would be good if someone from AMD could confirm if the issue still happens. I guess testing with two viewports open might be an easy repro case.
OS: Opensuse Tumbleweed.
GPU: RX6600
CPU: AMD Ryzen 3500x (without built-in gpu)
I have this issue and had it on the Nobara Linux too. In Opensuse wiki there is written this:
https://en.opensuse.org/AMD_OpenCL#AMD_ROCm
How do I "dedicate one GPU (the simplest one, such as built-in) for desktop usage" ? I have HIP installed from AMD ROCm repo. Please if you know provide me with exact commands
We are speaking of HIP, not openCL.
OS: Nobara Linux 37 (Thirty Seven)
Kernel: 6.3.7-200.fsync.fc37.x86
GPU: AMD RX 6600 XT
CPU: AMD Ryzen 7 5700G
ROCM Version: 5.4.3
I don't know if it's related or not, but when I tried to replicate the issue using Cycles with the HIP interface, it caused my system to crash. However, I don't encounter this issue when using Radeon ProRender with HIP.
As far as i know this problem is not related to kernel, i mean fix is not in kernel just ROCm and Blender it self and related only to Cycles. ProRenderer is different thing.
At this moment on my end fix makes viewport more stable (before its not work at all, crash at once). Still experiencing crashes same type but it more rare as before. No difference between Blender 3.5 and 3.6, same crash from time to time.
Still rapid crash when:
Still no high resolution texture solution. You can load only 8k max texture, 8k+ cannot be used. Should be temporary down scale or something as solution.
To devs
is there any info/plans for next fixes? We have a small step ahead, ROCm+Blender started work but still very unstable.
@BrianSavery @Sayak-Biswas any update on this?
Blender shadow caustic = fatality at once
it's not surprise if people switching to intel and nvidia. Thinking about this but sell card + buy new one = money lose. It's because AMD can't repair bug over an year or even two
So HIP does not work? Why switch from openCL anyway? HIP should be as experimental mode. Then we can work and test new HIP feature
I don't have an ETA on the fix, but I am pushing to have it fixed before 4.0.
AMD cards may be better value for money but this bug made me sell my AMD card
why doesnt blender just use vulkan
its faster and less buggy (and its not opencl so it has that going for it)
It took me a long time to write what I want to say. Blender 3.0 came out on December 3, 2021, and that's when AMD put together new support for its graphics cards (HIP). Since then we have had 6 versions of Blender. Before the latest release came out, I wrote about my suspicions that AMD would not make it in time with HIP RT on Linux, but I hope that at least the viewport will finally be usable enough to work without worrying about crashing the whole system. As for the positives, in the new Blender 3.6 the rendering time of the "BMW" project has decreased from 43 to 37 seconds, and that's a nice change. At first it also seemed that the viewport was finally stable but unfortunately I was wrong. Previously (Blender 3.5.1 and earlier versions) when I turned off "overlay" in the cycles preview the gpu did not crash. Yesterday, unfortunately, the system crashed even in this case. It's good that I use Nvidia at work, but I equipped my private hardware with a radeon card, because I don't like when one company has a monopoly. I was glad to buy this card. I think AMD should consider withdrawing support for Linux. It has been more than 1.5 years since this bug wasn't fixed. I omit the fact that the installation of Open CL and ROCm drivers is not very intuitive compared to Nvidia solutions. What hurts me is the fact that, despite the manufacturer's assurances, the hardware does not fully support Linux systems. I think this is misleading customers. Maybe completely opening up the code and support to Mesa developers and giving them ROCm OpenCL and other technologies would be a good idea, while declaring that AMD is officially not 100% compatible with Linux. Right now I feel like the Radeon manufacturers are laughing in my face.
Not reporting a longstanding severe bug like this will hurt Blender's reputation. However, in this case I assume Blender considers the reputational damage with the limited Linux AMD user base more palatable than stating on their website that AMD isn't very good.
I haven't seen that AMD understands the issue, let alone has isolated the cause or planned an update to whatever component crashes the graphics. This issue might have years to run yet.
Yes indeed. When you face this bug first time and will not read forums, bug reports, it looks like Blender bug. When your system is stable and only one app crash. Conclusion is obvious. HIP option should be titled as experimental mode. You turning on on your own risk and for feedback
https://gitlab.freedesktop.org/drm/amd/-/issues/2145#note_1981631
New ROCm 5.6 still fix nothing (on my end)
https://gitlab.freedesktop.org/drm/amd/-/issues/2145#note_2114149
To anyone who can reproduce the reset: does
HSA_ENABLE_SDMA=0 ./blender
help at least a bit?
In my case it helps. I can now render (F12, no viewport) in two Blender apps at once without the lockup or with the risk of it greatly reduced.
It could be an SDMA bug, similar to what happened in Mesa (forcing them to disable SDMA for OpenGL/radeonsi).
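For those who start Blender from a script or launcher rather than a shell, the same workaround can be applied by setting the variable in the child process environment. This is a minimal sketch: the helper names and the default `./blender` path are placeholders, and only the `HSA_ENABLE_SDMA=0` setting itself comes from the comment above.

```python
import os
import subprocess


def sdma_disabled_env(base=None) -> dict:
    """Return a copy of the environment with HSA_ENABLE_SDMA forced to "0"."""
    env = dict(os.environ if base is None else base)
    env["HSA_ENABLE_SDMA"] = "0"
    return env


def launch_blender(blender_path: str = "./blender", extra_args=()):
    """Start Blender with SDMA disabled, equivalent to running
    `HSA_ENABLE_SDMA=0 ./blender` in a shell.
    `blender_path` is a placeholder; point it at your own binary."""
    return subprocess.Popen([blender_path, *extra_args], env=sdma_disabled_env())


# Usage (not executed here, since it needs a Blender binary):
#   process = launch_blender("./blender", ["--factory-startup"])
#   process.wait()
```

Copying the environment instead of mutating `os.environ` keeps the setting limited to the launched Blender process.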
Tested with 2 viewports in 1 application rendering the default cube: instant crash when both render. This was done with ROCm 5.6.1 and Blender 4.0 Beta.
We also missed the 1 year anniversary of this bug! At least someone with a "developer" tag is talking about this on drm/amd. however to quote them "will make an internal ticket and try to get someone investigating it" my guess no one will be interested.
Two viewports in one app won't work at all under any circumstances unfortunately. What I meant was whether it helps with two apps with one viewport each.
Isn't it a bad silly joke ?
Hi! I wanted to say that on a Fedora 38 system, and more specifically Nobara 38, everything works right away. ROCm 5.7 is installed from the beginning and I can turn on several viewports at once and even do a render at the same time. On Ubuntu 22.04 everything was shutting down immediately. My spec.
Thanks for sharing, I'll try to see if it works on my end.
I hope everything will work for you too! keeping my fingers crossed. I know how frustrating it is.
sounds like a linux issue
@MikolajNeronowicz
Are you saying you can do this from #102925 without crashing to TTL? If so you are the first report case of this working since 2.93...
I tried rocm 5.71 (via opencl-amd from aur as arch is still on 5.61) but it just freezes blender with little or no CPU use.
@survivorr-3
Correct, this issue is described correctly by its title.
Unfortunately, Nobara 38 with the latest (6.5.9) kernel + packages does not behave differently for me - just as buggy as on Arch, Fedora and Ubuntu. I had to disable 64bit decoding (resizable BAR) for HIP to even be able to be selected without crashing Blender (ROCm 5.7 is broken for me with rebar on).
Hi everyone. I just did a test and the program, and the system crashed for the first time on Nobara 38. it happened when i had viewport on and i only started another viewport. the second time everything went fine. I am using the blender version from the Nobara repository but on the binary file from the official website it also works. I'm uploading the video for proof and the terminal transcript from the rocminfo command.
Hello. We are looking at this internally. However at what time in your video does the crash happen? Also are you saying that one version works but the other doesn't? Or both crash?
The video I shared proves that it is possible to open two viewports at the same time and in it the crash does not happen, but I have to correct my assertion that in Nobara 38, HIP works seamlessly in Blender. Admittedly, it is more stable than in Ubuntu, but when I have 2 viewports with render preview enabled and try to move any object, it crashes immediately. The demo project from BMW, in the version of Blender provided in the Nobara Package Manager, which is 100% compatible with Fedora 38, causes a problem and won't render at all. The program immediately shuts down. In the version from the official site it worked fine and renders without any problem. Unfortunately, HIP support in Linux should be presented as Experimental. For the past 2 years, no one has been able to get Radeons to run stably in Blender. In games, my RX 6600 XT performs wonderfully. If this element of instability was fixed, I wouldn't think twice and would immediately upgrade my PC with an RX 6900 XT or RX 7800 XT. I'm very much rooting for AMD, but these bugs are severely frustrating.
Hi Mikolaj.
We need a bit more coherency here to help. Keep in mind we don't and can't test every distribution. Normally we test Ubuntu and Redhat / Centos. (We don't test "Nobara").
So how can we reproduce what you're saying in steps?
Also let's please be fair when you say "no one has been able to get Radeons to run stably in Blender." It is quite an exaggeration to say that no one can make it work. Clearly there is an issue with multiple viewports, and it's hard to track down, but that doesn't mean "no one".
Brian
Hi Mikolaj
Thanks for providing the video. A reason you might not crash is that your individual view-ports are not "sampling" at the same time. For instance:
Stability should be measured across commonly used features and supported platforms; Mikolaj's statement appears correct and not exaggerated. This report has been open for over a year and Blender announced OpenCL was EOL over two years ago. Put yourself in the shoes of an AMD customer: would they buy AMD again?
Thank you for your reply.
I realize that there is no way to verify operation with every Linux distribution. I think you can limit yourself to Ubuntu and Red Hat (maybe Fedora), and only the official version of Blender, but so far on each distribution this problem with the whole system crashing when viewport occurred in a similar way. Often in Ubuntu, this bug occurred even with a single viewport, whenever I turned on the displacement options on the material, or whenever I didn't turn off the overlay. I've been following this thread for a long time and so far most radeon users confirm this. Might be tempted to implement ROCm in the Flatpak version? That would eliminate the distribution compatibility problem. The nvidia drivers work on the Flatpak version and it works with OPTIX and CUDA. I didn't mean to offend anyone. I am very much counting on AMD. If I can help somehow I will send all the necessary information. unfortunately I am not a tech-savvy user and only a graphics designer and open source enthusiast. So I need instructions. What can I do? How can I help?
By the way. I also noticed a drop in Open CL performance in Davinci Resolve. ROCm 5.7 is 200 or so points worse than 5.6. Previously the Blackmagic RAW Speed Test showed over 1500 pts and now just over 1300 pts. I tested on Davinci Studio 18 because 18.6 is half as good. But that's a whole different topic.
@MikolajNeronowicz you can try Debian. It has ROCm stack in the official repository (HIP version 5.2). That's the same ROCm version as in this ticket. I'd test this config, but I don't own Navi GPU.
The package you need is
libamdhip64-5
+ dependencies.
Hi,
This issue is hitting me on the following system:
It hits almost instantly if I attempt to have a GPU render + rendered viewport, but it seems to also happen without the rendered viewport but not instantly. When the issue happens the system freezes for a few seconds, then the screen goes black and flickers in some small areas.
I was hoping that with its 24GiB of vram the 7900xtx would be great for rendering :-(
Is there anything I as a user with some software development background can do to help get this issue resolved? Debugging Blender or the HIP stack is obviously way beyond my skill set, but if there is anything I can provide that may help the devs helping with this I will do my best to provide them.
I am not sure if this is relevant (couldn't find a different bug talking about this), but on Windows 11 there is no freeze, however the materials come out wrong as all materials seem to use the same textures. I'm happy to use either OS for Blender, but right now neither seems to work for me.
I can imagine your frustration. Considering you spent so much money on equipment that by definition should be premium. I, for one, am waiting for the stability to be repaired and only then will I decide on something more expensive. Fortunately, as a post-professional graphic designer I also have a business Laptop with the latest gpu chip from Nvidia. But for personal projects I always choose alternatives. I count on AMD, because I don't want to support the monopoly of one brand. I built my entire personal computer on AMD. Radeon declares official support for Linux. but there should be information that this is in alpha stage. experimental. At least as far as ROCM is concerned. I think the path to improvement will be long because AMD focuses on the AI market and does not allocate funds for Linux support issues. Not lucrative enough. But on the other hand, it could have a positive impact on PR.
Seems it becomes even worse on Blender 4.0.
Fedora 39(6.5.10 + KDE) + RX7800XT
System freeze immediately when switching to HIP in blender preferences, even in default block&light scene. It's not completely crashed, since my voice meeting is still running(but cannot switch to other tty).
While on Fedora 38 + Blender 3.6.5 at least it won't crash in default scene.
Just to give an update. We've isolated at least that the bug started being apparent sometime after linux kernel 5.15.
Going back to the linux kernel of that vintage (5.15) everything should work ok.
So now we're just working to bisect what point of the linux kernel between now and then caused the change so we can narrow down a bit more.
But that's the best workaround I have now is to use a much older linux. Stay tuned.
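To see which side of the 5.15 boundary a given machine is on, the kernel release string can be parsed and compared. The sketch below is illustrative only: the function names are hypothetical, and a kernel newer than the 5.15 series is not proof that the bug will occur (one user above reported the freeze on 5.15.63 as well).

```python
import re


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a release string such as
    '5.18.16-200.fc36.x86_64'. A missing patch level is treated as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def is_newer_than_5_15_series(release: str) -> bool:
    """True when major.minor is above 5.15, the series AMD reported as working."""
    return parse_kernel_release(release)[:2] > (5, 15)


# On a live system the release string can be read with:
#   import platform
#   print(is_newer_than_5_15_series(platform.release()))
```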
I changed my GPU from NVIDIA to AMD and now I have this problem. Blender 4.0 and 3.6 crash with AMD HIP activated. My system is:
Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6700 XT
Vendor Name: AMD
OS Fedora 39
kernel 6.6.3-200
Blender now opens without crashing on launch, but the moment I go to Edit > Preferences > System, the application freezes. It even froze the whole system one time. When I try to kill the application, it still shows up in the KDE Plasma system monitor, and when I try to shut down my computer, systemd-shutdown tries to kill the process.
System:
Hi @Esat-Namli
Blender with the Arch package hip-runtime-amd version 5.7.1 does crash, creating a "zombie process" which prevents shutdown.
This issue seems to think it might be related to this issue which references this issue. The only solution I have found is to downgrade to the Arch packaged hip-runtime-amd version 5.6 as per this issue. Please note this does not fix #100353.
As far as I am aware, no one has posted proof (like a screen capture) of Cycles working with HIP reliably on Linux. The AMD rep suggested kernel 5.15 above (Ubuntu 20.04.6, I guess); however, users in this thread and this thread have reported issues with 5.15 in the past. I think the most direct response is that Blender has incorrectly stated that Cycles is supported on Linux.
There is a package called hip-runtime-amd-blender on the AUR that downgrades hip-runtime-amd and its HIP/ROCm-related dependencies to 5.6.1-1. It's not ideal, and could break any day if a dependency gets updated, but it does work.
Tested on:
I have an issue that might or might not be related.
If I render something in Blender with Cycles on the CPU, it works 100% fine, but if I click on GPU Compute or on the HIP tab in the settings, Blender just freezes and I can't close it no matter what I do.
Kernel: 6.6.8-arch1-1
CPU: Ryzen 5 3600
GPU: RX 7600
Drivers: I have basically every AMD driver for Linux from pacman and the AUR.
But maybe this is because I switched from an RX 580 to an RX 7600.
The switch happened right around the time Blender 4.0 was released.
My old RX 580 is incompatible with Blender Cycles, so I don't know whether this issue existed before the switch.
After the switch I reinstalled most of the drivers simply with pacman -S / yay -S, without fully removing them first.
Actually, for a time I got a message that said something about "core dumped", but this message disappeared after I installed the snap version of Blender and did not reappear after I installed the pacman version.
Any advice?
If needed, I can give driver versions.
@zeroCaos you should report this as a separate issue. This ticket is for viewport freeze related to using multiple Blender instances.
It seems the ROCm developers have found the cause of this problem. Both the Blender freeze and the system crash have the same cause. Take a look at the discussion here: https://patchwork.freedesktop.org/patch/573129/
Possibly we would get a proper fix soon :)
@Stat_Headcrabed From what I can see, that seems to be a fix only for the NULL pointer dereference issue - the one that crashes Blender on ROCm 5.7.1 and newer: the change you provided a link to simply reverts that one commit from 2023-08-07, and the multi-ROCm-instance gfx hang has been happening for a much longer period of time.
Unless there's something I'm not reading or missing that people in the discussion pointed towards? I'd love for this to be finally resolved :D
@fililip From the discussion of the patch at that link, it seems the commit reverted by the patch is not itself the bug causing the freeze; rather, it makes the original crash bug (which happens frequently when OpenGL and ROCm workloads run together) appear in another way. And they found the cause of it.
This patch seems to fix all the problems on my system, but it may need more testing. To apply and compile it you need to use the newest linux-next branch (next-20240125 on my side).
Thanks, I'll try it out and report back
What GPU/display protocol do you use? I've just tried to render with the viewport and F12 and the same issue occurred - both instances froze, and after I moved my mouse to open Firefox, the whole system hung.
I've cloned the latest linux-next (20240129), applied the patch you linked to and compiled & installed it.
My setup is:
Plasma 6.0 RC1 (Wayland)
RX 6600 XT
Here's my dmesg log when the GPU hangs
The alarming part is (I think):
ROCk/amdkfd still dereferences a nullptr
Hmm... So it's just an accident that it works on my computer... Sad :(
But still, that is a different issue than before (and probably progresses solving the real issue at hand); this particular deref happens after invoking the second instance (and not after selecting HIP in Blender, so that's new), and now the whole amdgpu module freezes, doesn't recover at all, and doesn't let me reboot or sometimes even get to ssh - I have to hard reset every time.
Is there a way to get in touch with the people from patchwork that propose these patches?
What about replying to these patches? You need to download the mailbox file from lore/patchwork, open it in your email client (in my case, Thunderbird), then just click "Reply all".
I got a response from Felix:
@Stat_Headcrabed Sorry for the ping, but just to be clear: the v2 relocation patch fixes multi-instance ROCm workloads for you, or just the interop in one app, assuming you never use more than one HIP context (never render in more than one view at once)?
What happens if you launch two Blender apps and render using Cycles in both of them at the same time (or alternatively, hit F12 and use the viewport renderer at the same time in one app)? Do they freeze or continue working?
Hi,
I just updated to kernel 6.6.14-200.fc39.x86_64 and Blender 4.0.2 Cycles HIP rendering is working well. I think this bug is fixed.
Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6700 XT
Vendor Name: AMD
OS Fedora 39
kernel 6.6.14-200
Did you have 2 view-ports "sampling" with Cycles using the GPU at the same time, as specified by this issue? There are so many problems with ROCm/HIP that you may be referring to #116697 or #112608 or #111847?
For me, 2 view-ports "sampling" with cycles using GPU at the same time still crashes:
I just tested with two separate instances at the same time and everything worked well. For both instances, the configurations are:
Render engine: Cycles
Feature set: Supported
Device: GPU Compute
Also, I am using the static file from the Blender website.
This comment said our problem might be a firmware bug that needs to be fixed in both kernel code and firmware. And it seems the kernel code fix has already landed in the stable branch. Maybe that's the reason? Sadly I can't test that for the next 2 weeks.
https://patchwork.freedesktop.org/patch/575997/#comment_1052187
Oh, great! It looks like we may finally have working compute on AMD! I'll take a look at home.
Really strange, as kernel 6.6.14 (reported working by xhertan) is older than 6.7.3 (reported not working by me). Will have to wait for more people to test whether sampling in two windows at the same time still causes crashes (it would also be good to get a screenshot to show it works).
The new firmware has not been released yet; it's still under test: https://patchwork.freedesktop.org/patch/575997/#comment_1052570
Yesterday, I updated the kernel to 6.7.3-200.fc39.x86_64 and Blender+HIP is still working well. Multiple instances working fine, I tested with three instances at the same time, and all completed rendering without issues. I am using the Fedora rpmfusion ROCm package.
Can anyone who is able to reproduce the issue update their firmware to https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git (on Arch/Chaotic-AUR it's available under linux-firmware-git) and let us know if it fixes the hang? I've tested this myself with another person on two different graphics cards (GFX10, GFX11) and it seems to be resolved.
Note that this requires Linux 6.7 (or even 6.8) or later.
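For anyone unsure whether their kernel meets that requirement, here is a small sketch that compares a kernel release string against 6.7. The function names and the `MIN_KERNEL` constant are made up for illustration, and 6.7 is only the lower bound quoted above, not a guarantee that the hang is fixed.

```python
import platform
import re

# Lower bound taken from the comment above; an assumption, not a guarantee of a fix.
MIN_KERNEL = (6, 7)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '6.7.3-200.fc39.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release: str, minimum: tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True if the release's (major, minor) is at least `minimum`."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    current = platform.release()  # same string that `uname -r` prints on Linux
    try:
        verdict = "meets" if kernel_is_new_enough(current) else "is older than"
        print(f"Kernel {current} {verdict} {MIN_KERNEL[0]}.{MIN_KERNEL[1]}")
    except ValueError as err:
        print(err)
```

Comparing tuples rather than raw strings avoids the trap where "6.10" sorts before "6.7" as text.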
It seems that with the new ROCm 6.1 it finally works. Two days of use, and it never crashed, even with two viewports.
ROCm 6.1 no longer appears to suffer from an instant crash with 2 view-ports. Have not tested long term, but looks promising.
@brecht I think this issue should be closed now?
Yes, I think we've gotten enough reports of it working now. Thanks everyone for testing, and for your patience.