Regression: Slow renders on Intel Mac / AMD GPU #120006

Open
opened 2024-03-28 09:42:29 +01:00 by Werner · 25 comments

System Information
Operating system: macOS 14.4.1
Graphics card: AMD Radeon Pro Vega II 32 GB (Metal 3)

Blender Version
Broken: 4.2.0 Alpha (SHA 05c1abe59c, 19 Apr 01:49)
Worked: 4.0.2, 4.1.1

Short description of error

Any Cycles GPU render in 4.2 takes a factor of 2-3 more time compared to versions 3.6.x-4.0.2 and 4.1.1 (which are all consistent).


Note for people reading the outdated comments. The original report was for a performance regression in Blender 4.1.0. This was fixed in 4.1.1, but 4.2 still has a regression.

Werner added the Severity: Normal, Type: Report, Status: Needs Triage labels 2024-03-28 09:42:30 +01:00
Member
Outdated comment

I cannot reproduce the slowdown on an M1 Pro with Metal GPU rendering when comparing Blender 4.0.2 to 4.1.

This is likely another performance regression specific to AMD GPUs with the Cycles Metal backend. @iss Do you have the right hardware to test this (an AMD GPU on macOS)? Or do you know the right person to contact to test this?

Outdated comment

Tested with junk shop:
4.0: 28.82 s
4.1: 43.62 s

So I can confirm the regression on macOS 14.0 with a 5700XT.

Richard Antalik changed title from Bad render times using Blender 4.1 on Intel Mac Pro 2019 / AMD Radeon Vega II to Regression: Slow renders on Intel Mac / AMD GPU 2024-03-28 18:52:05 +01:00
Member
cc @Michael-Jones
Member

CC @brecht as well because you've done previous work on working around these performance regressions. And just to keep you informed on the Metal AMD GPU issues.

Author
Outdated comment

I investigated the problem somewhat further by compiling Blender 4.1:

- Downloaded the source code from github
- git clone https://projects.blender.org/blender/blender.git
- git checkout tags/v4.1.0
- make update
- mkdir build
- cd build
- ccmake ..
- in cmake: configure (several times) + generate
- make -j22
- make install

Variant 1:
Compiled with CommandLineTools 15.3

Variant 2:
Compiled with CommandLineTools 14.3.1
(In order to do so, I had to remove the "-ld_classic" link flag in "build_files/cmake/platform/platform_apple.cmake")

Tests with junk-shop
4.0 from Blender download page: 38.81 s
4.1 from Blender download page: 62.50 s
4.1 Variant 1: 61.88 s
4.1 Variant 2: 37.44 s

Conclusion:
The performance drop is related to the 15.3 SDK on macOS.
When compiling with 14.3.1, 4.1 shows the same performance as 4.0!

Hope this helps somewhat...
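The junk-shop timings in this comment can be expressed as relative slowdowns against the 4.0 baseline. The following is only a minimal Python sketch: the dictionary keys and the helper name `relative_slowdown` are illustrative assumptions, and only the timings themselves come from the comment.

```python
# Junk-shop render times in seconds, copied from the comment above.
timings = {
    "4.0 (download)": 38.81,
    "4.1 (download)": 62.50,
    "4.1 variant 1 (CLT 15.3)": 61.88,
    "4.1 variant 2 (CLT 14.3.1)": 37.44,
}


def relative_slowdown(time_s, baseline_s):
    """Return how much slower time_s is than baseline_s, in percent."""
    return (time_s / baseline_s - 1.0) * 100.0


# Use the official 4.0 build as the baseline (hypothetical choice of reference).
baseline = timings["4.0 (download)"]
slowdowns = {name: relative_slowdown(t, baseline) for name, t in timings.items()}

for name, pct in slowdowns.items():
    print(f"{name:28s} {timings[name]:6.2f} s  {pct:+6.1f} %")
```

A negative percentage means the build rendered faster than the baseline, which makes it easy to see which variants fall on which side of the regression.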
Author
Outdated comment

Forgot to mention:
The first time you start up and render, the kernels have to be compiled/loaded.

I noticed that with the official release of 4.1 and with 4.1 Variant 1 (see above), this takes only about 10 seconds, while for 4.0 or 4.1 Variant 2 it takes much longer (about 180 seconds).

Could it be that the kernels are not actually compiled correctly?
However I did not see any warnings or errors.
Does it somehow fall back to OpenCL? The GPU load is nearly the same for all variants...

Best regards

Member
Outdated comment

@camot does this version of Blender resolve the performance issue? #120126 (comment)

If you would prefer to build it yourself, then please switch back to CommandLineTools 15.3 and apply this code change to Blender 4.1: !120299


I bring this up because in Blender 4.1, MetalRT is accidentally enabled on all macOS 14+ AMD GPU Macs. But I don't think the Vega II or 5700XT properly support MetalRT, and they might be experiencing a performance regression because of it.

!120299 fixes this issue by only enabling MetalRT on the devices it's supposed to be enabled on, and it might resolve the performance issue you are experiencing in the process.


You do mention that dropping to CommandLineTools 14.3.1 fixes the performance issue, and I believe the code that activates MetalRT is skipped with older compilers, hence why I think this might help. But I might be wrong, I'm not that familiar with this area.

Author
Outdated comment

@Alaska I did as you suggested: info.use_metalrt_by_default = false; + compiled with 15.3

I can confirm that this indeed resolves the render performance issue.
junk shop takes 37.74 s, the same as with 4.0

It would be nice if one could disable MetalRT through the settings.

Thank you very much for resolving the issue!

Best regards!

Member
Outdated comment

> It would be nice if one could disable MetalRT through the settings.

You can disable MetalRT in the Blender settings if your GPU is officially supported by MetalRT. However, I believe your GPU isn't officially supported, and so the setting doesn't show up.

The fact MetalRT was being enabled on your GPU when it wasn't officially supported was a mistake in the code, and seems to be the cause for the performance issues you are having.


I've landed a fix from Alaska to the main branch, and the fix should be included in 4.1.1.
Unfortunately, I cannot fully confirm the fix on the current main branch, as the render result is just black, no matter whether HW RT is enabled or not. There is still some issue to be fixed there. It might be something unrelated to the root cause of this issue in 4.1.0, but I am not confident in considering this report fully resolved before we have confirmation that the fix also works for 4.2.

Member

Blender 4.1.1 is now out. Can @camot and/or @iss please test Blender 4.1.1 to confirm if the issue has been fixed? https://www.blender.org/download/

Can you also test the latest version of Blender 4.2 and let us know if that has a performance regression? (The rendering issue in 4.2 that Sergey mentioned above has also been fixed.) https://builder.blender.org/download/daily/


I am reducing the priority of this bug down to normal as prior testing points to this issue being fixed. I just want confirmation before closing the report.

Alaska added Severity: Normal and removed Severity: High labels 2024-04-19 14:30:48 +02:00
Author

@Alaska:
Can confirm 4.1.1 fixes the issue!
junk shop takes 37.47 s, the same as with 4.0!
So all good for 4.1.1

However in 4.2.0 Alpha (SHA 05c1abe59c, 19 Apr 01:49, Intel):
junk shop takes 84 s (average of 3 runs, all within 0.3 s)!
So it definitely has a performance regression!

Best regards

Member

Increasing the priority again, and adjusting the report description to reflect the new information.

Alaska added Severity: High and removed Severity: Normal labels 2024-04-20 06:10:53 +02:00
Alaska added the Platform: macOS label 2024-04-20 08:12:48 +02:00

I am now looking into this issue.

The latest 4.2 branch was rendering completely black. It was fixed with #123140. I've poked the buildbot to possibly deliver an updated build sooner: https://builder.blender.org/admin/#/builders/248/builds/130

For the performance, I have some different numbers from what was mentioned earlier in this report. I've attached the full benchmark. In my tests it is only the wdas_cloud scene which is slower (it is the last column, I think the label got clipped). I'll see what we can do to solve this.

The point is: I cannot reproduce the slowdown reported in the junkshop file.
@camot Will you be able to test an updated 4.2 build (with the fix I've mentioned previously) to see if it is something that got resolved, or is it one of those issues which we cannot reproduce on our hardware?

Author

@Sergey

Thanks for your work!
I have tested JunkShop in Blender 4.2.0 Beta, SHA 0ab1291716, 13 Jun 02:05, Intel:

Average of 3 runs: 41.11 s (all within 0.3 s)

Compared to 4.1.1 (37.47 s), this is a performance loss of about 10%.

Best regards!


@camot Do you do F12 render or command-line render?


@camot Also, maybe you can disable the denoiser and compositor, to help nail down where exactly the slowdown is coming from for you.

Author

@Sergey : I did F12 render. Do you expect any difference compared to command line render?
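For comparing F12 renders against command-line renders, a background render can be timed from a small script. This is only a sketch under stated assumptions: the helper names `build_render_command` and `time_runs` are hypothetical, the example paths are placeholders, and the measured wall-clock time includes Blender startup, scene loading and kernel loading, so it is not directly comparable to the render time shown in the UI.

```python
import statistics
import subprocess
import time


def build_render_command(blender, blend_file, frame=1, device="METAL"):
    """Build a background (no UI) Cycles render command for a single frame."""
    return [
        blender,
        "-b", blend_file,                 # run in background, without the UI
        "-E", "CYCLES",                   # select the Cycles render engine
        "-f", str(frame),                 # render one frame
        "--", "--cycles-device", device,  # arguments after "--" are read by Cycles
    ]


def time_runs(command, runs=3, run=subprocess.run):
    """Run the command several times; return (mean, max - min) in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run(command, check=True)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), max(durations) - min(durations)


# Example usage (placeholder paths, adjust to the local install and scene):
#   cmd = build_render_command(
#       "/Applications/Blender.app/Contents/MacOS/Blender", "junk_shop.blend")
#   mean_s, spread_s = time_runs(cmd)
#   print(f"mean {mean_s:.2f} s, spread {spread_s:.2f} s")
```

Reporting the spread alongside the mean mirrors the "average of 3 runs, all within 0.3 s" style used for the measurements in this report.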

Author

@Sergey
I ran the test with denoising and the compositor disabled.

Results:

| Configuration | 4.1.1 | 4.2 Beta |
| --- | --- | --- |
| denoising + compositing | 37.47 s | 41.11 s |
| w/o denoising | 34.73 s | 38.14 s |
| w/o compositing | 34.71 s | 38.00 s |
| w/o denoising + compositing | 33.58 s | 34.74 s |

Somehow the results obtained with 4.1.1 are strange:
Switching off denoising only gains about 2.7 s.
Switching off compositing only gains about 2.7 s.
However, switching off BOTH gains only about 4 s.
Looks strange to me.

For 4.2 Beta the individual gains from compositing and denoising are about 3 s each, and switching off both gains about 6.4 s (41.11 s vs. 34.74 s), which is nearly the sum of the individual gains, as one would expect.

Best regards
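The additivity check described in this comment can be reproduced directly from the reported timings. A minimal Python sketch, assuming the first results column refers to Blender 4.1.1; the dictionary layout and the helper name `gains` are illustrative, not part of the original measurements.

```python
# Junk-shop render times in seconds from the results in the comment above.
# Assumption: the first column is Blender 4.1.1.
results = {
    "4.1.1": {"full": 37.47, "no_denoise": 34.73, "no_comp": 34.71, "neither": 33.58},
    "4.2 Beta": {"full": 41.11, "no_denoise": 38.14, "no_comp": 38.00, "neither": 34.74},
}


def gains(r):
    """Return the time saved by disabling each stage, and by disabling both."""
    return {
        "denoise": r["full"] - r["no_denoise"],
        "comp": r["full"] - r["no_comp"],
        "both": r["full"] - r["neither"],
    }


summary = {version: gains(r) for version, r in results.items()}

for version, g in summary.items():
    expected = g["denoise"] + g["comp"]  # what purely additive costs would give
    print(
        f"{version}: denoise {g['denoise']:.2f} s, compositor {g['comp']:.2f} s, "
        f"both {g['both']:.2f} s (sum of individual gains {expected:.2f} s)"
    )
```

If the two stages were fully independent, the "both" gain would match the sum of the individual gains; a noticeably smaller value suggests a shared fixed cost that is only paid once.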


In theory there might be some difference. I was just trying to understand where the difference is, because I can not reproduce the slowdown with Junkshop on the MacPro we have here at the studio.

I've spent quite some time trying to find any way to gain some performance, and hopefully solve the slower volume render. So far I could not find anything that would help, even after disabling a lot of code. It is almost like some refactor triggered the compiler to make a wrong decision somewhere in terms of inlining or spills.

To be honest, I am not really sure how much time it is practical to invest in chasing the slowdown.
It is almost like we have squeezed everything we could from a platform which has no official support (no driver updates, and no fixes for the compilers either).

Author

@Sergey
May I ask what GPU model you have installed on your MacPro?


@camot It's an AMD Radeon PRO W6800X.
It's in the graph I've shown above; perhaps I should have been a bit more explicit about it.
And it's running macOS 14.5.

Author

@Sergey
The W6800X is more powerful than my Radeon Vega II Pro. For the Vega II, I observed a 3D graphics performance loss in standard applications like Maps, Google Earth etc. in macOS 13.x and 14.x compared to previous versions of the OS. This degradation was reported to be less severe for the W6800.

Brecht Van Lommel added Type: Bug and removed Type: Report labels 2024-06-14 15:59:21 +02:00

Surely the W6800 is more powerful, but typically when the slowdown is caused by factors which are in our control, the slowdown is proportional.

In this case I wasn't able to find a way to make things faster, either for the junkshop scene or for the volume scene. It is not like there is a big new feature which was added to the kernel and which we could make optional to avoid the slowdown in existing scenes. Rather, it seems, the series of fixes and other improvements led to the compiler being upset.

The current state of Metal rendering on non-Apple-Silicon is already not at feature parity, it does not pass regression tests, and we are not planning to support it for 4.3. Keeping this in mind, I am not sure how much time it is practical to spend on finding a compiler work-around for this issue.


After discussion with @sergey, we decided not to invest more time in this. We'll keep this open as a known issue for 4.2 LTS, and if someone wants to contribute a fix we can accept it.

For Blender 4.3 we already plan to drop macOS AMD GPU support for Cycles, due to issues like this. On other platforms the AMD GPU compiler is improved and actively maintained. But on macOS it seems to still have many of the same issues we had when we were still using OpenCL, where small changes have unpredictable effects, and we have to go through long trial and error to guess how to work around bugs and performance issues.

Brecht Van Lommel added Severity: Normal, Type: Known Issue and removed Severity: High, Type: Bug labels 2024-06-14 17:17:26 +02:00
Reference: blender/blender#120006