Render of 3D image is very slow #98190
Labels
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
10 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: blender/blender#98190
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
System Information
Operating system: Windows 8.1
Graphics card: Intel(R) HD Graphics 4400, 4.3.0 - Build 10.18.14.4264
Blender Version
Broken: 3.3.0 caused by
80859a6cb2
Worked: 3.1.0 onwards
Render of simple 3d image takes excessive time when sequencer is in pipeline but sequencer has no strips. While rendering slowly, the process size grows and Blender UI stops responding.
Exact steps for others to reproduce the error
Note:
When sequencer is in pipeline and VSE has at least one strip then Render Image completes in approx 1 second, displaying the expected strip.
When the sequencer is empty, but is in the pipeline then should the rendered image also be empty?
Supporting Video
https://youtu.be/A_dU0KY9miQ
Added subscriber: @blenderbug12345
#98710 was marked as duplicate of this issue
Render of 3D image is very slow when and EMPTY VSE is in the Post Processing pipeline.to Render of 3D image is very slow when an EMPTY VSE is in the Post Processing pipeline.Added subscriber: @lichtwerk
Changed status from 'Needs Triage' to: 'Needs User Info'
Cannot reproduce here
Added subscriber: @PratikPB2123
cannot reproduce either- 3.3,
8c4bd02b067a
I've added a video of the problem here: https://youtu.be/A_dU0KY9miQ
The problem is found in blender-3.2.0-beta+v32.a2dacefb4636-windows.amd64-release also.
Problem still present with File .. Defaults .. Load Factory Settings
I don't have a full set of dailys to play with at the moment, but it looks like some change made between apr 22/29th
apr.29 blender-3.2.0-alpha+master.a7db7f88b067-windows.amd64-release BUG
apr.22 blender-3.2.0-alpha+master.179100c02126-windows.amd64-release OK
Still cannot repro.
3.1.2 was fine, right?
3.1.2 is ok, although it renders the 3d scene even when the empty VSE is in the pipeline, but at least it's fast.
mar.31 blender-3.1.2-candidate+v31.2cfca7d9101d-windows.amd64-release OK
Looking at available dailys I see the slow behaviour first happen on apr.29
apr.29 blender-3.2.0-alpha+master.a1b21a350383-windows.amd64-release SLOW
apr.28 blender-3.2.0-alpha+master.308a12ac647d-windows.amd64-release OK
note that some builds earlier than apr.29 seem to have a flashing screen while rendering quickly
Added subscriber: @mano-wii
I cannot reproduce the problem.
Tested on:
April 29th was when the libraries were updated (See {rBL62897})
Maybe some library doesn't work well in hardware or driver (Windows 8 is starting to get outdated)
Please double-check if the drivers are up to date and the hardware meets Blender's requirements: https://www.blender.org/download/requirements
To upgrade to the latest driver, see here for more information: https://docs.blender.org/manual/en/dev/troubleshooting/gpu/index.html
Also can you check if any slowdowns are also observed when rendering with Eevee?
I took a closer look at how my system was set up.
Drivers are as new as I can get them to be for a windows 8.1 build.
I have NVIDIA Control Panel which lets you set up a profile for different applications. And blender.exe was running with the default Integrated graphics card for 3d operations. I think I set it up like this because I was getting overheating issues last summer, and then... I forgot about it. I've been using VSE exclusively for a few months and there's not much of a performance problem there because VSE doesn't seem to use the GPU, so again I didn't notice.
I got a copy of 'Process Explorer' so I could see what the GPU was actually doing on my windows 8.1 laptop and I think I understand it a little better.
On the latest alpha, with EEVEE as the render engine and when following the steps to reproduce the error:
On 3.1.0, with EEVEE:
So:
Since I can just turn on the GPU profile for blender.exe this becomes less of a problem for me.
I traced the slow-down in the alpha code as far as this call, which takes > 1 minute to return and soaks the CPU. There's nothing going on with the GPU until a small spike then the final image appears in the Render Result viewer.
Added subscriber: @Raimund58
What CPU model do you have? Or attach the system info. (Blender->Help->Save System Info)
system-info.txt
Changed status from 'Needs User Info' to: 'Needs Triage'
@blenderbug12345 Can you please update the driver from the integrated GPU and report back if it helped? If I researched correctly, your driver (4.3.0 - Build 10.18.14.4264) is from 2019
This comment was removed by @blenderbug12345
My mistake, you're referring to the Integrated Graphics driver, I'll check to see if that can be upgraded at all...
Upgraded Integrated Graphics driver to 10.18.14.5180 (highest possible available for this hardware)
Retested using the approach detailed above, and the problem is still present.
I notice that after blender is started up:
I notice that in the 3d viewer, if I switch 'Viewport Shading' from Solid to Rendered then the UI freezes for approx 1 minute before displaying the object using EEVEE shading.
I can then toggle between Solid and Rendered quickly and the UI is responsive.
So maybe this only happens the first time EEVEE is called upon in a session, but subsequent times it is fast.
Bump!
I traced the problem to slow calls like glUseProgram(...). This tells the Integrated Graphics 'GPU' to set up shaders etc. It takes 20 seconds to do this. Other bottlenecks also appear to be related to the gl* routines when a shader is used for the first time. After gl* has put stuff in the cache then subsequent operations are fast.
I'm not sure if OpenGL (or wherever these gl* library routines come from) was upgraded as part of the recent changes in blender or if there are other dependencies that glUseProgram(...) needs which appear slower now.
The proper fix, of course, is to give up and use the NVIDIA GPU all the time.
So I figure we can park or close this ticket due to hardware/drivers being too old to support the latest build.
Thanks for taking a look.
Changed status from 'Needs Triage' to: 'Needs User Info'
@blenderbug12345 since so far apparently only you have reported the issue, if you can bisect or at least find out which build introduced the issue (at https://builder.blender.org/download/daily/archive/) this might help determine what is triggering this issue in Blender.
Without this there is not much we can do here apart from suggesting checking on a newer drivers on the vendor website (and never rely on windows update to take care of drivers).
While this is a real issue, it is only happening to a few people so far. And to troubleshoot it developers needs to have direct access to hardware. And even then it is not clear whether it's a solvable problem from Blender side.
I've been stepping back in time over various commits looking for the one that blew up.
Starting from the latest sources, which build to 3.3.0-alpha I've been stepping back using git reset:
98ad294d17
A lot of these steps hit a broken build, which is understandable because not all checkins are good, and anyway the fix is usually in a newer check-in.
The last working build I get to is the one immediately after the muti-library update:
https://developer.blender.org/rB604c33e694375828b0f322010e8e4ac17b4b02f3
This build results in a SLOW render.
If I try my steps to build anything older than this then I get missing files reported, eg:
So I've not been able to build anything from sources in git that runs FAST.
Is there a better way for me to step back through commits and build a working exe?
Oops, I ended up forgetting the April 29th info.
Reviewing the commits, it's not obvious what causes the slowdown :
Could it be that updating the libraries was the cause?
Library changes need to be manually rolled back in SVN for the compilation to work.
I cannot identify what may have caused the
glUseProgram
to be slow. Could it be theOpenColorIO
update?We could check this by trying a build disabling the
WITH_OPENCOLORIO
Cmake option. And then check if that results in any change in render time.I hacked CMakeLists.txt on the latest 3.3.0 alpha:
and ran make clean, but it's still SLOW. I'm not very familiar with CMake so perhaps that was not the proper way to do it.
I did this:
The resulting build is SLOW ( > 1 minute )
The preceeding build is FAST, ie:
apr.14 https://developer.blender.org/rB48d2f536e182468c26a81ec2c2af2638e6768bb0 SLOW
...
apr.14 https://developer.blender.org/rB66dc4d4efb88ecf2d18bfa08ab9c43b024ebd2fb FAST
So it looks like the problems appeared in:
GPU: Make nodetree GLSL Codegen render engine agnostic
https://developer.blender.org/rB80859a6cb2726a39fb22cb49f06e0355dc9390a7
note: at some point the builds I was getting were complaining about webp.h not being found, so I commented out the references in blender\imbuf\intern\webp.c just to get past what I think is an unrelated issue.
Changed status from 'Needs User Info' to: 'Needs Triage'
@blenderbug12345, glad you were able to bisect and find the commit that introduced the problem :)
Since
glUseProgram
is where the code spends the most time, the slowness lies in compiling the shaders.80859a6cb2
changed the shaders code, but I don't see how it can make the compilation so much slower.I suspect the problem is related to a feature of some GPU's drivers (like Intel) which is to Cache Shaders.
If some shader code has already been compiled before, if the same code is to be recompiled, the cached shader will be used to avoid this re-compiling.
@blenderbug12345, would you be able to check if this is the case?
The shader cache is located in some Intel folder. You can test by deleting this folder and disabling (somehow) Intel's cache shaders feature.
(unfortunately I don't know how to be of more help. I don't have an Intel or Windows8 available)
I'm still looking at this, but I have an early observation.
I'm looking at the behaviour of the 3.3.0-alpha latest build.
glUseProgram takes a compiled/linked binary and puts it on the GPU, and this operation is very slow for ONE of the many shaders that blender asks it to handle.
After a shader is compiled/linked you can find out how big it is, and I looked at all the calls that blender makes when doing an F12 single frame render of the default scene.
I added this code to gl_shaders.cc:
There are about 280 unique shaders created for various tasks, and they're called hundreds of times for a simple render like this. Most of the shaders compile/link to a size of around 50000 bytes, some are smaller, some are a bit larger.
One shader compile/links to a size of 1153752 bytes, which is VERY much more than the rest. This is the shader that takes > 20 seconds to return in 3.3.0-alpha.
I compared this with the same numbers you get from 3.2.0-alpha (see the comments above for how I set that up).
The shaders on 3.2.0-alpha compile/link to roughly the same size as before, and there is also one shader that is much larger than the others. However this big shader is 'only' 537031 bytes big, that's 1/2 the size of what appears to be the 'same' shader on 3.3.0-alpha.
note: Shaders have a unique ID for the life time of a blender session and they seem to be created and used in the same order in both 3.2.0 and 3.3.0. Both of the very big shaders have ID 274 when running on my hardware. So I'm pretty sure they're performing the same general function. The code generated on 3.3.0 is just a lot larger.
The slow down in glUseShader might be because of the compiled/linked code size, or perhaps the 3.3.0 shader is just designed to perform a lot more work.
note: I did not find any useful cache controls for the Integrated Graphics processor on my hardware.
OpenGL commands are asynchronous and many are lazily initialized in the driver.
So, depending on the driver, it is quite possible that a shader is only compiled when
glUseProgram
is called.I'm not sure size really impacts performance that much.
But if the size of a supposed same shader changed from 537.031 to 1.153.752, this is interesting information that deserves investigation.
Firstly, I'm not an expert and c++ makes my head hurt, so I might be wrong about this, so...
I've been looking at the sources for the individual shaders that are appended to each program. I added this code around the glCompileShader call and logged the source code that each shader uses.
From gl_shader.cc::create_shader_stage...
Typically, a single program might have 2 shaders, each with their own code and each containing a 'void main()...' definition. In some shaders I'm seeing the same code fragments included more than once.
For example, this fragment from common_view_lib.glsl appears twice in the source code for one of the 2 BIG shaders discussed earlier:
A shader's source code is an array of strings which is gradually appended to as the shader is built. These source strings can be single lines, or multiple-line chunks of code, but they all end up together when the shader's source code is finally set.
There's code in gpu_shader_dependency.cc which ensures that the .glsl code included from a 'pragma BLENDER_REQUIRE(...)' only appears once in each GPUSource object. But I'm not sure if this takes account of other strings which are eventually combined to form the shader's source. So the same BLENDER_REQUIRE(...) might appear in two separate multiline-chunks which then go on to be combined in the finished shader.
I don't know how well the glCompileShader 'compiler' deals with this but I know that these large shaders take a long time to pass glUseProgram the first time they're called.
I also don't know if the shader really does need duplicate code in order to function.
Even if the source gets bigger, I don't think the difference is that big to justify the reported slowness. I still think Intel's shader cache system must be related.
Well I had a closer look and the duplicate code fragments are a known issue. I didn't notice that they're inside an ifndef. I'll keep looking into this but I'll wait a while longer next time before jumping to wrong conclusions.
Added subscriber: @iss
Changed status from 'Needs Triage' to: 'Needs User Info'
@blenderbug12345 do you still have this issue?
Just a question - I have checked out YT video provided and to me it seems, that this is only issue when you render for first time? Can you check that?
If sequencer is enabled in post processing, but there is no strip, it does not execute any VSE rendering code, so first and third render in that video had equivalent settings from what I can tell and first was slow, then third was fast, so that's why I ask.
Hi @iss,
yes the problem still happens.
I think the video is too complicated. The problem has nothing to do with the VSE. The FIRST render of a 3D image is slow when rendering the default cube from the 3D Viewport. When you render the cube a second time then it is fast.
So once the shader has been compiled and cached then the render is fast. But compiling the shader and using it the first time is slow.
Thanks for clarification, that's what I thought is the case. So your problem is, that compiled shader does not persist in cache, is that correct?
To me this seems like issue within driver and its infrastructure, unless Blender (or driver...) adds some randomness to shader on purpose really. But since you was able to identify commit that caused this behavior is strange too... Have you tried to modify the shader code itself to see whether this would resolve the delay? Perhaps make it grayscale by overwriting green and blue channels with red one.
Render of 3D image is very slow when an EMPTY VSE is in the Post Processing pipeline.to Render of 3D image is very slow whenRender of 3D image is very slow whento Render of 3D image is very slowHi @iss
I managed to make the rendering faster on my hardware/driver by modifying one of the shaders.
I've modified my 3.3.0-alpha to print out the source code for each shader it passes to useProgram, and I've saved the results to a separate file. So shader number 276 (on my machine) is the slow one and I've written the source to ./Shaders/276.txt.
I copied this 276.txt to my276.txt and hacked that code. I removed function definitions that were never called thinking that perhaps the shader binary was just too big to work well on my hardware. I removed quite a lot of unused code but still have around 5600 lines of .glsl in my276.txt.
When I run blender and render the default cube, I'm able detect when shader 276 is about to be built and then throw away the original shader source. I then copy the text from my276.txt instead. Blender is happy to do this and the result runs just as slow as the original, and the resulting image looks good.
I went back to my276.txt and tried knocking out various function calls to see if any of these contributed to the slowness. For example, I would use 'return' on the first line of a function, so that it still got called but had no effect on the rendered image.
After a lot of experimentation I got down to the code defined for the function node_bsdf_principled(...). This comes from gpu_shader_material_principled.glsl and is combined with other fragments to form shader 276. In that function there's this if/else/ statement:
After some tweaking I ended up wrapping the whole thing in ' if (false) ', and just called the un-optimized case directly:
And my276.txt renders the cube in < 10 seconds the first time it is called, compared to 1m20s in the original 276.txt
So I think the un-optimized case is best for my hardware/driver and something in one of the optimized cases is causing the slowdown.
Added subscribers: @Gioxyer, @OmarEmaraDev
Changed status from 'Needs User Info' to: 'Needs Triage'
Added subscribers: @fclem, @Jeroen-Bakker
Interesting, my thinking was, that modification of shader may change its hash, so if this is problem with driver cache it could somehow do something different... But I have found another report with same issue, so it's not limited to your system.
@fclem, @Jeroen-Bakker does above comment ring a bell?
I had another look at the big if/else statment. I know that when rendering the default cube, only this branch is called:
If I comment out the other ' result = closure_eval(... ' calls in the if/else then the first render is much faster.
I suspect that then you comment out a function call, then the compiler (either gsgl or driver bytecode) decides not to compile the function code at all. This could explain why the ' if ( false ) ' worked so quickly. None of the other closure_eval variants was being compiled.
So perhaps I just have a slow compiler.
I don't have much visibility on what the compiler/driver are doing with the source code, so I don't know if the closure_eval variants could be written in a different way to allow the compiler to process them faster.
Yeah this is the compiler not wanting to cooperate even if all the configuration variables are set as
const
.Not sure we can do much about that instead of trying to workaround this by other means which would make our shader code generation much more complex.
I'm currently just doing this as a hack to: gpu_shader_material_principled.glsl
It would be nice if there was an official way to profile shaders as they're built, and update a list of #defines that the codegen could add to the .glsl headers. That way blender could learn that it's running on rubbish hardware and maybe recompile the slow-compiling shaders so that the new #defines cause different code to be picked up.
I'm assuming that some shaders will be recompiled eventually over the course of a session, but I could be wrong about that.
Or maybe provide a way for regular users to add a #define that is then added to every shader before it's compiled.
Added subscriber: @Rodrigo_S.T
Added subscriber: @boydmouritsen
While I understand that workarounds would make the shader code generation more complex, the alternative (at least for me) is being stuck at version 3.1.x. This bug makes blender 3.2.x+ unviable for me and likely others.
I guess it comes down to whether Blender decides to leave these users behind and avoid complexity, or if the complexity is worth it to keep the users?
I dunno the answer but if Blender leaves this as is, perhaps consider announcing new minimum specs for 3.2.x+ releases to avoid any frustration.
This is likely due to the constants here being used in a series of comparisons (&&) to get the result of what should be it's own constant. A trivial quirk, but the compiler assumes the outcomes of these comparisons aren't necessarily constant, so it takes all branches.
This issue was referenced by
de818d81c3
Changed status from 'Needs Triage' to: 'Resolved'