Cycles does not generate the exact same images when a scene is rendered twice #101726
System Information
Operating system: Ubuntu 20.04
Graphics card: Nvidia 2070 Super
Blender Version
Broken: master
Worked:
Short description of error
The results generated by Cycles are not 100% deterministic.
As a consequence, path guiding (#92571) cannot be implemented deterministically.
Exact steps for others to reproduce the error
1. Start Blender and open a scene like the monster scene.
2. Render the scene with 64spp and store the result as an EXR image (e.g., monster-run-0.exr).
3. Repeat steps 1 and 2 and save the result as an EXR image again (e.g., monster-run-1.exr).
4. Use an image comparison tool such as tev (https://github.com/Tom94/tev) and compute the difference (a programmatic alternative is sketched after the images below).

You will see that, even if both renderings were performed on the same machine, the resulting images have minor differences.
Note: It might be necessary to scale the diff images to see the errors.
Run 0:
Run 1:
Diff:
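If you prefer a numeric check over a visual one, something along these lines can quantify the difference. This is only a sketch that assumes OpenImageIO is available (Blender's own build environment ships it); any EXR-capable diff tool works just as well.

```cpp
/* Sketch: compare the two renders numerically with OpenImageIO.
 * Any non-zero error means the runs were not bit-identical. */
#include <OpenImageIO/imagebuf.h>
#include <OpenImageIO/imagebufalgo.h>
#include <cstdio>

int main()
{
  OIIO::ImageBuf a("monster-run-0.exr");
  OIIO::ImageBuf b("monster-run-1.exr");

  const OIIO::ImageBufAlgo::CompareResults cmp =
      OIIO::ImageBufAlgo::compare(a, b, /*failthresh=*/0.0f, /*warnthresh=*/0.0f);

  std::printf("mean error: %g  rms: %g  max: %g  differing pixels: %llu\n",
              cmp.meanerror, cmp.rms_error, cmp.maxerror,
              (unsigned long long)cmp.nfail);
  return 0;
}
```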
Changed status from 'Needs Triage' to: 'Needs Developer To Reproduce'
I can also reproduce this on the BMW scene on the CPU. Not sure if the module considers this a bug, though, so tagging the module for more information.
The problem here is that when Cycles is not 100% deterministic, it will generate different training samples for path guiding at every run.
As a result, the guiding structure will always be different, as well as the sampling behavior (starting at the 2nd spp), and therefore
the results of two renderings of the same scene will have a completely different noise pattern.
In production, and according to @brecht, this is not acceptable.
I did a bit more debugging.
By adding some code to print out each path vertex (e.g., position, normal, random number, outgoing direction after BSDF sampling; see P3250),
I was able to compare two runs (1spp, single-threaded, at a small resolution, and with path guiding disabled) of a modified version of the monster scene.
https://1drv.ms/u/s!At4sZlTrZ-QKigYGyeU2_Jc5sbSF?e=B0h6tF
monster_small-0.log
monster_small-1.log
A diff of the output shows that 99.9% of the path segments are the same, and only a tiny fraction differs.
It seems that in most cases, the divergence starts with a tiny difference in the normal, which leads to a slightly different outgoing direction and so on.
I have now tested multiple versions of Blender (3.0.1 and 3.1.2), and it seems that
this happens in all versions but is far less prominent in 3.0.1:
3.0.1:
3.1.2:
I believe I have identified three problematic regions (probably not all of them):
In all of these parts, it can happen that the output values are slightly different.
To test that, I did a dirty hack and quantized the outputs to 4 floating-point digits (P3251).
The behavior is not perfect, but it is now similar to 3.0.1:
@brecht I hope that helps.
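For reference, the quantization hack boils down to something like the following. This is only an illustrative helper with an invented name, not the actual P3251 patch.

```cpp
/* Illustrative only, not the actual P3251 patch: snap a value to a fixed
 * number of decimal digits so that sub-epsilon differences coming from
 * non-deterministic accumulation can no longer change downstream
 * sampling decisions. */
#include <cmath>

static inline float quantize_float(float x, int digits = 4)
{
  const float scale = std::pow(10.0f, float(digits));
  return std::round(x * scale) / scale;
}
```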
I'm seeing an exact match in the monster scene when running `./blender -t 1`. I suspect multi-threading in normal or tangent calculation, doing atomic float adds in undefined order. I suspect the differences in random numbers and BSDF sampling may be indirect consequences of different normals earlier in the path, though there may be other unexplained factors.

I think these kinds of differences are fairly acceptable by themselves since they are quite localized, though not ideal. For OpenPGL, does this lead to completely different noise patterns over the entire image, or is it more localized?

I've wanted to store normals and tangents in some compressed/quantized way to save memory, which may indirectly help with this, but it would be an unreliable workaround at best. In general, multi-threading in geometry nodes may not generate bit-for-bit matching results for positions or any attributes unless it was carefully implemented to avoid this. So I'm not sure if there is a practical and complete solution to this.
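To make the order-dependence concrete, here is a standalone sketch (not Blender code; the thread count and data are arbitrary) that accumulates the same values into a shared float using the usual compare-exchange trick. Because float addition is not associative, the printed totals typically differ between runs depending on how the threads interleave.

```cpp
/* Standalone illustration: float addition is not associative, so accumulating
 * the same values with atomic adds in a scheduling-dependent order can
 * produce slightly different sums. */
#include <atomic>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

/* The usual compare-exchange trick for an atomic float add
 * (std::atomic<float>::fetch_add only exists since C++20). */
static void atomic_add_float(std::atomic<float> &target, float value)
{
  float old_val = target.load(std::memory_order_relaxed);
  while (!target.compare_exchange_weak(old_val, old_val + value)) {
    /* On failure, old_val is reloaded; retry with the updated value. */
  }
}

int main()
{
  std::vector<float> values(1 << 20);
  std::mt19937 rng(42);
  std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
  for (float &v : values) {
    v = dist(rng);
  }

  const int num_threads = 8;
  const size_t chunk = values.size() / num_threads;

  for (int run = 0; run < 3; run++) {
    std::atomic<float> sum{0.0f};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++) {
      threads.emplace_back([&, t]() {
        for (size_t i = t * chunk; i < (t + 1) * chunk; i++) {
          atomic_add_float(sum, values[i]);
        }
      });
    }
    for (std::thread &th : threads) {
      th.join();
    }
    /* The printed totals typically differ in the last digits between runs. */
    std::printf("run %d: %.9g\n", run, sum.load());
  }
  return 0;
}
```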
I can verify that starting Blender with `blender -t 1`, instead of just setting the rendering to single-threaded via `Performance -> Threads -> Thread Mode = Fixed` and `Performance -> Threads -> Threads = 1`, generates the same result.
This strengthens @brecht's theory that it is related to some multi-threaded pre-processing step (e.g., normal or tangent calculations).
@brecht the effect on path guiding can be big. While in the first rendering iteration, only a small set of samples will differ, they still lead to a different guiding structure.
In the second iteration, this slightly different structure leads to more variations of the samples for the second training iteration, and so on, and so on ...
The difference will get worse/larger with every training iteration.
Here is an example with 32spp (Note: this time, I didn't even need to scale the error):
At the moment, the determinism of path guiding is pretty unreliable.
Depending on the scene, it might work or it might not.
One interesting fact is that this behavior was far less prominent in 3.0.1. Was there a change in the way the normals and tangents are pre-processed?
@brecht I had a chat with @LukasStockner at BCON; he might have some ideas about where this comes from.
Looks like the two sources of non-determinism are `BKE_mesh_calc_normals_poly_and_vertex` and `Mikktspace::generateTSpaces`. If you disable parallelism in both of those, the data buffers being copied to the device end up being identical between renders, and so do the rendered outputs.

And yes, @brecht got it right: both of those functions are doing atomic floating-point accumulations.

In theory, it would probably work to do the accumulation either in fixed-point precision (which would honestly be fine for normals/tangents since they are bound to the -1..1 range anyway, and would even let us avoid the atomic CAS tricks that are needed for floats) or in double floating-point precision. Not sure how practical either of those is.
Fixed-point precision would be good to try, though I have not worked out whether there would be problems with high vertex valence or angle weighting with small and large angles.
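For illustration, a fixed-point accumulator could look roughly like the sketch below (invented names, not a patch). With a wide integer accumulator there is plenty of headroom for high vertex valence, integer addition is order-independent, and a plain fetch_add replaces the compare-exchange loop needed for floats.

```cpp
/* Sketch of the fixed-point alternative: integer addition is associative, so
 * the accumulated value does not depend on thread scheduling. */
#include <atomic>
#include <cstdint>

constexpr double FIXED_ONE = double(int64_t(1) << 40); /* 1.0 in fixed point. */

/* value is a weighted normal/tangent component, bounded to roughly [-1, 1]. */
static void atomic_add_fixed(std::atomic<int64_t> &sum, float value)
{
  sum.fetch_add(int64_t(double(value) * FIXED_ONE), std::memory_order_relaxed);
}

static float fixed_to_float(const std::atomic<int64_t> &sum)
{
  return float(double(sum.load()) / FIXED_ONE);
}
```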
@brecht Considering that this problem makes checking functional changes really hard (which is a problem if you are working on some CPU optimisations): is it possible to at least add a preprocessor definition (until a proper solution is found) to allow disabling multi-threaded execution for the two functions mentioned above? I mean, `blender -t 1` also works, but it is really slow, as expected.

For mesh vertex normal calculation, I would like to look into caching and using a `vertex -> face corner` map. Then the accumulation of face normals for each vertex could happen without atomics, and in a deterministic order. Since such a map would be useful for many other operations, the cost of its creation could be amortized at least a bit. I think it may make the normal calculation faster too, but it requires some experimentation.

Another possibility here is that if the atomics still give the best performance, we could keep using that approach for the viewport but not the final render. It's not a great solution, but it may be better than nothing.
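A rough sketch of the `vertex -> face corner` map idea described above (illustrative names, not the code from the branch linked below): each vertex gathers the normals of its adjacent faces in a fixed order, so no atomics are needed and the result is independent of thread scheduling.

```cpp
/* Deterministic vertex normal accumulation via a vertex -> face map. */
#include <cstddef>
#include <vector>

struct float3 {
  float x = 0.0f, y = 0.0f, z = 0.0f;
};

struct VertToFaceMap {
  /* Faces adjacent to vertex v are faces[offsets[v]..offsets[v + 1]),
   * stored in a fixed (e.g. ascending face index) order. */
  std::vector<int> offsets;
  std::vector<int> faces;
};

static void calc_vert_normals_deterministic(const VertToFaceMap &map,
                                            const std::vector<float3> &face_normals,
                                            std::vector<float3> &r_vert_normals)
{
  /* This loop can run as a parallel_for over vertices; determinism is kept
   * because each vertex is written by exactly one task, in a fixed order. */
  for (size_t v = 0; v + 1 < map.offsets.size(); v++) {
    float3 n;
    for (int i = map.offsets[v]; i < map.offsets[v + 1]; i++) {
      const float3 &fn = face_normals[size_t(map.faces[i])];
      n.x += fn.x;
      n.y += fn.y;
      n.z += fn.z;
    }
    /* (Angle weighting and normalization omitted for brevity.) */
    r_vert_normals[v] = n;
  }
}
```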
I started experimenting with changing vertex normal calculation here: https://projects.blender.org/HooglyBoogly/blender/commits/branch/mesh-normals-calc-changes
I'm still not sure if creating the topology map will be too slow, but I think there's still plenty of room for improvement and more out-of-the-box thinking there. I hope the upcoming changes to replace `MLoop` and `MPoly` with a single integer each will benefit this approach too, since those arrays have to be accessed more with the changes applied.

The idea I wanted to try at some point is to use atomic integer addition. For manifold meshes we know that the sum will not exceed -2pi..2pi, so that range could be mapped to 0..UINT_MAX. Small non-zero angles could be clamped to a minimum angle to avoid precision issues when, e.g., the vertex has just one adjacent face.
The bigger problem would be the non-manifold cases, for which you could extend the range a bit, to say -8pi..8pi. And then for very rare cases where even that is not enough, you'd need to fall back to something slower (serial execution, adjacency information, or 64-bit integers). Although the vertex normals would already be quite meaningless for such meshes, so maybe that is not even needed.
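A signed-integer variant of that mapping could look like the sketch below. The constants and names are illustrative, and the fallback for extreme non-manifold cases that overflow the extended range is left out, as discussed above.

```cpp
/* Illustrative sketch of the integer-mapped accumulation idea above.
 * A per-component sum of angle-weighted normals stays within roughly
 * [-2*pi, 2*pi] for manifold meshes; using [-8*pi, 8*pi] leaves headroom
 * for most non-manifold cases while still fitting in a 32-bit integer. */
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstdint>

constexpr float PI = 3.14159265358979f;
constexpr float SUM_RANGE = 8.0f * PI; /* Extended range for non-manifold meshes. */
constexpr float MIN_ANGLE = 1.0e-4f;   /* Clamp tiny corner angles. */
constexpr double TO_FIXED = double(INT32_MAX) / double(SUM_RANGE);

static void add_angle_weighted(std::atomic<int32_t> &sum,
                               float normal_component,
                               float corner_angle)
{
  const float angle = std::max(corner_angle, MIN_ANGLE);
  sum.fetch_add(int32_t(std::lround(double(normal_component) * double(angle) * TO_FIXED)),
                std::memory_order_relaxed);
}

static float decode(const std::atomic<int32_t> &sum)
{
  return float(double(sum.load()) / TO_FIXED);
}
```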
5052e0d407 will help with this, but the guiding regression tests are still not giving consistent results across platforms. Normal map tangents are also not solved yet, though none of the regression tests should be affected by that.