Cycles does not generate the exact same images when a scene is rendered twice #101726

Open
opened 2022-10-10 15:55:03 +02:00 by Sebastian Herholz · 21 comments

System Information
Operating system: Ubuntu 20.04
Graphics card: Nvidia 2070 Super

Blender Version
Broken: master
Worked:

Short description of error

The results generated by Cycles are not 100% deterministic.
As a consequence, path guiding ( #92571 ) cannot be implemented deterministically.

Exact steps for others to reproduce the error

  1. Start Blender and open a scene like monster

  2. Render the scene with 64spp and store the result as an EXR image (e.g., monster-run-0.exr).

  3. Repeat steps 1 and 2 and save the result as an EXR image again (e.g., monster-run-1.exr).

  4. Use an image comparison tool such as tev (https://github.com/Tom94/tev) and compute the difference.

You will see that, even if both renderings were performed on the same machine, the resulting images have minor differences.
Note: It might be necessary to scale the diff-images to see the errors.

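Run 0:
Screenshot from 2022-10-10 15-50-37.png (https://archive.blender.org/developer/F13649368/Screenshot_from_2022-10-10_15-50-37.png)
Run 1:
Screenshot from 2022-10-10 15-50-41.png (https://archive.blender.org/developer/F13649371/Screenshot_from_2022-10-10_15-50-41.png)
Diff:
Screenshot from 2022-10-10 15-51-15.png (https://archive.blender.org/developer/F13649373/Screenshot_from_2022-10-10_15-51-15.png)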

Based on the default startup or an attached .blend file (as simple as possible).

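The comparison in step 4 can also be scripted. A minimal sketch, assuming the pixel values of both renders have already been loaded into flat lists of floats (reading the EXR files is left out here; `scaled_abs_diff` and its `scale` parameter are hypothetical names, not part of tev or Blender):

```python
def scaled_abs_diff(run0, run1, scale=1000.0):
    """Return (scaled diff image, max abs error, number of differing values)."""
    if len(run0) != len(run1):
        raise ValueError("both renders must have the same number of values")
    abs_err = [abs(a - b) for a, b in zip(run0, run1)]
    # Scale the error so that tiny differences become visible.
    diff = [e * scale for e in abs_err]
    differing = sum(1 for e in abs_err if e != 0.0)
    return diff, max(abs_err, default=0.0), differing


# Toy data standing in for two renders of the same scene.
run0 = [0.25, 0.5, 0.75, 1.0]
run1 = [0.25, 0.5, 0.7501, 1.0]
diff, max_err, differing = scaled_abs_diff(run0, run1)
print(max_err, differing)
```

A deterministic renderer would report zero differing values; any non-zero count means the two runs diverged somewhere.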
Author
Member

Added subscriber: @sherholz

Brecht Van Lommel was assigned by Sebastian Herholz 2022-10-10 15:55:47 +02:00
Member

Added subscriber: @OmarEmaraDev

Member

Changed status from 'Needs Triage' to: 'Needs Developer To Reproduce'

Member

I can also reproduce on the BMW scene on CPU. Not sure if the module considers this a bug though. So tagging the module for more information.

Author
Member

The problem here is that when Cycles is not 100% deterministic, it will generate different training samples for path guiding at every run.
As a result, the guiding structure will always be different, as well as the sampling behavior (starting at the 2nd spp), and therefore
the results of two renderings of the same scene will have a completely different noise pattern.

In production, and according to @brecht, this is not acceptable.

Author
Member

I did a bit more debugging.

By adding some code that prints out each path vertex (e.g., position, normal, random number, outgoing direction after BSDF sampling), see P3250 (https://archive.blender.org/developer/P3250.txt),
I was able to compare two runs (1spp, single-threaded, at a small resolution, and with path guiding disabled) of a modified version of the monster scene.

https://1drv.ms/u/s!At4sZlTrZ-QKigYGyeU2_Jc5sbSF?e=B0h6tF

monster_small-0.log (https://archive.blender.org/developer/F13672829/monster_small-0.log)
monster_small-1.log (https://archive.blender.org/developer/F13672830/monster_small-1.log)

A diff of the output shows that 99.9% of the path segments are the same, and only a tiny fraction differs.
Screenshot from 2022-10-13 13-10-52.png (https://archive.blender.org/developer/F13672815/Screenshot_from_2022-10-13_13-10-52.png)

It seems that in most cases, the divergence starts with a tiny difference in the normal, which leads to a slightly different outgoing direction, and so on.

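The log comparison described above can be sketched as follows. This is only an illustration: the line format in the toy logs is made up, and `first_divergence` and `matching_fraction` are hypothetical helpers, not taken from P3250.

```python
from itertools import zip_longest


def first_divergence(lines_a, lines_b):
    """Return (index, line_a, line_b) for the first differing line, or None."""
    for i, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        if a != b:
            return i, a, b
    return None


def matching_fraction(lines_a, lines_b):
    """Fraction of line positions at which both logs agree."""
    total = max(len(lines_a), len(lines_b))
    if total == 0:
        return 1.0
    same = sum(1 for a, b in zip(lines_a, lines_b) if a == b)
    return same / total


# Toy logs with a made-up per-vertex format.
log0 = [
    "v0 N=(0.000000, 0.000000, 1.000000)",
    "v1 N=(0.577350, 0.577350, 0.577350)",
    "v2 N=(0.000000, 1.000000, 0.000000)",
]
log1 = list(log0)
log1[1] = "v1 N=(0.577350, 0.577351, 0.577350)"

print(first_divergence(log0, log1))
print(matching_fraction(log0, log1))
```

For real logs, the two lists could come from something like `open(path).read().splitlines()`.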
Author
Member

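I have now tested multiple versions of Blender (3.0.1 and 3.1.2), and it seems that
this happens in all versions but is way less prominent in 3.0.1:
3.0.1:
Screenshot from 2022-10-13 13-45-35.png (https://archive.blender.org/developer/F13672927/Screenshot_from_2022-10-13_13-45-35.png)
3.1.2:
Screenshot from 2022-10-13 13-45-43.png (https://archive.blender.org/developer/F13672929/Screenshot_from_2022-10-13_13-45-43.png)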

Author
Member

I believe I have identified three problematic regions (though probably not all of them):

  • the normal at the intersection
  • random numbers
  • BSDF sampling

In all of these parts, the output values can be slightly different between runs.
To test that, I did a dirty hack and quantized the outputs to 4 floating-point digits.
P3251 (https://archive.blender.org/developer/P3251.txt)

The behavior is not perfect, but it is now similar to 3.0.1:
Screenshot from 2022-10-13 15-27-37.png (https://archive.blender.org/developer/F13673232/Screenshot_from_2022-10-13_15-27-37.png)

@brecht I hope that helps.

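To illustrate why such a quantization hack helps but is "not perfect", here is a minimal sketch. It assumes that "4 floating-point digits" means rounding to four decimal places, which may not be exactly what P3251 does; `quantize` is a hypothetical helper.

```python
def quantize(value, digits=4):
    """Round a value to a fixed number of decimal places (assumed scheme)."""
    return round(value, digits)


# Two values that differ only by a tiny amount collapse to the same result.
a, b = 0.70710678, 0.70710679
print(quantize(a), quantize(b))

# But values straddling a rounding boundary still end up different.
c, d = 0.12344999, 0.12345001
print(quantize(c), quantize(d))
```

The second pair shows the remaining failure mode: a tiny difference that crosses a rounding boundary is amplified instead of removed, so quantization reduces the number of diverging paths without eliminating them.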

I'm seeing an exact match in the monster scene when running ./blender -t 1. I suspect multi-threading in the normal or tangent calculation, which does atomic float adds in undefined order. I suspect the differences in random numbers and BSDF sampling may be indirect consequences of different normals earlier in the path, though there may be other unexplained factors.

I think these kinds of differences are fairly acceptable by themselves since they are quite localized, though not ideal. For OpenPGL, does this lead to completely different noise patterns over the entire image, or is it more localized?

I've wanted to store normals and tangents in some compressed/quantized way to save memory, which may indirectly help with this, but it would be an unreliable workaround at best. In general, multi-threading in geometry nodes may not generate bit-for-bit matching results for positions or any attributes unless it was carefully implemented to avoid this. So I'm not sure if there is a practical and complete solution to this.

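The order dependence suspected here can be reproduced in isolation. A minimal sketch in Python, using doubles rather than the 32-bit floats used for mesh data, with a loop over permutations standing in for threads whose atomic adds land in arbitrary order (`accumulate` is a hypothetical helper):

```python
import itertools


def accumulate(values):
    """Sum values left to right, like a sequence of float adds on one accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


contributions = [0.1, 0.2, 0.3]

# Every permutation is one possible order in which the adds could arrive.
sums = {accumulate(p) for p in itertools.permutations(contributions)}

# Floating-point addition is not associative, so more than one result shows up.
print(sorted(sums))
```

Running with a single thread fixes the order of the adds, which is consistent with `-t 1` producing matching renders.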
Contributor

Added subscriber: @Raimund58

Author
Member

I can verify that starting Blender with blender -t 1, instead of just setting the rendering to single-threaded via Performance->Thread->Thread Mode = fixed and Performance->Thread->Threads = 1,
generates the same result in repeated runs.
This strengthens @brecht's theory that it is related to some multi-threaded pre-processing step (e.g., normal or tangent calculations).

@brecht the effect on path guiding can be big. While only a small set of samples differs in the first rendering iteration, they still lead to a different guiding structure.
In the second iteration, this slightly different structure leads to more variation in the samples used for the second training iteration, and so on.
The difference gets larger with every training iteration.

Here is an example with 32spp (Note: this time, I didn't even need to scale the error):
Screenshot from 2022-10-14 11-55-51.png (https://archive.blender.org/developer/F13677089/Screenshot_from_2022-10-14_11-55-51.png)

At the moment, the determinism of path guiding is pretty unreliable.
Depending on the scene, it might work or it might not.

One interesting fact is that this behavior was way less prominent in 3.0.1. Was there a change in the way the normals and tangents are pre-processed?

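For scripted comparisons, the single-threaded run could be launched from the command line. A minimal sketch that only builds the argument list: the file names are placeholders, `single_threaded_render_command` is a hypothetical helper, and it assumes that a background render with `-t 1` behaves like the interactive `blender -t 1` session described above.

```python
def single_threaded_render_command(blend_file, output_prefix, frame=1, blender="blender"):
    """Build a Blender command line that renders one frame with a single thread."""
    return [
        blender,
        "-b", blend_file,     # run in background mode on this .blend file
        "-t", "1",            # limit Blender to one thread
        "-o", output_prefix,  # output path prefix for the rendered frame
        "-F", "OPEN_EXR",     # write the result as EXR
        "-f", str(frame),     # render this frame
    ]


cmd = single_threaded_render_command("monster.blend", "//monster-run-0-")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`. Blender processes its arguments in order, so the thread, output, and format options are placed before `-f`.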
Author
Member

Added subscriber: @LukasStockner

Author
Member

@brecht I had a chat with @LukasStockner at BCON; he might have some ideas about where this comes from.

Member

Looks like the two sources of non-determinism are BKE_mesh_calc_normals_poly_and_vertex and Mikktspace::generateTSpaces. If you disable parallelism in both of those, the data buffers being copied to the device end up being identical between renders, and so do the rendered outputs.

And yes, @brecht got it right, both of those functions are doing atomic floating-point accumulations.

In theory it would probably work to do the accumulation either in fixed-point precision (which would honestly be fine for normals/tangents since they're bound to the -1..1 range anyways, and would even let us avoid the atomic CAS tricks that are needed for floats) or in double floating-point precision. Not sure how practical either of those are.

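The fixed-point idea can be sketched as follows. This is a toy under stated assumptions, not Blender code: `FRACTION_BITS` is an arbitrary precision choice, `to_fixed` and `accumulate_fixed` are hypothetical helpers, and Python integers never overflow, whereas a real implementation would have to size its integer type for the worst-case sum.

```python
import random

FRACTION_BITS = 20  # assumed precision, chosen only for illustration
SCALE = 1 << FRACTION_BITS


def to_fixed(x):
    """Convert a value in the -1..1 range to a fixed-point integer."""
    return round(x * SCALE)


def accumulate_fixed(components):
    """Sum components as integers, then convert back to float once."""
    total = 0
    for c in components:
        # Integer addition is associative, so the order does not matter.
        total += to_fixed(c)
    return total / SCALE


components = [0.1, -0.7, 0.3, 0.9, -0.2, 0.6]
reference = accumulate_fixed(components)

rng = random.Random(0)
results = set()
for _ in range(100):
    shuffled = components[:]
    rng.shuffle(shuffled)
    results.add(accumulate_fixed(shuffled))

print(reference, len(results))
```

Each contribution is rounded before the sum, so the rounding no longer depends on which thread adds first. The price is a fixed absolute precision and a bounded range.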

Fixed precision would be good to try, though I have not worked out whether there would be problems with high vertex valence, or with angle weighting when both small and large angles are involved.

Brecht Van Lommel removed their assignment 2023-02-08 03:35:34 +01:00
Philipp Oeser removed the Interest: Render & Cycles label 2023-02-09 14:04:05 +01:00

@brecht Considering that this problem makes checking functional changes really hard (which is a problem if you are working on CPU optimizations), would it be possible to at least add a preprocessor definition (until a proper solution is found) that disables multi-threaded execution for the two functions mentioned above? I mean, blender -t 1 also works, but it is really slow, as expected.

Member

For mesh vertex normal calculation, I would like to look into caching and using a vertex -> face corner map. Then the accumulation of face normals for each vertex could happen without atomics, and in a deterministic order. Since such a map would be useful for many other operations, the cost of its creation could be amortized at least a bit. I think it may make the normal calculation faster too, but it requires some experimentation.

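A rough sketch of the vertex -> face corner map idea, in Python rather than Blender's C++. The flat `corner_verts` / `corner_faces` arrays and both function names are assumptions made for illustration, and the accumulation is unweighted for brevity.

```python
def build_vert_to_corner_map(corner_verts, num_verts):
    """For each vertex, list the face corners using it, in ascending corner order."""
    vert_to_corners = [[] for _ in range(num_verts)]
    for corner, vert in enumerate(corner_verts):
        vert_to_corners[vert].append(corner)
    return vert_to_corners


def accumulate_vertex_normals(corner_verts, corner_faces, face_normals, num_verts):
    """Gather face normals per vertex in a fixed order, without atomics."""
    vert_to_corners = build_vert_to_corner_map(corner_verts, num_verts)
    normals = []
    for corners in vert_to_corners:
        x = y = z = 0.0
        for corner in corners:  # fixed order, so the float sum is reproducible
            nx, ny, nz = face_normals[corner_faces[corner]]
            x += nx
            y += ny
            z += nz
        length = (x * x + y * y + z * z) ** 0.5
        if length > 0.0:
            normals.append((x / length, y / length, z / length))
        else:
            normals.append((0.0, 0.0, 0.0))
    return normals


# Two triangles sharing the edge between vertices 1 and 2.
corner_verts = [0, 1, 2, 2, 1, 3]
corner_faces = [0, 0, 0, 1, 1, 1]
face_normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]

vert_map = build_vert_to_corner_map(corner_verts, 4)
normals = accumulate_vertex_normals(corner_verts, corner_faces, face_normals, 4)
print(vert_map)
print(normals)
```

Because each vertex only reads from its own corner list and writes its own output slot, the outer loop could run in parallel without changing the result.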

Another possibility here is that if the atomics still give best performance, we could keep using that approach for the viewport but not the final render. It's not a great solution but may be better than nothing.

Member

I started experimenting with changing vertex normal calculation here: https://projects.blender.org/HooglyBoogly/blender/commits/branch/mesh-normals-calc-changes

I'm still not sure if creating the topology map will be too slow, but I think there's still plenty of room for improvement and more out-of-the-box thinking there. I hope the upcoming changes to replace MLoop and MPoly with a single integer each will benefit this approach too, since those arrays have to be accessed more with the changes applied.


The idea I wanted to try at some point is to use atomic integer addition. For manifold meshes we know that the sum will not exceed -2pi..2pi, so that range could be mapped to 0..UINT_MAX. Small non-zero angles could be clamped to a minimum angle to avoid precision issues when e.g. the vertex has just one adjacent face.

The bigger problem would be non-manifold cases, for which you could extend the range a bit to say -8pi..8pi. And then for very rare cases where even that is not enough you'd need to fall back to something slower (serial execution, adjacency information or 64 bit integers). Although the vertex normals would already be quite meaningless for such meshes, so maybe that is not even needed.

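A sketch of how the range mapping and the fallback check might look. Everything here is an assumption made for illustration: the function name, the `min_angle` value, and the use of a signed sum checked against half of UINT_MAX instead of an offset unsigned accumulator. Python integers stand in for the atomic integer adds.

```python
import math

UINT_MAX = 0xFFFFFFFF
HALF_RANGE = UINT_MAX // 2  # budget for the signed sum in integer units


def accumulate_angle_weighted(contribs, max_abs_sum=2.0 * math.pi, min_angle=1e-4):
    """Sum angle * normal_component as integers.

    contribs: list of (corner_angle, normal_component) pairs for one vertex.
    Returns (sum as float, needs_fallback).
    """
    scale = HALF_RANGE / max_abs_sum
    total = 0
    for angle, component in contribs:
        if angle > 0.0:
            # Clamp tiny non-zero angles so they do not round to zero.
            angle = max(angle, min_angle)
        total += round(angle * component * scale)  # order-independent integer add
    # A sum outside the budget would overflow a 32-bit accumulator.
    needs_fallback = abs(total) > HALF_RANGE
    return total / scale, needs_fallback


manifold = [(math.pi / 2, 1.0)] * 3      # corner angles sum to 1.5 pi
non_manifold = [(math.pi / 2, 1.0)] * 6  # corner angles sum to 3 pi

print(accumulate_angle_weighted(manifold))
print(accumulate_angle_weighted(non_manifold))
print(accumulate_angle_weighted(non_manifold, max_abs_sum=8.0 * math.pi))
```

Widening the range to -8pi..8pi removes the need for a fallback in the second case, at the cost of two bits of precision per contribution.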

5052e0d4073d74cdc8d513034a208004fd44d593 will help with this, but the guiding regression tests are still not giving consistent results across platforms. Normal map tangents are also not solved yet, though none of the regression tests should be affected by that.

Brecht Van Lommel added Type: Bug and removed Type: Report labels 2024-06-14 16:02:17 +02:00
Reference: blender/blender#101726