Shader node float precision issue cause different results on CPU and GPU #67448

Open
opened 2019-07-22 15:05:30 +02:00 by Leo Wattenberg · 20 comments

System Information
Operating system: Windows-10-10.0.18362 64 Bits
Graphics card: GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 430.86
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

Blender Version
Broken: version: 2.80rc2 (2.80 (sub 74), branch: master, commit date: 2019-07-18 14:52, hash: 38d4483c6a)

Short description of error
When rendering a volumetric Mandelbulb in Cycles using CPU and GPU together, the two devices produce very different results for the same scene, making the individual tiles clearly visible.

Render
t3-rc2.png
(most tiles are GPU-rendered, note the stray red CPU-rendered tile to the left)

Blend
mandelbrot.blend

Potentially useful information

  • #50193 may be the same bug; I opened a new one because that bug doesn't report differences within a single frame, and is from 2016.
  • Since most of this object is endless math nodes, it may be some sort of floating-point thing that works differently on CPUs and GPUs.
  • I was following this tutorial: https://www.youtube.com/watch?v=WSQFt1Nruns
  • I noticed that some operations for which I expected single inputs (namely the sines) had two inputs. In the tutorial, the dude said the second input doesn't do anything, but then, that's from the alpha. Maybe taking a 0.5th sine of a value does something strange?
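To make the floating-point suspicion in the list above concrete, here is a minimal sketch of one well-known way two devices can disagree on the same arithmetic: computing `a*b + c` with a rounding after each operation versus as a single fused multiply-add with one rounding at the end. This is an illustration only; the helper names (`f32`, `mul_add_separate`, `mul_add_fused`) and the demo values are assumptions made for the example, and nothing here is taken from the Cycles kernels or claims to be the cause of this report.

```python
import struct
from fractions import Fraction


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_add_separate(a, b, c):
    """a*b + c with a float32 rounding after each operation."""
    return f32(f32(a * b) + c)


def mul_add_fused(a, b, c):
    """a*b + c evaluated exactly, then rounded once (FMA-style).

    Going through a double before f32 is exact for the demo values below;
    it is not a general-purpose FMA emulation.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return f32(float(exact))


# Hypothetical demo values: a*b lies exactly halfway between two float32 values.
a = b = 1.0 + 2.0 ** -12
c = -(1.0 + 2.0 ** -11)

print(mul_add_separate(a, b, c))  # 0.0
print(mul_add_fused(a, b, c))     # 2**-24, about 5.96e-08
```

Both results are valid roundings of the same expression, so neither is "wrong"; they just differ in the last bit, which is the kind of difference a long node chain could then build on.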
Author

Added subscriber: @leoxd.mtar

#85988 was marked as duplicate of this issue

Added subscriber: @MichaelHermann

How much of your GPU memory is used? I'm asking because I might have a similar issue.
For me, this occurs when GPU memory is close to full. Then some geometry seems to be omitted on the GPU, resulting in a tiled look. I'm using Microdisplacement, btw.

Author

GPU Memory is at 2.2GB, out of 8GB dedicated, 16GB dedicated+shared

Ok... thanks for the info. It seems unrelated.

Added subscriber: @gabe2252

I was going to report a similar issue I was having with volumetrics. I use an RX 580 8GB on Linux (Ubuntu 18.04) with the amdgpu driver (non-Pro) and Blender 2.8 rc2.

cpu+gpu.png

areoLux1.blend

Member

Added subscriber: @LazyDodo

Member

try changing this setting to distance

image.png

Member

nope, that did not work either. i'll leave it for someone smarter.

Added subscriber: @GavinScott

Reproduces here (Win 10/Nvidia 1060). You can reduce the tile size and sampling and still see the issue. I note that the YouTube tutorial's final .blend file does not exhibit the problem when switched to Cycles (but then it has a somewhat different shader setup). It appears to be a very small progressive difference as it moves through each of the stages in the shader. If you remove most of them then the issue is invisible. As you add back in more of the top-level node groups, the problem kind of fades into existence and becomes more pronounced.

I looked for differences in settings, lighting etc. but nothing jumped out. I think it's probably just a very slight difference in the render kernels. It can be really hard to get identical floating-point behavior from two different code implementations.

When you use the Math node with a function like Sine that only takes a single value, the second input simply stays unused, and it does not matter what value it gets.
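As a rough illustration of the point about the unused second input, here is a hypothetical stand-in for a two-socket math node. The function name `math_node`, the operation strings, and the default of 0.5 are assumptions made for this sketch; it is not Blender's implementation.

```python
import math


def math_node(operation, value1, value2=0.5):
    """Hypothetical stand-in for a two-socket Math node (not Blender's code)."""
    if operation == "ADD":
        return value1 + value2
    if operation == "MULTIPLY":
        return value1 * value2
    if operation == "SINE":
        # Unary operation: value2 is never read, so its value cannot matter.
        return math.sin(value1)
    raise ValueError(f"unsupported operation: {operation}")


# The second argument changes nothing for a unary operation.
print(math_node("SINE", 1.0, 0.5) == math_node("SINE", 1.0, 123.0))
```

In a model like this there is no such thing as a "0.5th sine"; the leftover socket value is simply ignored.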

Added subscribers: @brecht, @Sergey

Brecht Van Lommel was assigned by Sergey Sharybin 2019-07-25 15:38:43 +02:00

There is actually code that ensures Distance sampling is used for all shaders when both CPU and GPU are involved, and it seems to work the way it is designed to.

I cannot reproduce the issue with the mandelbrot file, and it doesn't look like something that could be caused by different shading capabilities on GPU and CPU.

The areoLux1 issue is that there are two volume domains which coincide exactly. It is possible that, due to different precision on CPU and GPU, one of the intersections is missed when the ray exits the domains, causing the integration loop to discard the volume to avoid an overblown pixel due to massive texture.
Just make one of the domains slightly bigger.
Not sure we can do anything about it.
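A rough sketch of why exactly matching domains are fragile and why enlarging one helps: with identical boxes, the two exit distances along a ray are exactly tied, so any rounding difference between devices can decide which surface is hit first (or whether one is missed); with one box slightly larger, the exits are separated by far more than single-precision rounding error. The slab-test helper, the ray, and the 0.1% enlargement below are assumptions for illustration, not Cycles code.

```python
import math


def exit_distance(origin, direction, half_size):
    """Distance at which a ray starting inside an axis-aligned cube
    (centered at the world origin, half-extent half_size) leaves it."""
    t_exit = math.inf
    for o, d in zip(origin, direction):
        if d != 0.0:
            t_low = (-half_size - o) / d
            t_high = (half_size - o) / d
            # For a ray starting inside, the larger value is the exit on this axis.
            t_exit = min(t_exit, max(t_low, t_high))
    return t_exit


origin = (0.1, 0.2, 0.3)       # hypothetical ray starting inside both domains
direction = (0.6, 0.3, 0.7)

t_a = exit_distance(origin, direction, 1.0)
t_same = exit_distance(origin, direction, 1.0)      # coincident second domain
t_bigger = exit_distance(origin, direction, 1.001)  # second domain 0.1% larger

float32_rounding = 2.0 ** -23 * t_a  # rough size of one float32 step at this distance

print(t_same - t_a)                          # exact tie
print((t_bigger - t_a) / float32_rounding)   # gap measured in float32 steps
```

The first printed value is an exact zero, which is the degenerate case; the second shows the enlarged domain's exit sits thousands of float32 steps away, so the ordering of the two exits no longer depends on last-bit rounding.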

@brecht, mind trying to reproduce the first issue?

Hm... now I'm not sure. I'll just post these pictures and let Devs decide if this could be related.
My scene has Volumes in there, but I disabled them to pinpoint the problem. DoF or Volumes don't seem to make a difference for this.

This first image shows the issue. The GPU tiles omit some geometry in front. (See left side of the jaw for example.)
bug4_noDOFnoVolume.png

In this second image I just pulled the camera back a little. Since my scene uses Microdisplacement, this reduced the memory consumption just below the critical point and the image looks fine. (I just stopped the render because it takes quite a long time)
bug5_noDOFnoVolume_9.3GBMem_CamMovedBack.png

I'm rendering with a 1080Ti with 11GB of memory. The second image needed about 9.3GB. Just above that, it starts looking like the first image.

I can provide the .blend file, but it is relatively heavy, and whether the issue shows up probably depends on the hardware. Let me know if you believe it's helpful or if I can provide anything else.

Added subscriber: @ColeMorris

I can reproduce the mandelbrot issue. The other issues reported here seem unrelated, and should be reported separately.

This is a small precision difference between math on the CPU and GPU in one of the shader nodes, which gets worse with every iteration until it has a big visible effect.

We need to figure out exactly which node and operation causes the issue. But we may not be able to fix this: the math function implementations on the GPU may be slightly different, and we are not likely to implement our own because of the performance impact.
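To illustrate how a tiny per-operation difference can grow with every iteration until it is clearly visible, here is a minimal sketch using the logistic map as a stand-in for an iterated fractal formula. The map, the parameter 3.9, the starting value, and the 1e-7 offset (roughly one single-precision rounding step) are all assumptions for the example; this is not the node setup from the .blend file.

```python
def iterate(x, steps, r=3.9):
    """Return the trajectory of the logistic map x -> r * x * (1 - x)."""
    values = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


# Two "devices" that agree except for a difference near float32 precision.
device_a = iterate(0.3, 100)
device_b = iterate(0.3 + 1e-7, 100)

gaps = [abs(p - q) for p, q in zip(device_a, device_b)]

print(gaps[0])    # initial difference, about 1e-7
print(max(gaps))  # largest difference seen over 100 iterations
```

The same mechanism would explain why removing most of the iteration stages makes the problem invisible: with fewer iterations the difference has not yet had time to grow.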

Brecht Van Lommel was unassigned by Dalai Felinto 2019-12-23 16:33:37 +01:00
Brecht Van Lommel changed title from GPU-rendered tiles look different than CPU-rendered ones to Shader node float precision issue cause different results on CPU and GPU 2020-01-27 11:36:35 +01:00

Added subscriber: @Connor-Denning

Added subscribers: @olsarxd, @mano-wii, @lichtwerk

Philipp Oeser removed the Interest: Render & Cycles label 2023-02-09 14:00:12 +01:00
Reference: blender/blender#67448