Vector displacement suspected to cause geometry artefact and CUDA error #111277

Closed
opened 2023-08-18 19:41:42 +02:00 by Niklas-Becker · 8 comments

System Information
Operating system: Windows-10-10.0.19045-SP0 64 Bits
Graphics card: NVIDIA GeForce RTX 3070/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 536.23
Edit: Updated GPU driver to 536.99, the error still exists.

Blender Version
Broken: version: 3.6.2, branch: blender-v3.6-release, commit date: 2023-08-16 16:43, hash: e53e55951e7a
Worked: 3.5

Short description of error
Blender Versions 3.6.2 and 3.6.1 create black geometry artifacts when rendering with CUDA (Viewport and renderer).
When the file is loaded with the 3.5 version, everything is normal.
3.6.3 produces this error too.

Exact steps for others to reproduce the error
I've attached a sample file that reproduces the error.
Just switch to the rendered viewport and you will see the black artifacts on the table (or see the attached screenshot).

After testing we found that:

  • The black visual artefact appears only if a Vector Displacement node is used.
  • The CUDA error happens only while this artefact is visible.

Original description:

CUDA crashed when I entered Edit Mode and selected the vertices that are part of an artifact.
(This does not happen every time.)
I've attached a screenshot of such a CUDA crash.

The artifacts vanish instantly when you rotate the table even slightly in any direction.
I've seen the same artifacts on other objects in other scenes. They seem to appear only on surfaces that are exactly parallel to the YZ plane, but this might be a false observation.
CUDA error: "Illegal address in CUDA queue copy_from_device (integrator_shade_surface integrator_sorted_paths_array prefix_sum)" as seen in the crash screenshot.

Niklas-Becker added the Type: Report, Severity: Normal, Status: Needs Triage labels 2023-08-18 19:41:43 +02:00

Same problem seen in #92189 and #97633

It looks like it's a GPU driver problem. Users have reported that it has been resolved, but it is still not clear what causes it or how to resolve it.

Please double-check if the drivers are up to date. To upgrade to the latest driver, see here for more information: https://docs.blender.org/manual/en/dev/troubleshooting/gpu/index.html

Germano Cavalcante added Status: Needs Information from User and removed Status: Needs Triage labels 2023-08-18 20:36:30 +02:00
Author

I did some more testing:

  1. Updating driver - did not help

  2. Rendering with CPU (AMD RYZEN 5 3600):

    • with Open Shading Language: Problem Gone
    • without Open Shading Language: Problem still there
  3. My brother tested it on his PC (Same Blender version, same file)
    GTX 780, Intel CPU:

    • Have seen the same problem with CUDA rendering
    • No problem at all with CPU rendering (Open Shading Language on or off)
  4. Shading:

    • smooth Shading: Problem gone
    • flat Shading: Problem still there

It seems that it is not a single GPU/GPU-Driver problem since it also happened on CPU rendering.

It is only happening on the YZ plane (around X axis).

Germano Cavalcante added Status: Needs Triage and removed Status: Needs Information from User labels 2023-08-18 22:25:44 +02:00
Member

This looks like a problem with normal attributes and OSL. ~~The black triangle problem is gone after `merge by distance`; you probably have duplicated faces etc. Try `merge by distance` and see if it runs correctly on GPU.~~

After a bit of testing (CPU render):

  • The black triangle is still there even if you move the related vertices away from and back to their original positions.
  • The material doesn't seem to use any OSL node, but turning on OSL did get rid of this black triangle.

Not sure what is going on with the geometry. It could be that the material has some division-by-zero situation going on, but that shouldn't cause CUDA to error like that.

Can't test CUDA for now due to my exceptionally broken NVIDIA driver on Linux...

YimingWu added Module: Render & Cycles, Status: Needs Information from User and removed Status: Needs Triage labels 2023-08-19 06:15:27 +02:00
YimingWu added Status: Confirmed and removed Status: Needs Information from User labels 2023-08-19 06:24:13 +02:00
Author

I think I have it narrowed down to the "Vector Displacement" node.
If "Tangent Space" is selected in this node, the error occurs.
If "World Space" or "Object Space" is selected, the error is gone.

I have changed the sample file so you can reproduce the error.
In the shader settings of the wood material (only a white diffuse in there now) you can see this behavior on the Vector Displacement node.

Somehow there is a UV map for the object (I can't remember creating it). The faces with the artifacts have a width of zero in the UV editor. When I project a new UV map, the artifacts are gone.
The weird thing is that this UV map is not used in the material. To be sure that it is not used, I plugged the "generated UV-map" into the normal input of the shader.
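
As a rough illustration of why a zero-width UV face could matter here: tangent-space displacement generally needs a tangent basis, and such a basis is commonly derived from the UV layout, which might explain why an otherwise unused UV map has an effect. The sketch below is plain Python and does not use the Blender API; the function names `uv_signed_area` and `has_degenerate_uvs` and the `eps` threshold are hypothetical. It only flags triangles whose UV area is (near) zero, which is the case where a UV-derived tangent would be undefined.

```python
def uv_signed_area(uv0, uv1, uv2):
    """Signed area of a triangle in UV space (half the 2D cross product)."""
    du1, dv1 = uv1[0] - uv0[0], uv1[1] - uv0[1]
    du2, dv2 = uv2[0] - uv0[0], uv2[1] - uv0[1]
    return 0.5 * (du1 * dv2 - du2 * dv1)


def has_degenerate_uvs(uv0, uv1, uv2, eps=1e-12):
    """True when the UV triangle has (near) zero area.

    A tangent computed from such UVs would involve a division by
    (approximately) zero, so it cannot be trusted.
    eps is an assumed tolerance, not a value taken from Blender.
    """
    return abs(uv_signed_area(uv0, uv1, uv2)) < eps
```

A face whose three UV coordinates share the same U value (zero width in the UV editor) is reported as degenerate, while a normally unwrapped face is not.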

Member

Thanks for the investigation... Then this is something that needs fixing in vector displacement.

Still not sure if this caused the CUDA error though. Is the CUDA error still there?

Author

Just tested it again, the CUDA error is still there and only if the artifacts are seen.

Member

Then I guess it's a problem in the vector displacement code.

YimingWu changed title from CUDA queue Error / Black geometry artifacts to Vector displacement suspected to cause geometry artefact and CUDA error 2023-08-19 08:52:27 +02:00
Member

I decided to take a quick look into debugging this issue. This is what I could find so far.

  1. This issue is not necessarily something wrong with the Vector Displacement node.
  2. The issue only occurs (in my testing) when using the Light Tree with the Vector Displacement node. So turning off the Light Tree resolves this issue.

Now onto why.

From a quick look at debugging: when the Light Tree is deciding which light to sample, one of its calculations picks an emitter_index that is outside the range of valid indices. When that happens, Cycles hits an assert, and CUDA reports an "illegal address" error.

Why is the light tree picking an invalid emitter_index? Because of issues with the Vector Displacement node and the input it is being fed. The Vector Displacement node is causing some NaN normals to appear on the surface (the black spots). And since the light tree uses the surface normal for some of its calculations, various values can end up as NaN and unexpected code paths can be taken.

So there are two potential areas where this bug can be fixed.

  1. The Light Tree code can be modified to handle these cases better to avoid errors. #111292
  2. The Vector Displacement code can be modified to avoid NaNs, fixing both the error and the black rendering. (OSL already seems to do this) #111294
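
To make the failure mode concrete, here is a minimal sketch in plain Python of how a single NaN can push an index selection out of range. This is not the Cycles light tree code; `pick_emitter_naive`, `pick_emitter_guarded`, and the list of importance weights are hypothetical stand-ins. The point is only that every comparison against NaN is false, so a selection loop can fall through to an index past the end, and that sanitising the inputs first avoids this.

```python
import math


def pick_emitter_naive(importances, u):
    """Pick an index by walking a normalised CDF; no NaN handling."""
    total = sum(importances)  # becomes NaN if any weight is NaN
    cdf = 0.0
    for index, weight in enumerate(importances):
        cdf += weight / total
        if u < cdf:  # always False once cdf is NaN
            return index
    return len(importances)  # fall-through: one past the last valid index


def pick_emitter_guarded(importances, u):
    """Same walk, but non-finite or non-positive weights count as zero."""
    clean = [w if math.isfinite(w) and w > 0.0 else 0.0 for w in importances]
    total = sum(clean)
    if total == 0.0:
        return None  # nothing valid to sample
    cdf = 0.0
    for index, weight in enumerate(clean):
        cdf += weight / total
        if u < cdf:
            return index
    return len(clean) - 1  # clamp against floating-point rounding
```

With weights `[1.0, nan, 2.0]` and `u = 0.5`, the naive version returns 3, which is not a valid index into a three-element list, while the guarded version ignores the NaN entry and returns a valid index.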
Blender Bot added Status: Resolved and removed Status: Confirmed labels 2023-08-21 15:22:17 +02:00
Reference: blender/blender#111277