Cycles X broke OptiX memory pooling via NVLink #93620

Closed
opened 2021-12-03 23:56:58 +01:00 by Rincewind · 17 comments

System Information
Operating system: Windows-10-10.0.19042-SP0 64 Bits
Graphics card: NVIDIA GeForce RTX 2080 Ti/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 472.47

Blender Version
Broken: version: 3.0.0, branch: master, commit date: 2021-12-02 18:35, hash: f1cca30557
Worked: 2.93

Short description of error
"Distribute memory across devices" (OptiX) is broken in Blender 3.0

Exact steps for others to reproduce the error

  • Open the attached blend file in Blender 3.0.
  • Ensure that you have 2x RTX 2080TI with NVLink and enable "Distribute memory across devices" in OptiX
    Blender 3.0 config.png (https://archive.blender.org/developer/F12687806/Blender_3.0_config.png)
  • Click on render
  • Render will crash with this error:
Failed to build OptiX acceleration structure

OPTIX_ERROR_INVALID_VALUE in optixAccelBuild(context, 0, &options, &build_input, 1, temp_mem.device_pointer, sizes.tempSizeInBytes, out_data.device_pointer, sizes.outputSizeInBytes, &out_handle, use_fast_trace_bvh ? &compacted_size_prop : 0, use_fast_trace_bvh ? 1 : 0) (C:\Users\blender\git\blender-v300\blender.git\intern\cycles\device\optix\device_impl.cpp:1066)
OPTIX_ERROR_INVALID_VALUE in optixAccelBuild(context, 0, &options, &build_input, 1, temp_mem.device_pointer, sizes.tempSizeInBytes, out_data.device_pointer, sizes.outputSizeInBytes, &out_handle, use_fast_trace_bvh ? &compacted_size_prop : 0, use_fast_trace_bvh ? 1 : 0) (C:\Users\blender\git\blender-v300\blender.git\intern\cycles\device\optix\device_impl.cpp:1066)

Additional information

  • "Distribute memory across devices" is working in CUDA mode (Blender 3.0)
  • "Distribute memory across devices" in OptiX is working in Blender 2.93 - you can open the attached file in 2.93 and will see it's rendering.
  • The attached file requires 17GB VRAM in OptiX in Blender 2.93. If you have 2x RTX 3090, this bug can probably also be reproduced by increasing the required VRAM to over 24GB; just duplicate some Suzannes. I was only able to test it on 2x RTX 2080 Ti because of my lack of 3090s ;)
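
The VRAM figures above can be checked with a rough capacity calculation. This is a minimal sketch, not Blender code: `scene_fits` is a hypothetical helper, the 11 GB per RTX 2080 Ti figure is the card's published memory size rather than something stated in this report, and treating pooled capacity as a plain sum is an assumption (real pooling may not share every allocation between cards).

```cpp
#include <cassert>

// Hypothetical helper: does a scene of `scene_gb` fit on the available devices?
// Assumption: with pooling, usable capacity is the sum over all devices;
// without pooling, every device must hold the full scene on its own.
bool scene_fits(double scene_gb, double per_device_gb, int device_count, bool pooled)
{
  const double capacity_gb = pooled ? per_device_gb * device_count : per_device_gb;
  return scene_gb <= capacity_gb;
}
```

Under these assumptions, the 17GB test scene does not fit on one 11 GB card but does fit in the 22 GB pool of two, while on 24GB cards the scene has to grow past 24GB before pooling is exercised at all.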

Thank you for your help

memory test.blend (https://archive.blender.org/developer/F12687804/memory_test.blend)

Author

Added subscriber: @Rincewind3D-1

Author

Added subscriber: @pmoursnv

Author

@pmoursnv
Maybe you can take a look at this issue?

This issue was referenced by blender/cycles@2885c4c2c9809273c45488080bd0ff76efa23f7f

This issue was referenced by 3d5dbc1c44907c73d2e6e57a146cbadaea9623bd

This issue was referenced by e14f8c2dd765a5f20d652899434174daa039804b
Author

Added subscriber: @patrick-24

Author

@pmoursnv

Hey,

Thank you for the quick PR for this issue. But does your PR really fix it?
The problem is not running out of memory; the problem is that sharing memory between two cards via NVLink is not working in OptiX. Memory size should be fine if memory pooling worked as it does in 2.93.

Author

Removed subscriber: @patrick-24

Member

The OPTIX_ERROR_INVALID_VALUE you are seeing is caused by running out of memory (you can verify this in the log with "--debug-cycles"). This is likely because BVH builds are happening in parallel, which quickly exhausts available memory due to the temporary build memory each build requires (and has some additional known quirks when memory pooling is active). Serialized builds do not suffer from that problem. This was fixed before (and the fix is in 2.93), but it got lost in the Cycles X merge (and thus is not in 3.0), which is why 3.0 behaves differently. The rest of the pooled memory implementation has not changed.
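
To illustrate why serializing builds lowers peak temporary memory, here is a minimal, self-contained sketch. It is not the Cycles implementation and not the actual fix: `MemoryTracker`, `build_bvh` and `peak_temp_memory` are hypothetical names, the byte counts are made up, and a plain `std::mutex` stands in for whatever serialization the real code uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for device memory accounting.
struct MemoryTracker {
  std::atomic<std::size_t> in_use{0};
  std::atomic<std::size_t> peak{0};

  void acquire(std::size_t bytes)
  {
    const std::size_t now = in_use.fetch_add(bytes) + bytes;
    // Raise the recorded peak if this allocation exceeded it.
    std::size_t prev = peak.load();
    while (prev < now && !peak.compare_exchange_weak(prev, now)) {
    }
  }

  void release(std::size_t bytes) { in_use.fetch_sub(bytes); }
};

// Simulated BVH build: holds temporary build memory for its duration.
// If build_mutex is non-null, only one build runs at a time.
void build_bvh(MemoryTracker &mem, std::size_t temp_bytes, std::mutex *build_mutex)
{
  std::unique_lock<std::mutex> lock;
  if (build_mutex != nullptr) {
    lock = std::unique_lock<std::mutex>(*build_mutex);
  }
  mem.acquire(temp_bytes);
  std::this_thread::yield();  // Placeholder for the actual build work.
  mem.release(temp_bytes);
}

// Runs one build per entry on its own thread and returns the peak
// temporary memory observed across all builds.
std::size_t peak_temp_memory(const std::vector<std::size_t> &temp_sizes, bool serialize)
{
  MemoryTracker mem;
  std::mutex build_mutex;
  std::vector<std::thread> workers;
  for (const std::size_t bytes : temp_sizes) {
    workers.emplace_back([&mem, &build_mutex, bytes, serialize] {
      build_bvh(mem, bytes, serialize ? &build_mutex : nullptr);
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  return mem.peak.load();
}
```

With serialization enabled, the peak equals the largest single build's temporary allocation. Without it, the peak depends on thread scheduling and can approach the sum of all temporary allocations, which is the situation that can exhaust device memory.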

Added subscriber: @deadpin

Changed status from 'Needs Triage' to: 'Needs User Info'

@Rincewind3D-1 Are you able to try out a 3.1 build to double-check that the issue is fixed?

Author

@deadpin

Yes, it is working fine in 3.1.
Screenshot 2021-12-24 110425.png (https://archive.blender.org/developer/F12774134/Screenshot_2021-12-24_110425.png)

Tested in:

  Blender 3.1.0 - Alpha
  December 24, 02:24:18 - 35bd6fe993a1
Member

Added subscriber: @lichtwerk

Member

Changed status from 'Needs User Info' to: 'Resolved'

Philipp Oeser self-assigned this 2021-12-28 13:07:36 +01:00
Member

Since e14f8c2dd7 is in 3.1 master and is also on the list in #93479 (3.0 Potential candidates for corrective releases), I will close this.

Reference: blender/blender#93620