OIDN on GPU conflicts with Persistent data in some cases #123145

Open
opened 2024-06-12 19:35:37 +02:00 by Efim Petelin · 16 comments

System Information
Operating system: Windows-10-10.0.19045-SP0 64 Bits
Graphics card: NVIDIA GeForce RTX 3080/PCIe/SSE2 NVIDIA Corporation 4.6.0 NVIDIA 555.85

Blender Version
Broken: version: 4.1.1, branch: blender-v4.1-release, commit date: 2024-04-15 15:11, hash: e1743a0317bc
Worked: before OIDN was moved to the GPU

Short description of error
I found strange behavior in certain cases: when the scene is heavy enough to use all available VRAM AND the Persistent Data option is enabled in Render Settings / Performance AND OIDN on GPU is enabled, the render crashes in about half of the cases, on a random one of the first few frames. The only workaround is to fall back to OIDN on CPU.
I cannot upload the problematic file because of an NDA and its size (about 7 GiB).
I have attached the crash log and console output for this scene.

Efim Petelin added the Severity: Normal, Type: Report, Status: Needs Triage labels 2024-06-12 19:35:38 +02:00
Member

Hi, thanks for the report. I suspect the system ran out of memory when persistent data is enabled (AFAICS ~8 GB of memory is already in use). How much RAM/VRAM do you have? Can you monitor memory usage during the render?
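
One possible way to monitor VRAM during the render is sketched below. It assumes an NVIDIA GPU with the `nvidia-smi` tool on the PATH; the helper names `parse_vram` and `log_vram` are made up for this illustration and are not part of Blender.

```python
import subprocess
import time

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram(output):
    """Turn nvidia-smi CSV output into a list of (used_mib, total_mib)."""
    readings = []
    for line in output.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def log_vram(interval_s=1.0, samples=10):
    """Print VRAM usage of every GPU `samples` times, `interval_s` apart."""
    for _ in range(samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        stamp = time.strftime("%H:%M:%S")
        for gpu, (used, total) in enumerate(parse_vram(out)):
            print(f"{stamp} GPU{gpu}: {used}/{total} MiB")
        time.sleep(interval_s)
```

Running `log_vram()` in a second terminal while the animation renders should show whether usage climbs toward the total just before the crash. System RAM is not covered by this sketch.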

Pratik Borhade added Status: Needs Information from User and removed Status: Needs Triage labels 2024-06-13 06:33:10 +02:00
Author

Good day!
My guess is that, because of the lack of VRAM, OIDN data unpredictably overwrites some crucial data of the render itself (part of the persistent data). The interesting part is that it happens completely at random.
In my case I have several full-CG shots from a feature-length movie; they all show the same scene, shot from different angles and with different optics. In some cases a shot renders flawlessly with OIDN on GPU, and in other cases, sometimes with even less data, it crashes randomly.
Right now one of the shots is being rendered (I set OIDN to CPU for reliability), and here is a screenshot of RAM and VRAM usage over one frame's render cycle.

Member

Thanks. I see VRAM is almost full, so the crash is likely an "out of memory" case.
High memory usage due to persistent data is expected, but I'm unsure about a few things:

  • Does persistent data use VRAM to store render data, or RAM as well?
  • When VRAM runs out, does it use shared memory?

@Alaska hi, any idea here? Otherwise we will have to poke the Cycles devs.

Author

In case it helps, here is some additional info I forgot to mention: my renders are set up to use OptiX, not CUDA. RAM reservation for CUDA is left at the default in the driver settings. I'm not sure what that means: is it on or off? Nevertheless, it shouldn't have anything to do with OptiX rendering, should it?
The scene itself consists of 50-80 million triangles (most of them instanced) and uses a few dozen 8k textures (some are 8-bit, most are 24-bit, and three or four are floating point).
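
To give a rough sense of scale for those textures, here is a back-of-the-envelope estimate of their uncompressed size. It rests on assumptions the report does not state: "8k" is taken as 8192 x 8192 pixels, "8-bit" as one 8-bit channel, "24-bit" as three 8-bit channels, and "floating point" as four 32-bit channels. How Cycles actually lays textures out in VRAM may differ, so treat the numbers as an order-of-magnitude guide only.

```python
def texture_mib(width, height, channels, bytes_per_channel):
    """Uncompressed size of one texture in MiB (no mipmaps, no padding)."""
    return width * height * channels * bytes_per_channel / 2**20


SIDE = 8192  # assumption: "8k" means 8192 x 8192 pixels

sizes = {
    "8-bit greyscale (1 x 8 bit)": texture_mib(SIDE, SIDE, 1, 1),
    "24-bit RGB (3 x 8 bit)": texture_mib(SIDE, SIDE, 3, 1),
    "float RGBA (4 x 32 bit)": texture_mib(SIDE, SIDE, 4, 4),
}

for name, mib in sizes.items():
    print(f"{name}: {mib:.0f} MiB")
```

Under these assumptions a single 24-bit 8k texture takes 192 MiB and a float one takes 1024 MiB, so a few dozen of them can plausibly account for several GiB before any geometry is counted.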

Member

The error seems to be occurring during a `memcopy` on the CPU, during the setup of the camera motion blur for the next frame of rendering.

Based on the screenshots provided, there's plenty of RAM left for the CPU. So I'm not sure why this is occurring, or why OIDN on the GPU impacts things. Unless I'm misunderstanding how memory is distributed across devices (which is likely).

@efimpetelin Have you overclocked/undervolted your CPU or your RAM (including settings like XMP and EXPO in your BIOS)? If so, can you try disabling it and see if it helps? I'm thinking there's a possibility of some sort of hardware issue impacting things here, but I could be wrong.

Can you also share what CPU you have? 13th and 14th generation Intel CPUs have been having some stability issues in some programs, leading to random crashes and errors.

It might be best to try to reproduce the issue first, then contact a Cycles developer for input.
I ran some quick tests and couldn't reproduce this issue, but I could be doing something wrong.

Author

@Alaska Hello! No, I haven't overclocked or undervolted any of my hardware, and I have never had any CPU or RAM problems with it. But I am using an XMP profile (I never thought it could be treated as overclocking, though). There are a bunch of screenshots of the specs in the attachments.

Blender Bot added Status: Archived and removed Status: Needs Information from User labels 2024-06-17 16:57:41 +02:00
Author

Oops, I accidentally closed this ticket, sorry!

Blender Bot added Status: Needs Triage and removed Status: Archived labels 2024-06-17 16:58:17 +02:00
Member

Does disabling XMP resolve the issue?

> I'm using XMP profile (never thought it could be treated as overclocking though).

Intel considers XMP an overclock. And in my personal experience, enabling XMP can make a computer produce unexpected memory-related results if the speeds in the profile are too high, or if the BIOS doesn't apply the profile properly.

Author

@Alaska It's really strange: I tried to record a video of the rendering crash with OBS, with XMP and without it, and it doesn't crash today! )) I will report back later, when the same crash occurs again.

Author

I have encountered this error again, and there is a video explaining everything: https://youtu.be/2KJ-O8JyF_I
After that I disabled XMP in the BIOS, and there weren't any crashes under any circumstances.
My apologies, perhaps it's some kind of hardware/software-related issue.

P.S.
Nope. It crashed without XMP (and Firefox )) right after frame 1003.
Member

I tried recreating a scenario that could cause the issue based on the information you've shared. So that's a scene that:

  1. Consumes a lot of memory with many triangles and high resolution textures (consumes 23/24 GB of VRAM on my computer).
  2. Is rendered on the GPU in Cycles.
  3. Has OIDN GPU denoising enabled.
  4. Has motion blur enabled with a moving and rotating camera.
  5. Has persistent data enabled.
  6. Is rendered from the command line.
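
For anyone who wants to apply the same render settings to their own heavy scene, the list above could be scripted roughly as follows. This is a sketch rather than the exact setup used in the test: the property names are the ones Blender 4.1 is believed to expose through `bpy`, the function name `configure_repro` is made up, and the heavy geometry, textures, and camera animation from items 1 and 4 still have to come from the scene itself.

```python
def configure_repro(scene):
    """Apply the render settings from the list above to a Blender scene."""
    scene.render.engine = "CYCLES"
    scene.cycles.device = "GPU"                 # item 2: render on the GPU
    scene.cycles.use_denoising = True
    scene.cycles.denoiser = "OPENIMAGEDENOISE"  # item 3: OIDN ...
    scene.cycles.denoising_use_gpu = True       # ... running on the GPU
    scene.render.use_motion_blur = True         # item 4: motion blur
    scene.render.use_persistent_data = True     # item 5: persistent data
    return scene


if __name__ == "__main__":
    try:
        import bpy  # only importable inside Blender
    except ImportError:
        bpy = None
    if bpy is not None:
        configure_repro(bpy.context.scene)
```

Saved under a hypothetical name such as `setup_repro.py`, it could be combined with item 6 as `blender -b scene.blend --python setup_repro.py -a`, where `scene.blend` stands in for the actual file.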

I also tried opening up Firefox (since the issue seemed more common with Firefox open for you) and watched a 4k YouTube video while the render was running.

I could not reproduce the issue. But there's still a possibility that I'm doing something wrong.

@efimpetelin Can you run some extra tests for us?

  1. Does re-downloading and re-installing Blender 4.1.1 resolve the issue? There's a possibility that your Blender install is slightly corrupt.
  2. Does updating to Blender 4.2 resolve the issue? (4.2 is currently in Beta and can be downloaded from here: https://builder.blender.org/download/daily/ )
  3. Does running the Windows System File Checker (https://support.microsoft.com/en-au/topic/use-the-system-file-checker-tool-to-repair-missing-or-corrupted-system-files-79aa86cb-ca52-166a-92a3-966e85d4094e) and then rebooting resolve the issue?
  4. When the error occurs, is it in the same spot? To check this, open the crash log (https://docs.blender.org/manual/en/latest/troubleshooting/crash.html#crash-log) from each crash and compare the stack trace sections to see whether they differ. If you're not sure what you're looking for, feel free to upload a few of the crash logs your computer creates here and we can take a look at them.
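
The comparison in step 4 could also be done with a small script along these lines. It assumes the crash log contains a line reading `Stack trace:` followed by one frame per line up to the next blank line. That layout is an assumption and should be checked against a real log; the marker is a parameter for that reason. The function names are made up for this sketch.

```python
import re
import sys
from pathlib import Path


def stack_signature(log_text, marker="Stack trace:"):
    """Return the frames after `marker`, with hex addresses masked out."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == marker:
            break
    else:
        return ()  # marker not found
    frames = []
    for line in lines[i + 1:]:
        if not line.strip():
            break  # a blank line ends the section
        # Addresses change between runs, so compare everything but them.
        frames.append(re.sub(r"0x[0-9A-Fa-f]+", "0x?", line.strip()))
    return tuple(frames)


def same_crash_site(log_a, log_b):
    """True if both logs contain a stack trace and the traces match."""
    sig_a, sig_b = stack_signature(log_a), stack_signature(log_b)
    return bool(sig_a) and sig_a == sig_b


if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = (Path(p).read_text(errors="replace") for p in sys.argv[1:3])
    print("same stack trace" if same_crash_site(a, b) else "different")
```

Passing two crash log paths on the command line prints whether the masked traces match. A mismatch would suggest the crashes are not all happening in the same place.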

System Information
Operating system: Windows-10-10.0.22631-SP0 64 Bits
Graphics card: NVIDIA GeForce RTX 4090/PCIe/SSE2 NVIDIA Corporation 4.6.0 NVIDIA 555.99
Blender version: 4.1.1, branch: blender-v4.1-release, commit date: 2024-04-15 15:11, hash: e1743a0317bc

Pratik Borhade added Status: Needs Information from User and removed Status: Needs Triage labels 2024-06-20 06:23:25 +02:00
Author

Hello!
I'm very busy these days, so I can't go through all these experiments for now. Nevertheless, I will try as soon as I have some free time.
The only thing that wasn't time-consuming was testing this scene under Ubuntu. There, right after rendering finished, during the denoising phase, Blender just reports "Out of memory". It doesn't crash; it just stops with this error.
My guess is still the same: persistent data kept on the GPU prevents OIDN from allocating enough VRAM to initialise properly.

Bart van der Braak added Type: Bug and removed Type: Report labels 2024-08-14 12:59:10 +02:00
Member

@efimpetelin hi, did you find time to run those tests?

Author

@PratikPB2123 Hello! No, sorry, I have no time for experiments on my working rig. But I thought that my previous answer (https://projects.blender.org/blender/blender/issues/123145#issuecomment-1227939) clarified the situation: Blender crashes under Windows because of a lack of VRAM. Possibly NVIDIA and/or Windows shared memory misleads Blender into thinking there is plenty of VRAM.
Member

Hi, an "out of memory" error shown in the statistics when there is sufficient memory could be a different bug, since the original report is about a crash.

Member

> https://projects.blender.org/blender/blender/issues/123145#issuecomment-1212882

While GPU memory was 100% utilized, as in the image from the comment quoted above.
It would be helpful if you could run the checks suggested by Alaska.
AFAICS, we still do not have an example .blend file for investigation.
Reference: blender/blender#123145