Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" #107552

Open
William Leeson wants to merge 137 commits from leesonw/blender-cluster:upload_changed into main


Why

To improve the upload of data and BVH generation for GPUs, especially when there are multiple GPUs. Currently all uploads happen serially, which can cause a slowdown.

What

This patch is an alternative to #105403. It updates device memory buffers using a parallel copy in the multi-device instead of inside a parallel_for in device_update. To do this, it passes a list of buffers to the MultiDevice to copy to the devices, which reduces the number of parallel_for's. This leaves the MultiDevice to implement the parallel division of work. Doing this results in:

(a) 1 parallel_for over devices for copying mesh and attribute data to the devices
(b) n parallel_for's over devices, one for each object BVH to be built
(c) 1 parallel_for over devices for the scene BVH to be built

A spreadsheet [here](https://docs.google.com/spreadsheets/d/1ywQym3LTLgGIGMedExxsS4RpPKwpP60BjeLtXf1BaF4/edit?usp=sharing) shows the profile differences between this approach and the one in #105403 alongside the original Blender timings. Some of the results are shown in the graphs below. This patch is "parallel_copy"; the other is "geom_update" in the graphs.
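
To make the idea in the description concrete, here is a minimal sketch of handing a list of buffers to a multi-device wrapper that uploads them with one parallel pass over its devices. It is not the code from this patch: `HostBuffer`, `FakeDevice` and `upload_to_all` are hypothetical names, plain `std::thread` stands in for the parallel_for, and "device memory" is simulated with host vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a host-side buffer that must reach every device.
struct HostBuffer {
  const void *data;
  std::size_t size;
};

// Hypothetical device; "device memory" is simulated with host vectors.
struct FakeDevice {
  std::vector<std::vector<unsigned char>> memory;  // one entry per uploaded buffer

  void copy_all(const std::vector<HostBuffer> &buffers)
  {
    memory.clear();
    for (const HostBuffer &buffer : buffers) {
      assert(buffer.data != nullptr || buffer.size == 0);
      const unsigned char *bytes = static_cast<const unsigned char *>(buffer.data);
      memory.emplace_back(bytes, bytes + buffer.size);
    }
  }
};

// One parallel pass over the devices: each thread uploads the whole
// buffer list to its own device, so no per-buffer parallel loop is needed.
void upload_to_all(std::vector<FakeDevice> &devices, const std::vector<HostBuffer> &buffers)
{
  std::vector<std::thread> workers;
  workers.reserve(devices.size());
  for (FakeDevice &device : devices) {
    workers.emplace_back([&device, &buffers] { device.copy_all(buffers); });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```

Each worker only writes to its own device, so the shared buffer list can be read without locking.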
William Leeson added 108 commits 2023-05-02 17:43:25 +02:00
Removes mutex locks when updating pixel counts etc. Also, when
in background mode text status updates are removed.
Simplified the code to be more similar to the original by
implementing it as a spin lock. Then added a macro to be able to
switch back and forth between std::mutex and the SpinLock.
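
A spin lock that can be swapped for std::mutex at compile time might look roughly like the sketch below. The commit message does not give the macro name or the class layout, so `USE_STD_MUTEX`, `StatsLock` and `count_with_lock` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal spin lock providing lock()/unlock(), so it can be used with
// std::lock_guard in the same way as std::mutex.
class SpinLock {
 public:
  void lock()
  {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();  // back off while another thread holds the lock
    }
  }

  void unlock()
  {
    flag_.clear(std::memory_order_release);
  }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical compile-time switch; the macro name in the patch is not stated.
#ifdef USE_STD_MUTEX
using StatsLock = std::mutex;
#else
using StatsLock = SpinLock;
#endif

// Increments a shared counter from several threads under the selected lock.
int count_with_lock(int num_threads, int increments_per_thread)
{
  assert(num_threads > 0 && increments_per_thread >= 0);
  StatsLock lock;
  int counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; t++) {
    workers.emplace_back([&] {
      for (int i = 0; i < increments_per_thread; i++) {
        std::lock_guard<StatsLock> guard(lock);
        counter++;
      }
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  return counter;
}
```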
Add scoped event markers to highlight code regions of interest.
The scene BVH update was outside the parallel_for. This moves it
inside the parallel_for so it can also be performed in parallel.
The Windows build failed as the timers were using
a dynamic array. This has been changed to use a
heap array which is allocated with the scene
instead.
The stats parameters are now contained in the Scene class and
no longer need to be passed as a parameter.
To simplify the code and make it easier to read this moves the
scene BVH upload into the deviceDataXferAndBVHUpdate method.
Merge branch 'upstream_main' into geometry_update
Some checks failed
buildbot/vexp-code-patch-coordinator Build done.
ef84fe9e5d
1. Switch to use auto for DeviceScene
2. use parallel_for instead of tbb::parallel_for
3. rename n_scenes to num_scenes
The parameters for host_mem_alloc were changed to
`host_mem_alloc(size_t size, int aligment)` from
`host_mem_alloc(size_t size, int aligment, void **p_mem)`
Ensures that only 1 BVH2 BVH is built for each object or scene.
This is done using a lock to acquire the BVH2, and other threads
can skip the building if it is for an object. However, for the
top level BVH all threads wait on the TLAS to be finished.
Basically checks if the BVH2 BVH has been built or is being built.
Then when building the top level BVH2 BVH all devices that use it
wait for the one that is building it to finish before continuing.
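
The "build once, everyone else waits" behaviour described for the top-level BVH can be sketched as below. This is an illustration under assumptions, not the patch's code: `SharedBVH`, `build_or_wait` and `builds_for_devices` are hypothetical names. For an object BVH, the `building` branch could return immediately instead of waiting.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state for one BVH used by several devices.
struct SharedBVH {
  std::mutex mutex;
  std::condition_variable built_cv;
  bool building = false;
  bool built = false;
  int build_count = 0;  // how many times the build actually ran
};

// The first caller performs the build; later callers block until it is done.
void build_or_wait(SharedBVH &bvh)
{
  std::unique_lock<std::mutex> lock(bvh.mutex);
  if (bvh.built) {
    return;
  }
  if (bvh.building) {
    // wait() must be called with the mutex held; it releases it while blocked.
    bvh.built_cv.wait(lock, [&bvh] { return bvh.built; });
    return;
  }
  bvh.building = true;
  lock.unlock();

  // ... the expensive build would run here, outside the lock ...

  lock.lock();
  bvh.build_count++;
  bvh.built = true;
  lock.unlock();
  bvh.built_cv.notify_all();
}

// Simulates num_devices device threads that all need the same BVH.
int builds_for_devices(int num_devices)
{
  assert(num_devices > 0);
  SharedBVH bvh;
  std::vector<std::thread> workers;
  for (int i = 0; i < num_devices; i++) {
    workers.emplace_back([&bvh] { build_or_wait(bvh); });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  return bvh.build_count;
}
```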
The multi-device set the device pointer for the device_memory as it
iterated through the devices. However, it did not restore the
original pointer to the device if the device_ptr (pointer to the
memory buffer) was 0, which it always is after a mem_free.
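
One way to guarantee that the original device pointer is always restored, whatever the state of the buffer pointer, is a small scope guard. The sketch below assumes simplified, hypothetical types (`Device`, `DeviceMemory`, `ScopedDeviceOverride`); it is not how the patch implements the fix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the device and its memory handle.
struct Device {};

struct DeviceMemory {
  Device *device = nullptr;          // owning device (e.g. the multi-device)
  std::uint64_t device_pointer = 0;  // 0 means "not allocated"
};

// Points the memory at a sub-device for the lifetime of the guard and
// restores the original device unconditionally on destruction.
class ScopedDeviceOverride {
 public:
  ScopedDeviceOverride(DeviceMemory &mem, Device *sub_device)
      : mem_(mem), saved_device_(mem.device)
  {
    mem_.device = sub_device;
  }

  ~ScopedDeviceOverride()
  {
    mem_.device = saved_device_;
  }

  ScopedDeviceOverride(const ScopedDeviceOverride &) = delete;
  ScopedDeviceOverride &operator=(const ScopedDeviceOverride &) = delete;

 private:
  DeviceMemory &mem_;
  Device *saved_device_;
};

// Simulates freeing the buffer on one sub-device.
void free_on_sub_device(DeviceMemory &mem, Device *sub_device)
{
  ScopedDeviceOverride guard(mem, sub_device);
  assert(mem.device == sub_device);
  mem.device_pointer = 0;  // after a free the buffer pointer is 0
}
```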
Optix device was using the multi-bvh as the Optix BVH which was
causing the device to have the wrong handles for the objects.
Don't do progress updates when in headless mode. Previously
the background flag was used, but this is also used for F12
renders, which need the updates.
The mutex needs to be locked before the condition variable can
be used. This adds a scoped lock to ensure it is locked before
waiting on the condition_variable.
The top level BVH can only be built once all the other (bottom
level) BVHs are completed.
This breaks up the geometry.cpp and geometry_additions.cpp into
the following files:
- geometry.cpp
- geometry_mesh.cpp
- geometry_attributes.cpp
- geometry_bvh.cpp
so as to organise the code into more manageable pieces.
Needed to add the memory copy methods that have offset and size.
Previously after every device was added the CPU device was swopped
to the end of the list. Now the position it was added at is saved
and it is swopped with the last device at the end.
Forgot to push the code which skips swopping if there is no CPU in
the MultiDevice set.
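
The swap described in the two commits above can be illustrated as follows. `DeviceEntry` and `move_cpu_to_end` are hypothetical names, and the sketch searches for the CPU entry, whereas the commit message says the patch saves the position when the device is added.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record describing one device in the multi-device set.
struct DeviceEntry {
  std::string name;
  bool is_cpu;
};

// Swaps the CPU device with the last device once, after all devices have
// been added. Does nothing if the set contains no CPU device.
void move_cpu_to_end(std::vector<DeviceEntry> &devices)
{
  const std::size_t not_found = devices.size();
  std::size_t cpu_index = not_found;
  for (std::size_t i = 0; i < devices.size(); i++) {
    if (devices[i].is_cpu) {
      cpu_index = i;
      break;
    }
  }
  if (cpu_index == not_found) {
    return;  // no CPU in the set: skip the swap
  }
  if (cpu_index + 1 != devices.size()) {
    std::swap(devices[cpu_index], devices.back());
  }
  assert(devices.back().is_cpu);
}
```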
Pinned memory was used for all device_memory. However, because
this may cause memory pressure, it was removed and can be
enabled by passing USE_DEVICE_PINNED_MEMORY.
Arrays of double values are replaced by a vector<> of a struct
scene_times.
1. remove camelCase method names.
2. fix up white space changes.
3. remove some unnecessary changes.
Moved GeometrySizes and AttributeSizes to be stored in scene. This
resulted in them being removed as parameters where a Scene was
already passed in.
Refactored the GeometryManager methods so that those relating to
the DeviceScene are now methods in the DeviceScene.
As the device_update_bvh2 method just involves transferring data
to the device it was moved to DeviceScene.
Moving DeviceScene from scene.h caused the device definitions to be
no longer present in various files. This adds them back in by
including device.h in the required files.
Forgot to add the file for the previous change to the memory copy
methods, which resulted in the OneAPI build failing.
Uses the unique_ptr to deallocate the BVH2 structure in the
BVH postprocess and also assigns the BVH root indices in the
device-specific areas.
Some CUDA commands were used before the context was set up. Also,
since the BVHs are built in order, there is no need for the lock
that prevented 2 being built at the same time to save memory.
For the displacement and shadow texture generation the textures
were originally only uploaded to 1 device. However, this resulted
in some textures that were needed across all devices not being
uploaded. This fixes that by uploading them to all devices.
To reduce the amount of code and manage the life cycles of the
DeviceScenes they are now stored as unique_ptrs
Re-instates the task pool used to build the BVHs for the objects.
Also the displacement timing was being reported twice and this is
fixed.
Move CUDA context into a smaller scope
Some checks failed
buildbot/vexp-code-patch-coordinator Build done.
429f953d6c
Merge branch 'upstream_main' into geometry_update
All checks were successful
buildbot/vexp-code-patch-coordinator Build done.
de58c2ab8e
William Leeson requested review from Brecht Van Lommel 2023-05-02 17:43:51 +02:00
William Leeson changed title from WIP:Upload geometry data in parallel to multiple GPUs using the "Multi-Device" to WIP:Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" 2023-05-05 16:19:37 +02:00
William Leeson added 20 commits 2023-05-09 11:15:56 +02:00
This was incorrectly set to be device_size which is not set until
it is allocated on the device.
If the device was the MultiDevice it previously returned the first
BVH. This was incorrect; it should return the MultiBVH. Also, if
it cannot find the device then it should return NULL.
This code was duplicated in MultiDevice and geometry_bvh.cpp so
this removes that duplication.
The parallel_for is replaced by many parallel_for's: one for
the upload, one for each Object BVH and one for the scene BVH.
If the device was the MultiDevice it previously returned the first
BVH. This was incorrect; it should return the MultiBVH. Also, if
it cannot find the device then it should return NULL.
The scene BVH layout was set to NONE, which disabled the local
intersection testing used for the bevel effect. This removes that
line and also refactors the code to make it cleaner.
Merge branch 'remove_parallel_for' into upload_changed
All checks were successful
buildbot/vexp-code-patch-coordinator Build done.
f80ef452a1
William Leeson changed title from WIP:Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" to Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" 2023-05-09 11:35:02 +02:00
Author
Member

@blender-bot package

William Leeson closed this pull request 2023-05-09 11:41:49 +02:00
Member

Package build started. [Download here](https://builder.blender.org/download/patch/PR107552) when ready.
William Leeson reopened this pull request 2023-05-09 11:43:22 +02:00
William Leeson added 1 commit 2023-05-09 11:59:03 +02:00
William Leeson added 1 commit 2023-05-10 15:57:08 +02:00
William Leeson added 1 commit 2023-05-12 10:45:33 +02:00
William Leeson added 1 commit 2023-05-15 16:40:45 +02:00
William Leeson added 1 commit 2023-05-16 10:34:46 +02:00
William Leeson added 1 commit 2023-05-17 11:00:17 +02:00
William Leeson added 1 commit 2023-05-19 09:15:52 +02:00
William Leeson added 1 commit 2023-05-22 11:12:59 +02:00
William Leeson added 1 commit 2023-05-29 10:43:26 +02:00
This pull request has changes conflicting with the target branch.
  • intern/cycles/device/device.h
  • intern/cycles/device/metal/device_impl.mm
  • intern/cycles/device/metal/queue.mm
  • intern/cycles/device/multi/device.cpp
  • intern/cycles/device/oneapi/device_impl.cpp
  • intern/cycles/device/oneapi/device_impl.h
  • intern/cycles/scene/geometry.cpp
  • intern/cycles/scene/geometry_attributes.cpp
  • intern/cycles/scene/geometry_bvh.cpp
  • intern/cycles/scene/geometry_mesh.cpp

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u upload_changed:leesonw-upload_changed
git checkout leesonw-upload_changed
Reference: blender/blender#107552