Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" #107552

Open
William Leeson wants to merge 137 commits from leesonw/blender-cluster:upload_changed into main

Why

To improve the upload of data and BVH generation for GPUs, especially when multiple GPUs are in use. Currently all uploads happen serially, which can cause a slowdown.

What

This patch is an alternative to #105403. It updates device memory buffers using a parallel copy inside the MultiDevice instead of inside a parallel_for in device_update. To do this, it passes a list of buffers to the MultiDevice, which copies them to the devices. This reduces the number of parallel_for's and leaves the MultiDevice to implement the parallel division of work. The result is:

(a) 1 parallel_for over devices to copy mesh and attribute data to the devices
(b) n parallel_for's over devices, one for each object BVH to be built
(c) 1 parallel_for over devices for the scene BVH to be built

A spreadsheet [here](https://docs.google.com/spreadsheets/d/1ywQym3LTLgGIGMedExxsS4RpPKwpP60BjeLtXf1BaF4/edit?usp=sharing) shows the profile differences between this approach and the one in #105403 alongside the original Blender timings. Some of the results are shown in the graphs below. In the graphs, this patch is "parallel_copy" and the other is "geom_update".
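As a rough illustration of the buffer-list idea in the description, the sketch below hands a list of host buffers to a multi-device wrapper, which runs a single parallel loop over its devices. All type and method names (`HostBuffer`, `FakeDevice`, `FakeMultiDevice`, `upload_all`) are hypothetical stand-ins, not the Cycles classes, and `std::thread` is used in place of `parallel_for` to keep the example self-contained.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical host-side buffer; not the Cycles device_memory type.
struct HostBuffer {
  std::vector<unsigned char> data;
};

// Hypothetical device that keeps one "device-side" copy per uploaded buffer.
struct FakeDevice {
  std::vector<std::vector<unsigned char>> memory;

  void upload(const std::vector<const HostBuffer *> &buffers)
  {
    memory.clear();
    for (const HostBuffer *buffer : buffers) {
      assert(buffer != nullptr);
      memory.push_back(buffer->data);
    }
  }
};

// Hypothetical multi-device wrapper that owns the parallel division of work.
struct FakeMultiDevice {
  std::vector<FakeDevice> devices;

  // One parallel loop over devices: each worker uploads the whole buffer list
  // to its own device, so callers never write a per-buffer parallel loop.
  void upload_all(const std::vector<const HostBuffer *> &buffers)
  {
    std::vector<std::thread> workers;
    workers.reserve(devices.size());
    for (FakeDevice &device : devices) {
      workers.emplace_back([&device, &buffers] { device.upload(buffers); });
    }
    for (std::thread &worker : workers) {
      worker.join();
    }
  }
};
```

The caller only collects the buffers that changed and makes one call. How the work is split across devices stays an implementation detail of the wrapper.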
William Leeson added 108 commits 2023-05-02 17:43:25 +02:00
598c7c151d Remove some locks from progress update
Removes mutex locks when updating pixel counts etc. Also, when
in background mode text status updates are removed.
bbce8a0aae Add macro to switch from std::mutex to SpinLock
Simplified the code to be more similar to the original by
implementing it as a spin lock. Then added a macro to be able to
switch back and forth between std::mutex and the SpinLock.
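A minimal sketch of how a compile-time switch between `std::mutex` and a spin lock could look. The macro name `USE_STD_MUTEX_FOR_PROGRESS`, the `ProgressLock` alias, and the `PixelCounter` struct are assumptions made for illustration; the commit does not name them.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin lock. It provides lock()/unlock(), so std::lock_guard
// can be used with it exactly as with std::mutex.
class SpinLock {
 public:
  void lock()
  {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Busy-wait until the current holder calls unlock().
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical macro: define it to fall back to std::mutex.
#ifdef USE_STD_MUTEX_FOR_PROGRESS
using ProgressLock = std::mutex;
#else
using ProgressLock = SpinLock;
#endif

// Hypothetical progress counter guarded by whichever lock type was selected.
struct PixelCounter {
  ProgressLock lock;
  long pixels = 0;

  void add(long n)
  {
    std::lock_guard<ProgressLock> guard(lock);
    pixels += n;
  }
};

// Adds 1 to the counter adds_per_thread times from each of num_threads threads.
long count_in_parallel(int num_threads, int adds_per_thread)
{
  assert(num_threads >= 0 && adds_per_thread >= 0);
  PixelCounter counter;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, adds_per_thread] {
      for (int i = 0; i < adds_per_thread; ++i) {
        counter.add(1);
      }
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  return counter.pixels;
}
```

Because both lock types expose the same interface, the calling code does not change when the macro is toggled.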
72b918a9e2 Adds scoped event markers
Add scoped event markers to highlight code regions of interest.
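Scoped markers are commonly written as RAII objects that emit a begin event on construction and an end event on destruction. The sketch below assumes that shape; `ScopedMarker`, `event_log`, and `update_geometry` are hypothetical names, and the in-memory log stands in for whatever profiler API the patch actually calls.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory event log standing in for a real profiler backend.
std::vector<std::string> &event_log()
{
  static std::vector<std::string> log;
  return log;
}

// Hypothetical RAII marker: records "begin" when constructed and "end" when
// it goes out of scope, so a code region is bracketed automatically.
class ScopedMarker {
 public:
  explicit ScopedMarker(std::string name) : name_(std::move(name))
  {
    assert(!name_.empty());
    event_log().push_back("begin " + name_);
  }
  ~ScopedMarker() { event_log().push_back("end " + name_); }

  ScopedMarker(const ScopedMarker &) = delete;
  ScopedMarker &operator=(const ScopedMarker &) = delete;

 private:
  std::string name_;
};

// Example region of interest with a nested marker.
void update_geometry()
{
  ScopedMarker marker("update_geometry");
  {
    ScopedMarker upload("upload");
  }
}
```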
2183d70e9d Move scene BVH update into parallel_for
The scene BVH update was outside the parallel_for. This moves it
inside the parallel_for so it can also be performed in parallel.
5c56f518eb FIX: Preallocate timer arrays in scene
The Windows build failed as the timers were using
a dynamic array. This has been changed to use a
heap array which is allocated with the scene
instead.
cba4d732e2 Remove stats params from deviceDataXferAndBVHUpdate
The stats parameters are now contained in the Scene class and
no longer need to be passed as a parameter.
665a0e84b2 Move scene BVH upload into deviceDataXferAndBVHUpdate
To simplify the code and make it easier to read this moves the
scene BVH upload into the deviceDataXferAndBVHUpdate method.
buildbot/vexp-code-patch-coordinator Build done. Details
ef84fe9e5d
Merge branch 'upstream_main' into geometry_update
7bf80f2b77 Clean up code and simplify
1. Switch to use auto for DeviceScene
2. use parallel_for instead of tbb::parallel_for
3. rename n_scenes num_scenes
01e96490f1 Change params for host_mem_alloc to host_mem_alloc(size_t size, int aligment)
The parameters for host_mem_alloc were changed to
`host_mem_alloc(size_t size, int aligment)` from
`host_mem_alloc(size_t size, int aligment, void **p_mem)`.
4e1469b8bb Only build the BVH2 BVH once
Ensures that only 1 BVH2 BVH is built for each object or scene.
This is done using a lock to acquire the BVH2, and other threads
can skip the building if it is for an object. However, for the
top level BVH all threads wait on the TLAS to be finished.
6e0875b729 Only build a BVH2 BVH once
Basically checks if the BVH2 BVH has been built or is being built.
Then when building the top level BVH2 BVH all devices that use it
wait for the one that is building it to finish before continuing.
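The "build once, others skip" behaviour for object BVHs can be sketched as a claim step that only one thread wins. The commits describe using a lock; this example swaps in an atomic compare-and-exchange to keep it short. `ObjectBVH`, `build_once`, and `builds_with_threads` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical object-level BVH shared by several device threads.
struct ObjectBVH {
  std::atomic<bool> claimed{false};
  int build_count = 0;  // Only ever written by the thread that wins the claim.

  // Returns true if this caller performed the build, false if it skipped
  // because another thread had already claimed it.
  bool build_once()
  {
    bool expected = false;
    if (!claimed.compare_exchange_strong(expected, true)) {
      return false;
    }
    ++build_count;  // Stand-in for the actual BVH2 build.
    return true;
  }
};

// Lets num_threads threads race to build the same BVH and reports how many
// builds actually happened.
int builds_with_threads(int num_threads)
{
  assert(num_threads > 0);
  ObjectBVH bvh;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&bvh] { bvh.build_once(); });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  return bvh.build_count;
}
```

Skipping is only safe where the caller does not need the result immediately. For the top level BVH the commits state that the other devices wait instead.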
c952e5d159 FIX: Fix memory leak when using more than 1 device
The multi-device set the device pointer for the device_memory as it
iterated through the devices. However, it did not restore the
original pointer to the device if the device_ptr (pointer to the
memory buffer) was 0, which it always is after a mem_free.
3a566ccee4 FIX: Using the Optix BVH for the NVidia device
Optix device was using the multi-bvh as the Optix BVH which was
causing the device to have the wrong handles for the objects.
b0345701a0 FIX: Use the headless flag to switch on/off progress updates
Don't do progress updates when in headless mode. Previously
the background flag was used, but this is also used for F12 render,
which needs the updates.
d59a30b4a0 FIX: Lock mutex before using condition variable waiting on BVH build
The mutex needs to be locked before the condition variable can
be used. This adds a scoped lock to ensure it is locked before
waiting on the condition_variable.
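In standard C++, `std::condition_variable::wait` requires the caller to hold a `std::unique_lock` on the associated mutex, which is the rule this fix refers to. A minimal sketch of waiting on a finished build under that rule; `TopLevelBuild` and its members are hypothetical names, not the Cycles code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state for a top level BVH built by one thread.
struct TopLevelBuild {
  std::mutex mutex;
  std::condition_variable done_cv;
  bool done = false;
  int value = 0;

  // Called by the building thread: publish the result under the lock,
  // then wake every waiter.
  void finish(int result)
  {
    {
      std::lock_guard<std::mutex> guard(mutex);
      value = result;
      done = true;
    }
    done_cv.notify_all();
  }

  // Called by the other threads: the mutex is locked before wait(), and the
  // predicate guards against spurious wakeups and early notifications.
  int wait_for_result()
  {
    std::unique_lock<std::mutex> lock(mutex);
    done_cv.wait(lock, [this] { return done; });
    return value;
  }
};

// Starts num_waiters waiting threads, finishes the build, and counts how
// many waiters observed the published result.
int waiters_seeing_result(int num_waiters, int result)
{
  assert(num_waiters >= 0);
  TopLevelBuild build;
  std::vector<int> seen(num_waiters, 0);
  std::vector<std::thread> waiters;
  for (int i = 0; i < num_waiters; ++i) {
    waiters.emplace_back([&build, &seen, i] { seen[i] = build.wait_for_result(); });
  }
  build.finish(result);
  for (std::thread &waiter : waiters) {
    waiter.join();
  }
  int matches = 0;
  for (int v : seen) {
    if (v == result) {
      ++matches;
    }
  }
  return matches;
}
```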
b75f6ad788 FIX: Make sure top level BVH is built after all other BVHs
The top level BVH can only be built once all the other (bottom
level) BVHs are completed.
a27247e28f Refactor geometry.cpp and geometry_additions.cpp
This breaks up the geometry.cpp and geometry_additions.cpp into
the following files:
- geometry.cpp
- geometry_mesh.cpp
- geometry_attributes.cpp
- geometry_bvh.cpp
so as to organise the code into more manageable pieces.
9b3b180d23 FIX: Enable OneAPI build and fix compile issues
Needed to add the memory copy methods that have offset and size.
ac67371fa4 Makes sure the CPU is the last device just once at the end.
Previously after every device was added the CPU device was swopped
to the end of the list. Now the position it was added at is saved
and it is swopped with the last device at the end.
674721194e FIX: Adds case where no CPU is selected in MultiDevice
Forgot to push the code which skips swopping if there is no CPU in
the MultiDevice set.
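The two commits above amount to: remember where the CPU device was added, swap it with the last device once at the end, and skip the swap when no CPU is present. A small sketch under those assumptions, with device names as plain strings and `order_devices` as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: adds devices in order, then moves the CPU (if any)
// to the end with a single swap.
std::vector<std::string> order_devices(const std::vector<std::string> &added)
{
  std::vector<std::string> devices;
  bool has_cpu = false;
  std::size_t cpu_index = 0;

  for (const std::string &name : added) {
    if (name == "CPU") {
      // Save the position instead of swapping after every insertion.
      has_cpu = true;
      cpu_index = devices.size();
    }
    devices.push_back(name);
  }

  // One swap at the end, skipped entirely when no CPU was selected.
  if (has_cpu && cpu_index + 1 != devices.size()) {
    std::swap(devices[cpu_index], devices.back());
  }
  assert(!has_cpu || devices.back() == "CPU");
  return devices;
}
```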
3c81e479a6 Switch off using pinned memory on CUDA devices
Pinned memory was used for all device_memory. However, because
it may cause memory pressure, this was removed and can be
enabled by passing USE_DEVICE_PINNED_MEMORY.
cddd2bfdf0 Replace upload and building times arrays with a vector struct
Arrays of double values are replaced by a vector<> of a struct
scene_times.
66a6a7a0af Clean up code
1. remove camelCase method names.
2. fix up white space changes.
3. remove some unnecessary changes.
bbbf76c4db GeometrySizes and AttributeSizes are stored in Scene
Moved GeometrySizes and AttributeSizes to be stored in scene. This
resulted in them being removed as parameters where a Scene was
already passed in.
b1dd204d42 Moves DeviceScene related methods to the DeviceScene class
Refactored the GeometryManager methods so that those relating to
the DeviceScene are now methods in the DeviceScene.
f63c508b40 Move BVH2 device update to DeviceScene
As the device_update_bvh2 method just involves transferring data
to the device it was moved to DeviceScene.
d7cd4a4951 FIX: Add device type definitions
Moving DeviceScene from scene.h caused the device definitions to be
no longer present in various files. This adds them back in by
including device.h in the required files.
b407dba398 FIX: Use new memory copy method that replaced the old one
Forgot to add the file for the previous change to the memory copy
methods, which resulted in the OneAPI build failing.
d96fd9b08f Handle the bvh root index correctly for the device specific DeviceScene
Uses the unique_ptr to deallocate the BVH2 structure in the
BVH postprocess and assigns the bvh root indices in the device
specific areas also.
1560ec66c9 FIX: Create CUDA context earlier and remove lock
Some CUDA commands were used before the context was set up. Also,
since the BVHs are built in order, there is no need for the lock
to prevent 2 being built at the same time to save memory.
58cddaeefe FIX: Upload textures to all devices
For the displacement and shadow texture generation the textures
were originally only uploaded to 1 device. However, this resulted
in some textures that were needed across all devices not being
uploaded. This fixes that by uploading them to all devices.
a916e34033 Move scene DeviceScene to use unique_ptr
To reduce the amount of code and manage the life cycles of the
DeviceScenes they are now stored as unique_ptrs
df62806c94 Use task pool for object BVH build also fixes stats recording
Re-instates the task pool used to build the BVHs for the objects.
Also the displacement timing was being reported twice and this is
fixed.
buildbot/vexp-code-patch-coordinator Build done. Details
429f953d6c
Move CUDA context into a smaller scope
buildbot/vexp-code-patch-coordinator Build done. Details
de58c2ab8e
Merge branch 'upstream_main' into geometry_update
William Leeson requested review from Brecht Van Lommel 2023-05-02 17:43:51 +02:00
William Leeson changed title from WIP:Upload geometry data in parallel to multiple GPUs using the "Multi-Device" to WIP:Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" 2023-05-05 16:19:37 +02:00
William Leeson added 20 commits 2023-05-09 11:15:56 +02:00
df0fba0d7d FIX: Upload buffer if data_size is not zero
This was incorrectly set to be device_size which is not set until
it is allocated on the device.
e04384f20d FIX: get_device_bvh returns the MultiBVH if the device is the MultiDevice
If the device was the MultiDevice it previously returned the first
BVH. This was incorrect; it should return the MultiBVH. Also, if
it cannot find the device then it should return NULL.
b1be09d449 Move the BVH layout determination into a method for reuse
This code was duplicated in MultiDevice and geometry_bvh.cpp so
this removed that duplication.
ab21849d86 Remove the parallel_for over devices
The parallel_for is replaced by many parallel_for's: one for
the upload, one for each Object BVH and one for the scene BVH.
565bc75769 FIX: get_device_bvh returns the MultiBVH if the device is the MultiDevice
If the device was the MultiDevice it previously returned the first
BVH. This was incorrect; it should return the MultiBVH. Also, if
it cannot find the device then it should return NULL.
b7a80ba6ef FIX: Enable local intersection testing on BVH
The scene bvh layout was set to NONE which disabled the local
intersection testing used for the bevel effect. This removes that
line and also refactors the code to make it cleaner.
buildbot/vexp-code-patch-coordinator Build done. Details
f80ef452a1
Merge branch 'remove_parallel_for' into upload_changed
William Leeson changed title from WIP:Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" to Alternative Upload geometry data in parallel to multiple GPUs using the "Multi-Device" 2023-05-09 11:35:02 +02:00
Author
Member

@blender-bot package
William Leeson closed this pull request 2023-05-09 11:41:49 +02:00
Member

Package build started. [Download here](https://builder.blender.org/download/patch/PR107552) when ready.
William Leeson reopened this pull request 2023-05-09 11:43:22 +02:00
William Leeson added 1 commit 2023-05-09 11:59:03 +02:00
William Leeson added 1 commit 2023-05-10 15:57:08 +02:00
William Leeson added 1 commit 2023-05-12 10:45:33 +02:00
William Leeson added 1 commit 2023-05-15 16:40:45 +02:00
William Leeson added 1 commit 2023-05-16 10:34:46 +02:00
William Leeson added 1 commit 2023-05-17 11:00:17 +02:00
William Leeson added 1 commit 2023-05-19 09:15:52 +02:00
William Leeson added 1 commit 2023-05-22 11:12:59 +02:00
William Leeson added 1 commit 2023-05-29 10:43:26 +02:00
This pull request has changes conflicting with the target branch.
  • intern/cycles/device/device.h
  • intern/cycles/device/metal/device_impl.mm
  • intern/cycles/device/metal/queue.mm
  • intern/cycles/device/multi/device.cpp
  • intern/cycles/scene/geometry.cpp
  • intern/cycles/scene/geometry_attributes.cpp
  • intern/cycles/scene/geometry_bvh.cpp
  • intern/cycles/scene/geometry_mesh.cpp
  • intern/cycles/scene/scene.cpp

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u upload_changed:leesonw-upload_changed
git checkout leesonw-upload_changed
Reference: blender/blender#107552