Refactor geometry update for better parallelization and upload performance #105403

Open
William Leeson wants to merge 144 commits from leesonw/blender-cluster:geometry_update into main

Member

Why

Cycles' geometry upload and preparation can be refactored to be easier to understand and more efficient. By creating a single parallel_for over the devices instead of one thread per copy operation, the threading overhead is reduced, which improves performance. Splitting the host-side work from the device-side work also makes it easier to distribute the workload. Some calculations are currently performed twice or unnecessarily: they are only needed for specific tasks but are run regardless, because with the current code it is hard to work out when they are required. This refactor also allows for further potential optimizations in the future.

What

  • Break the geometry update into a host update and a device update.
  • Re-order the update so that it is easier to understand.
  • Create a parallel loop over the devices and upload the data from there.
    • Create a DeviceScene per device instead of using the MultiDevice.
    • Create a BVH per device instead of using the MultiDevice.
  • Only perform calculations that are required.
    This results in the program flow looking like the following:
    ---- HOST SIDE CALCULATIONS AND SETUP ----
  1. Calculate vertex normals and gather undisplaced vertices
  2. Tessellate meshes if needed
  3. Compute object bounds
  4. Fill/pack attribute buffers
  5. Fill/pack geometry buffers
    ---- only run on the first device ----
  6. Displace meshes and calculate shadow transparency if needed
  7. If displacement or shadow transparency occurred, refill the buffers
    ---- DEVICE SPECIFIC BEGIN ----
  8. Copy attribute buffers to the device
  9. Copy geometry buffers to the device
  10. Build object BVHs
  11. Build the scene BVH
    ---- DEVICE SPECIFIC END ----
    ---- HOST SIDE BELOW THIS POINT ----
  12. Remove modified or update tags

The original flow and the new flow are shown in the files below. To achieve this, a DeviceScene is created for each device. However, as this only changes the geometry device update (and not all uses of the DeviceScene), the old DeviceScene needs to remain so that the other buffers can still be handled by the multi-device. The host-side pointers are shared by all the DeviceScenes, so they only need to be packed once. This is done using a method that updates the host pointers of one DeviceScene from an existing DeviceScene.

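For illustration, below is a minimal standalone sketch of the device-specific stage (steps 8-11). The stub types and helper names are made up for this example rather than taken from the Cycles sources, and the actual patch drives the loop with the shared parallel_for wrapper instead of raw threads:

```
// Standalone sketch only: DeviceSceneStub stands in for the per-device
// DeviceScene described above.
#include <thread>
#include <vector>

struct DeviceSceneStub {
  // Host-side buffers are packed once and shared; each device only uploads them.
  void copy_attribute_buffers_to_device() {}
  void copy_geometry_buffers_to_device() {}
  void build_object_bvhs() {}
  void build_scene_bvh() {}
};

void device_update_sketch(std::vector<DeviceSceneStub> &dscenes)
{
  std::vector<std::thread> workers;
  for (DeviceSceneStub &sub_dscene : dscenes) {
    workers.emplace_back([&sub_dscene]() {
      sub_dscene.copy_attribute_buffers_to_device();  // step 8
      sub_dscene.copy_geometry_buffers_to_device();   // step 9
      sub_dscene.build_object_bvhs();                 // step 10
      sub_dscene.build_scene_bvh();                   // step 11
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```

The point is that the per-copy threading disappears: there is one worker per device, and each worker runs the whole upload and BVH stage for its own DeviceScene.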
I have created a spreadsheet [here](https://docs.google.com/spreadsheets/d/1OYF3g988jZQXQG1pVopzIMQTRVo39V1znrLx8c92Rhs/edit?usp=sharing) comparing this patch with Blender main. A chart of the results so far is available in the images below. I have also created another spreadsheet [here](https://docs.google.com/spreadsheets/d/1ywQym3LTLgGIGMedExxsS4RpPKwpP60BjeLtXf1BaF4/edit?usp=sharing) comparing it with both the original Blender implementation and the alternative method #107552. This patch is "geom_update" and the other is "parallel_copy" in the graphs.

William Leeson added 7 commits 2023-03-03 15:06:12 +01:00
Removes mutex locks when updating pixel counts etc. Also, text
status updates are removed when in background mode.
Simplified the code to be more similar to the original by
implementing it as a spin lock. Then added a macro to be able to
switch back and forth between std::mutex and the SpinLock.
Add scoped event markers to highlight code regions of interest.
William Leeson added 2 commits 2023-03-08 14:31:11 +01:00
William Leeson added this to the Render & Cycles project 2023-03-09 18:00:57 +01:00
William Leeson added 1 commit 2023-03-10 12:29:47 +01:00
William Leeson added 1 commit 2023-03-14 10:13:35 +01:00
William Leeson added 1 commit 2023-03-14 12:01:34 +01:00
The scene BVH update was outside the parallel_for; this moves it
inside the parallel_for so it can also be performed in parallel.
William Leeson added 1 commit 2023-03-14 12:21:31 +01:00
The Windows build failed because the timers were using
a dynamic array. This has been changed to use a
heap array which is allocated with the scene
instead.
William Leeson added 1 commit 2023-03-14 12:58:33 +01:00
The stats parameters are now contained in the Scene class and
no longer need to be passed as a parameter.
Hans Goudey changed title from WIP:Refactor geometry update for better parallelization and upload performance to WIP: Refactor geometry update for better parallelization and upload performance 2023-03-14 13:27:16 +01:00
William Leeson added 1 commit 2023-03-14 13:32:05 +01:00
To simplify the code and make it easier to read this moves the
scene BVH upload into the deviceDataXferAndBVHUpdate method.
William Leeson added 1 commit 2023-03-14 15:07:05 +01:00
William Leeson requested review from Brecht Van Lommel 2023-03-14 15:33:33 +01:00
Author
Member

@brecht Ok I think this is close to ready. It would be good to get an overall impression even if not a complete review when you have time. I'll hopefully have performance data sometime this week.

William Leeson added 2 commits 2023-03-15 16:19:56 +01:00
William Leeson changed title from WIP: Refactor geometry update for better parallelization and upload performance to Refactor geometry update for better parallelization and upload performance 2023-03-16 08:23:39 +01:00
William Leeson added 1 commit 2023-03-16 09:37:19 +01:00

I see a bunch of commented code and benchmarking utilities in business logic.

William Leeson added 1 commit 2023-03-16 15:56:25 +01:00
William Leeson added 1 commit 2023-03-16 15:59:47 +01:00
William Leeson changed title from Refactor geometry update for better parallelization and upload performance to WIP: Refactor geometry update for better parallelization and upload performance 2023-03-16 16:00:03 +01:00

I'm not going to have time this week, but will look next week.

William Leeson added 1 commit 2023-03-17 12:51:29 +01:00
William Leeson added 1 commit 2023-03-17 16:02:01 +01:00
William Leeson added 1 commit 2023-03-20 12:41:55 +01:00
Merge branch 'upstream_main' into geometry_update
Some checks failed
buildbot/vexp-code-patch-coordinator Build done.
ef84fe9e5d

@blender-bot build

This part I'm not sure about.

> Create a BVH per device instead of using the MultiDevice

Right now we can build the BVH once for multi-device rendering on CUDA, HIP, Metal, OneAPI. With native hardware ray-tracing this may become impossible, but that usually comes with GPU accelerated tree building so it's not so bad.

This performance regression I would rather avoid.

See the buildbot log for build errors, build warnings and test failures to be fixed.

Brecht Van Lommel requested changes 2023-03-20 19:28:34 +01:00
Brecht Van Lommel left a comment
Owner

I did not go through every line of code in geometry.cpp and geometry_additions.cpp, but overall:

  • Try to find some way to avoid counting sizes in a separate struct?
  • Splitting the code into multiple files seems good to get this organized a bit. I'd suggest to have geometry_attributes.cpp, geometry_bvh.cpp, geometry_mesh.cpp, .. or something along those lines. And then each of those can handle as much as possible related to that part of the data, with consistent function names for the various stages.
@ -887,7 +887,6 @@ typedef enum {
typedef struct _nvrtcProgram* nvrtcProgram;

Unrelated whitespace change.

brecht marked this conversation as resolved
@ -681,7 +682,6 @@ static int cuewNvrtcInit(void) {
return result;
}

Unrelated whitespace change.

brecht marked this conversation as resolved
@ -68,6 +68,7 @@ typedef void* DynamicLibrary;
static DynamicLibrary cuda_lib;
static DynamicLibrary nvrtc_lib;
static DynamicLibrary nvtx_lib;

This probably comes from another patch and should not be part of this?

brecht marked this conversation as resolved
@ -125,6 +125,7 @@ void BlenderSession::create_session()
session = new Session(session_params, scene_params);
session->progress.set_update_callback(function_bind(&BlenderSession::tag_redraw, this));
session->progress.set_cancel_callback(function_bind(&BlenderSession::test_cancel, this));
session->progress.set_updates(!background);

This seems like a change to review as a separate PR?

brecht marked this conversation as resolved
@ -14,1 +13,4 @@
const Device *device_)
: BVH(params_, geometry_, objects_), device(device_)
{
// Resize the sub-bvh container to match the number of devices

Maybe there can be some abstraction at this level to avoid building the BVH multiple times when not needed?

leesonw marked this conversation as resolved
@ -149,6 +149,15 @@ void CPUDevice::mem_copy_to(device_memory &mem)
}
}
void CPUDevice::mem_copy_to(device_memory &mem, size_t, size_t offset)

Silence unused argument warnings with `, size_t, size_t)`
brecht marked this conversation as resolved
@ -31,0 +30,4 @@
int flags = CUEW_INIT_CUDA;
int cuew_result = cuewInit(flags);

This change does nothing (anymore?).

leesonw marked this conversation as resolved
@ -548,3 +548,3 @@
const CUDAContextScope scope(this);
cuda_assert(cuMemcpyHtoD((CUdeviceptr)device_pointer, host_pointer, size));
cuda_assert(cuMemcpyHtoD((CUdeviceptr)device_pointer + offset,

Checking the definition of `CUdeviceptr` this seems to work, but it's not obvious by just looking at the code or for all devices types. Maybe better to use `reinterpret_cast<unsigned char *>(` in all device code, to make it clear?
brecht marked this conversation as resolved
@ -551,0 +554,4 @@
void CUDADevice::host_mem_alloc(size_t size, int aligment, void **p_mem) {
CUDAContextScope scope(this);
CUresult result = cuMemAllocHost(p_mem, size);

The CUDA docs for cuMemAllocHost say:

Allocating excessive amounts of memory with cuMemAllocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

It's not clear to me that we should be using this for allocating the entire scene representation on the CPU? Seems like that could lead to Blender/Cycles running out of memory faster than before.

brecht marked this conversation as resolved
@ -551,0 +555,4 @@
void CUDADevice::host_mem_alloc(size_t size, int aligment, void **p_mem) {
CUDAContextScope scope(this);
CUresult result = cuMemAllocHost(p_mem, size);
if(result != CUDA_SUCCESS) {

This should not have custom error handling. We have `cuda_assert` for error reporting, so that an error is set in the device and reported to the user.
brecht marked this conversation as resolved
@ -566,8 +584,13 @@ void CUDADevice::mem_alloc(device_memory &mem)
void CUDADevice::mem_copy_to(device_memory &mem)

Can this code be deduplicated by calling `mem_copy_to(mem, mem.memory_size(), 0)`?
brecht marked this conversation as resolved
@ -568,3 +586,2 @@
if (mem.type == MEM_GLOBAL) {
global_free(mem);
global_alloc(mem);
if ((mem.device_size < mem.memory_size()) || (!mem.device_pointer)) {

This means memory allocations never decrease in size? Shouldn't it reallocate whenever the size changes?

brecht marked this conversation as resolved
@ -584,0 +607,4 @@
void CUDADevice::mem_copy_to(device_memory &mem, size_t size, size_t offset)
{
if (mem.type == MEM_GLOBAL) {
if ((mem.device_size < mem.memory_size()) || (!mem.device_pointer)) {

Same comment as above.

brecht marked this conversation as resolved
@ -766,0 +774,4 @@
}
}
void GPUDevice::generic_copy_to(device_memory &mem, size_t size, size_t offset)

Similar comment about code deduplication as above.

brecht marked this conversation as resolved
@ -8,3 +8,3 @@
#include "bvh/params.h"
#include "scene/scene.h"

The intent for the device API was to not be aware of `Scene` or `DeviceScene`, though that is already broken somewhat.

Ideally we'd avoid it entirely, but here at least it should use a `class DeviceScene;` forward declaration instead of the include.
leesonw marked this conversation as resolved
@ -217,2 +222,4 @@
return 0;
}
virtual device_ptr find_matching_mem(device_ptr key, Device * /*sub*/)

Add a comment explaining what this does.

leesonw marked this conversation as resolved
@ -290,8 +300,11 @@ class Device {
friend class DeviceServer;
friend class device_memory;
virtual void host_mem_alloc(size_t size, int alignment, void **p_mem);

This can return `void *` instead of using `void **p_mem`.
leesonw marked this conversation as resolved
@ -369,17 +379,78 @@ template<typename T> class device_vector : public device_memory {
assert(data_elements > 0);
}
device_vector(Device *device,

TODO: what is this constructor for?

leesonw marked this conversation as resolved
@ -41,3 +45,3 @@
/* Always add CPU devices at the back since GPU devices can change
* host memory pointers, which CPU uses as device pointer. */
SubDevice *sub;
SubDevice *sub = new SubDevice;

Use `unique_ptr` and `make_unique` instead of `new` in new code.
leesonw marked this conversation as resolved
@ -49,2 +52,2 @@
devices.emplace_front();
sub = &devices.front();
devices.emplace_back(sub);
// find first CPU device and swop it with the new device

swop -> swap

leesonw marked this conversation as resolved
@ -51,0 +54,4 @@
// to keep the CPU devices at the end of the vector.
int last = devices.size() - 1;
int o = last;
while ((o > 0) && (devices[o - 1]->device->info.type == DEVICE_CPU)) {

If we ensure the CPU is always the last device in the vector, there is no need to have a while loop searching for it.
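For illustration, a generic sketch of maintaining that invariant on insertion; DeviceType here is a stand-in enum rather than the actual SubDevice/DeviceInfo types:

```
#include <algorithm>
#include <vector>

enum class DeviceType { GPU, CPU };

// Keep CPU sub-devices at the back so later code never has to search for them.
void add_sub_device(std::vector<DeviceType> &devices, DeviceType new_dev)
{
  if (new_dev == DeviceType::CPU) {
    devices.push_back(new_dev);  // CPUs always go to the end
  }
  else {
    // Non-CPU devices are inserted before the first CPU entry (or at the end).
    auto first_cpu = std::find(devices.begin(), devices.end(), DeviceType::CPU);
    devices.insert(first_cpu, new_dev);
  }
}
```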

leesonw marked this conversation as resolved
@ -1041,2 +637,2 @@
}
}
/* copy svm attributes to device */
dscene->attributes_map.copy_to_device_if_modified();

Looking at code like this, it feels like maybe `DeviceScene` should somehow have a list of the `device_memory` it contains. And then perhaps that could be used to have a function to transfer all modified memory, rather than manually doing it here.
@ -2088,3 +1487,1 @@
if (geom->is_mesh()) {
Mesh *mesh = static_cast<Mesh *>(geom);
mesh->subd_attributes.clear_modified();
size_t n_scenes = scene->dscenes.size();

n_scenes -> num_scenes is the convention.

leesonw marked this conversation as resolved
@ -2091,0 +1487,4 @@
size_t n_scenes = scene->dscenes.size();
// Parallel upload the geometry data to the devices and
// calculate or refit the BVHs
tbb::parallel_for(size_t(0),

Use just `parallel_for`, not `tbb::parallel_for`. We abstract such things in case we want to override them.
leesonw marked this conversation as resolved
@ -2091,0 +1489,4 @@
// calculate or refit the BVHs
tbb::parallel_for(size_t(0),
n_scenes,
[this,

Can we automatically capture all these variables instead of doing it manually?

leesonw marked this conversation as resolved
@ -162,0 +185,4 @@
size_t *motion_vert_offsets;
};
struct AttributeSizes {

Can we avoid having these AttributeSizes and GeometrySizes structs? This feels like something we should be able to set on the device_memory in DeviceScene instead of tracking in a separate struct.

Or maybe it's not a good idea because every device has its own DeviceScene?

brecht marked this conversation as resolved
@ -0,0 +1,1045 @@
/* SPDX-License-Identifier: Apache-2.0

This file needs a better name, or probably code needs to split across files in a different way. This doesn't seem like a logical grouping.

Did you forget to remove this file?

Author
Member

Yes, I have removed it now :-/

@ -0,0 +34,4 @@
/*
* Clears all tags used to indicate the the shader needs to be updated.
*/
void GeometryManager::clearShaderUpdateTags(Scene *scene)

Don't use camelCase function names.

leesonw marked this conversation as resolved
@ -0,0 +121,4 @@
progress.set_status("Updating Mesh", "Computing attributes");
// SHOULD NOT ALLOC ONLY ALLOC IF MORE SPACE IS NEEDED

I guess this is a TODO comment to be resolved.

leesonw marked this conversation as resolved
@ -660,11 +660,8 @@ void ObjectManager::device_update_transforms(DeviceScene *dscene, Scene *scene,
}
});
if (progress.get_cancel()) {

Why remove this?

In general we want to be able to cancel rendering in the middle of updates so users don't have to wait too long.

leesonw marked this conversation as resolved
@ -36,0 +37,4 @@
* checks the progress for if a cancel has been requested and also
* the device to see if an error has occurred.
*/
bool Scene::progressErrorCheck(Progress &progress, Device *device) {

I'd name this `check_cancel_update` or something like that, `progressErrorCheck` to me reads as if it's only checking for errors.
leesonw marked this conversation as resolved
@ -36,0 +39,4 @@
*/
bool Scene::progressErrorCheck(Progress &progress, Device *device) {
bool status = false;
if (!background && progress.get_updates()) {

Cancelling quickly is important for F12 renders as well, which are "background" in Cycles.

leesonw marked this conversation as resolved
@ -104,1 +116,4 @@
{
/* Create a DeviceScene for each device */
device->foreach_device([this](Device *sub_device) {
DeviceScene *sub_dscene = new DeviceScene(sub_device);

Use `unique_ptr` instead of `new` in new code.
leesonw marked this conversation as resolved
@ -107,0 +127,4 @@
mesh_times = new double[device_count];
attrib_times = new double[device_count];
object_bvh_times = new double[device_count];
scene_bvh_times = new double[device_count];

Use `vector` instead of `new[]`. Also perhaps it's better to put these in a struct instead of separate arrays.
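For illustration, a minimal sketch of that suggestion; the struct name and fields are assumptions mirroring the four arrays above and the later "vector<> of a struct scene_times" commit, not the exact code:

```
#include <cstddef>
#include <vector>

// One struct instead of four parallel arrays of doubles.
struct SceneTimes {
  double mesh = 0.0;
  double attrib = 0.0;
  double object_bvh = 0.0;
  double scene_bvh = 0.0;
};

// One entry per device; the vector frees itself, unlike the new[] arrays.
std::vector<SceneTimes> make_device_times(const std::size_t device_count)
{
  return std::vector<SceneTimes>(device_count);
}
```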
leesonw marked this conversation as resolved
@ -17,3 +28,4 @@
CCL_NAMESPACE_BEGIN
class SpinLock {

Don't implement a custom spinlock here, or at all if we can avoid it since getting that right can be surprisingly difficult.

leesonw marked this conversation as resolved
@ -41,2 +93,4 @@
error_message = "";
cancel_cb = function_null;
updates = true;
// update_call = true;

Don't leave commented-out code in.

leesonw marked this conversation as resolved
William Leeson added 4 commits 2023-03-21 15:58:21 +01:00
William Leeson added 2 commits 2023-03-22 11:17:44 +01:00
The parameters for host_mem_alloc were changed from
`host_mem_alloc(size_t size, int aligment, void **p_mem)` to
`host_mem_alloc(size_t size, int aligment)`.
Ensures that only 1 BVH2 BVH is built for each object or scene.
This is done using a lock to acquire the BVH2, and other threads
can skip the building if it is for an object. However, for the
top level BVH all threads wait on the TLAS to be finished.
William Leeson added 1 commit 2023-03-22 12:35:33 +01:00
William Leeson added 1 commit 2023-03-23 21:35:52 +01:00
Basically checks if the BVH2 BVH has been built or is being built.
Then when building the top level BVH2 BVH all devices that use it
wait for the one that is building it to finish before continuing.
William Leeson added 1 commit 2023-03-24 12:20:32 +01:00
William Leeson added 1 commit 2023-03-28 09:13:02 +02:00
William Leeson added 1 commit 2023-03-28 10:18:24 +02:00
William Leeson added 2 commits 2023-03-28 13:25:13 +02:00
The multi-device set the device pointer for the device_memory as it
iterated through the devices. However, it did not restore the
original pointer to the device if the device_ptr (pointer to the
memory buffer) was 0, which it always is after a mem_free.
The OptiX device was using the multi-BVH as the OptiX BVH, which was
causing the device to have the wrong handles for the objects.
William Leeson added 2 commits 2023-03-28 14:56:21 +02:00
Don't do progress updates when in headless mode. Previously
the background flag was used, but that is also set for F12 renders,
which need the updates.
The mutex needs to be locked before the condition variable can
be used. This adds a scoped lock to ensure it is locked before
waiting on the condition_variable.
William Leeson added 1 commit 2023-03-29 10:11:04 +02:00
The top level BVH can only be built once all the other (bottom
level) BVHs are completed.
William Leeson added 1 commit 2023-03-29 18:20:50 +02:00
Brecht Van Lommel requested review from Brecht Van Lommel 2023-03-29 18:51:19 +02:00
William Leeson added 1 commit 2023-04-04 14:41:32 +02:00
William Leeson added 1 commit 2023-04-05 18:31:18 +02:00
William Leeson added 1 commit 2023-04-10 18:09:31 +02:00
William Leeson added 17 commits 2023-04-11 15:41:13 +02:00
This breaks up the geometry.cpp and geometry_additions.cpp into
the following files:
- geometry.cpp
- geometry_mesh.cpp
- geometry_attributes.cpp
- geometry_bvh.cpp
so as to organise the code into more manageable pieces.
Needed to add the memory copy methods that have offset and size.
Previously, after every device was added, the CPU device was swapped
to the end of the list. Now the position it was added at is saved
and it is swapped with the last device at the end.
Forgot to push the code which skips swapping if there is no CPU in
the MultiDevice set.
Pinned memory was used for all device_memory. However, because it
may cause memory pressure, this was removed and can be
enabled by passing USE_DEVICE_PINNED_MEMORY.
Arrays of double values are replaced by a vector<> of a struct
scene_times.
William Leeson reviewed 2023-04-11 15:43:15 +02:00
Author
Member

Removed

leesonw marked this conversation as resolved
@ -14,1 +13,4 @@
const Device *device_)
: BVH(params_, geometry_, objects_), device(device_)
{
// Resize the sub-bvh container to match the number of devices
Author
Member

I added a variable and mutex to indicate if it was being built or has been built. This ensures that it is built only once. However, I put this code in build_bvh in device.cpp as the BVHMulti can have different BVH types within it and this only applies to BVH2.

brecht marked this conversation as resolved
@ -548,3 +548,3 @@
const CUDAContextScope scope(this);
cuda_assert(cuMemcpyHtoD((CUdeviceptr)device_pointer, host_pointer, size));
cuda_assert(cuMemcpyHtoD((CUdeviceptr)device_pointer + offset,
Author
Member

Changing this to use reinterpret_cast<unsigned char *>() results in having to cast it back to CUdeviceptr as follows reinterpret_cast<CUdeviceptr>(reinterpret_cast<unsigned char *>(device_pointer)) otherwise the following error occurs:

device_impl.cpp:550:78: error: invalid conversion from ‘unsigned char*’ to ‘CUdeviceptr’ {aka ‘long long unsigned int’} [-fpermissive]
  550 | uMemcpyHtoD(reinterpret_cast<unsigned char *>(device_pointer) + offset,
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
      |                                                               |
      |                                                               unsigned char*

which is fine, but it seems a bigger mouthful to swallow than just using CUdeviceptr in the first place.

Ok, fine to leave it as is then.

brecht marked this conversation as resolved
@ -551,0 +554,4 @@
void *CUDADevice::host_mem_alloc(size_t size, int aligment) {
void *p_mem = NULL;
CUDAContextScope scope(this);
Author
Member

That is a possibility, I can switch it off for now. In our usage we never had that happen but most of the systems were not memory constrained. Not putting it in pinned memory ends up with a double copy as the driver then copies it to pinned memory before copying to the device. I switched this back to using regular memory for now.

brecht marked this conversation as resolved
@ -581,6 +607,29 @@ void CUDADevice::mem_copy_to(device_memory &mem)
}
}
void CUDADevice::mem_copy_to(device_memory &mem, size_t size, size_t offset)
Author
Member

This gives a great speed boost for interactive preview. I'll remove it from this PR and I can add it in a separate PR later.

brecht marked this conversation as resolved
@ -8,3 +8,3 @@
#include "bvh/params.h"
#include "scene/scene.h"
Author
Member

Ok, I replaced the include with a forward declaration. In another patch, instead of uploading the geometry data a second time we use the data in the DeviceScene which is already uploaded. This saves both memory and time. Would that not be acceptable?

brecht marked this conversation as resolved
@ -377,4 +449,2 @@
{
size_t new_size = size(width, height, depth);
if (new_size != data_size) {
Author
Member

I reverted this change also.

brecht marked this conversation as resolved
@ -369,1 +379,4 @@
device_vector(Device *device,
const char *name,
void *p_mem,
Author
Member

Yes, it looks like it's not used in this PR so I removed it.

brecht marked this conversation as resolved
@ -833,0 +856,4 @@
}
else {
generic_copy_to(mem, size, offset);
}
Author
Member

I reverted this change also.

brecht marked this conversation as resolved
@ -41,3 +45,3 @@
/* Always add CPU devices at the back since GPU devices can change
* host memory pointers, which CPU uses as device pointer. */
SubDevice *sub;
SubDevice *sub = new SubDevice;
Author
Member

Done. I also replaced the device creation with a `unique_ptr<>`
brecht marked this conversation as resolved
@ -51,0 +54,4 @@
// to keep the CPU devices at the end of the vector.
int last = devices.size() - 1;
int o = last;
while ((o > 0) && (devices[o - 1]->device->info.type == DEVICE_CPU)) {
Author
Member

Yes, I fixed this up.

brecht marked this conversation as resolved
@ -162,0 +185,4 @@
size_t *motion_vert_offsets;
};
struct AttributeSizes {
Author
Member

I'll see what I can do. It is this way because in the original implementation the device_memory is not always exactly the same size (sometimes it is bigger), so we need a place to track the actual size of the upload, otherwise more memory is uploaded than needed.

I don't think putting it in DeviceScene is great because as you say if I put it in the DeviceScene then that information is replicated for each device.

I'll investigate more to see if there is somewhere else that this information would be logical.

I still think it's important to eliminate these, tracking this separately is really error prone.

I don't understand what you mean by "device_memory is not always exactly the same size (sometimes it is bigger)". Is this related to the device memory not shrinking with mem.device_size < mem.memory_size(), that I think we should change to != anyway?

Even if we were not shrinking it, that should be an implementation detail of device_vector similar to how it's an implementation detail of std::vector. It's not something that should be tracked externally.

@ -36,0 +37,4 @@
* checks the progress for if a cancel has been requested and also
* the device to see if an error has occurred.
*/
bool Scene::progressErrorCheck(Progress &progress, Device *device) {
Author
Member

I changed this to check_cancel_update.

brecht marked this conversation as resolved
@ -36,0 +39,4 @@
*/
bool Scene::progressErrorCheck(Progress &progress, Device *device) {
bool status = false;
if (!background && progress.get_updates()) {
Author
Member

I changed this to only happen if updates are on. i.e. I removed the !background condition

brecht marked this conversation as resolved
@ -107,0 +127,4 @@
mesh_times = new double[device_count];
attrib_times = new double[device_count];
object_bvh_times = new double[device_count];
scene_bvh_times = new double[device_count];
Author
Member

I replaced these with a vector and moved them into a struct.

@ -17,3 +28,4 @@
CCL_NAMESPACE_BEGIN
class SpinLock {
Author
Member

I replaced this with the one in utils/thread.h and removed this code.

brecht marked this conversation as resolved
@ -41,2 +93,4 @@
error_message = "";
cancel_cb = function_null;
updates = true;
// update_call = true;
Author
Member

Removed

brecht marked this conversation as resolved
William Leeson added 1 commit 2023-04-11 16:57:48 +02:00
1. remove camelCase method names.
2. fix up white space changes.
3. remove some unnecessary changes.
William Leeson added 1 commit 2023-04-11 18:50:44 +02:00
William Leeson added 2 commits 2023-04-12 12:04:04 +02:00
William Leeson added 1 commit 2023-04-13 15:37:25 +02:00
William Leeson reviewed 2023-04-13 15:57:32 +02:00
@ -0,0 +1,407 @@
/* SPDX-License-Identifier: Apache-2.0
Author
Member

I split this into geometry.cpp, geometry_mesh.cpp, geometry_attributes.cpp and geometry_bvh.cpp

brecht marked this conversation as resolved
@ -0,0 +34,4 @@
/*
* Clears all tags used to indicate the the shader needs to be updated.
*/
void GeometryManager::clearShaderUpdateTags(Scene *scene)
Author
Member

I removed the camelCase function names.

brecht marked this conversation as resolved
@ -0,0 +121,4 @@
sub_dscene->tri_verts.assign_mem(dscene->tri_verts);
sub_dscene->tri_verts.tag_modified();
}
{
Author
Member

Removed as it was an old comment

brecht marked this conversation as resolved
William Leeson added 1 commit 2023-04-13 16:20:56 +02:00
William Leeson added 3 commits 2023-04-13 16:25:26 +02:00
William Leeson added 2 commits 2023-04-14 10:23:33 +02:00
William Leeson added 1 commit 2023-04-14 10:57:01 +02:00
Moved GeometrySizes and AttributeSizes to be stored in scene. This
resulted in them being removed as parameters where a Scene was
already passed in.
William Leeson added 1 commit 2023-04-14 13:42:04 +02:00
Refactored the GeometryManager methods so that those relating to
the DeviceScene are now methods in the DeviceScene.
William Leeson added 1 commit 2023-04-14 14:20:59 +02:00
As the device_update_bvh2 method just involves transferring data
to the device it was moved to DeviceScene.
William Leeson added 2 commits 2023-04-14 15:10:42 +02:00
Moving DeviceScene from scene.h caused the device definitions to be
no longer present in various files. This adds them back in by
including device.h in the required files.
William Leeson changed title from WIP: Refactor geometry update for better parallelization and upload performance to Refactor geometry update for better parallelization and upload performance 2023-04-14 15:13:20 +02:00
Author
Member

@brecht I hope this is closer to what you are looking for. I have removed the WIP as I think it is ready now :-)

Member

@blender-bot package

Member

Package build started. [Download here](https://builder.blender.org/download/patch/PR105403) when ready.
William Leeson added 1 commit 2023-04-14 15:53:47 +02:00
Forgot to add the file for the previous change to the memory copy
methods, which resulted in the OneAPI build failing.

On the design side, I'm still confused about why parallelizing memory copying to multiple devices requires such deep changes on the host side, why this can't be abstracted more inside Device and MultiDevice.

CUDA and HIP already have a MemMap device_mem_map that keeps track of all device_memory. So if device_memory contains a bool modified_, that should be enough information to implement a method like Device::copy_modified_memory_to_device that just loops over all memory and copies where needed? And then that can be done inside a parallel for in MultiDevice.

Or am I missing a reason for why this is not practical?
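For illustration, a rough standalone sketch of that idea. DeviceStub and MemoryStub are stand-ins for Device and device_memory, and copy_modified_memory_to_device is the method being proposed here, not an existing API:

```
#include <thread>
#include <unordered_map>
#include <vector>

struct MemoryStub {
  bool modified_ = false;  // set whenever the host-side data changes
};

struct DeviceStub {
  // Analogue of the per-device mem map that already tracks every allocation.
  std::unordered_map<int, MemoryStub *> device_mem_map;

  void copy_to_device(MemoryStub &mem) { mem.modified_ = false; }

  // Proposed: loop over all tracked memory and copy only what changed.
  void copy_modified_memory_to_device()
  {
    for (auto &entry : device_mem_map) {
      if (entry.second->modified_) {
        copy_to_device(*entry.second);
      }
    }
  }
};

// MultiDevice analogue: run the per-device copies in parallel.
void multi_device_copy_modified(std::vector<DeviceStub> &sub_devices)
{
  std::vector<std::thread> workers;
  for (DeviceStub &sub : sub_devices) {
    workers.emplace_back([&sub]() { sub.copy_modified_memory_to_device(); });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```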

Brecht Van Lommel requested changes 2023-04-14 18:06:24 +02:00
Brecht Van Lommel left a comment
Owner

I'm having trouble reviewing the geometry update changes, since it's both moving code and changing the logic. It would be very helpful if you could submit a PR that just moves code to new files and order functions to be similar to the new code. Then I can quickly review and approve that, and then when this PR is updated hopefully it will show a readable delta between old and new code?

@ -125,6 +125,7 @@ void BlenderSession::create_session()
session = new Session(session_params, scene_params);
session->progress.set_update_callback(function_bind(&BlenderSession::tag_redraw, this));
session->progress.set_cancel_callback(function_bind(&BlenderSession::test_cancel, this));
session->progress.set_updates(!headless);

It's still not clear to me why showing updates should ever be disabled. Regardless if rendering headless or not it's good to see what is happening so Cycles doesn't look like it's stuck. If performance is an issue we can look at other ways to optimize this.

Can you explain the thinking behind this?

Regardless, I think this change and the changes in progress.h should be submitted as a separate PR. I think that can be reviewed and landed separately from this, there doesn't seem to be a dependency in either direction.

Author
Member

The reason behind this is that with many threads the mutexes in the progress calls were causing large stalls, so this was an easy way to switch them on and off. The stalls were frequent, especially where the progress update was called inside a loop.

I have removed this for now.

@ -66,3 +66,3 @@
vector<Geometry *> geometry;
vector<Object *> objects;
bool built;

Add empty line after this, initialize to false directly here.

Author
Member

Done

@ -24,0 +36,4 @@
{
int id = device->device_number(subdevice);
resize_sub_bvhs_if_needed(id);
sub_bvhs[id] = bvh;

It's not clear to me how the previous BVH that was stored here gets freed, or how the ownership here works in general. Ideally unique_ptr would be used to make that really clear.

Author
Member

I have changed this to use `unique_ptr` now
leesonw marked this conversation as resolved
@ -54,3 +53,2 @@
BVH2 *const bvh2 = static_cast<BVH2 *>(bvh);
if (refit) {
bvh2->refit(progress);
static std::atomic<int> building{0};

Don't use static variables like this. There can be multiple Cycles sessions active at once that should not be blocked by each other here.

Author
Member

Doh, I'd forgotten there could be multiple sessions :-/ I'll look into moving this into the top level BVH.

Author
Member

I have updated this to storing the building count in the top level BVH so it should work with multiple sessions now.

@ -55,2 +54,2 @@
if (refit) {
bvh2->refit(progress);
static std::atomic<int> building{0};
/* Top level BVH2 build must wait on all other BVH2 builds to finish

I don't understand why all this complicated synchronization logic is here. Don't we already guarantee that the top level BVH is built after the bottom level BVHs from the caller functions?

I would think all that is needed here is a double checked lock, no condition variables or waiting.
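For reference, a generic sketch of the double-checked lock pattern being referred to (not the Cycles BVH2 code). Later callers skip the build once the flag is set, while concurrent callers block on the mutex until the first build finishes:

```
#include <atomic>
#include <mutex>

struct SharedBuildStub {
  std::atomic<bool> built{false};
  std::mutex build_mutex;

  template<typename BuildFn> void ensure_built(BuildFn build)
  {
    if (built.load(std::memory_order_acquire)) {
      return;  // fast path: already built, no locking
    }
    std::lock_guard<std::mutex> lock(build_mutex);
    if (!built.load(std::memory_order_relaxed)) {
      build();  // only the first thread to take the lock builds
      built.store(true, std::memory_order_release);
    }
  }
};
```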

Author
Member

No, it only stops 2 different threads from building the same BVH. However, some threads can reach this point before all the object BVHs are built so it needs to wait on those BVHs to be built before it can proceed.

I don't this this would work in this case.

I'll look into moving the atomic<int> into the top level BVH; that way it won't block multiple sessions.

Author
Member

This is needed because if more than 1 device uses BVH2 they all work on building it. This happens when you combine GPU + CPU rendering or more than one GPU that uses BVH2.

@ -531,3 +536,2 @@
if (mem.type == MEM_GLOBAL) {
global_free(mem);
global_alloc(mem);
if ((mem.device_size < mem.memory_size()) || (!mem.device_pointer)) {

Same comment as before, change `<` to `!=` to ensure memory decreases.
Author
Member

Done

@ -0,0 +90,4 @@
}
/*
* Clears the modified tags for all elements of the device scene

It's not all of them? For example object_motion_pass is not done here.

The function has a generic name but then only works on some subset related to geometry, that's confusing.

Author
Member

I renamed the function `device_free_geometry`
@ -0,0 +128,4 @@
DeviceScene *dscene,
const GeometrySizes *p_sizes)
{
if (p_sizes->tri_size != 0) {

Are these conditionals checking for the size really needed? Why can't it just always do this? Don't all of the individual device vectors have sufficient info to skip operations if they are unnecessary?

Author
Member

I believe this was just replicating similar code in geometry.cpp:1092 in method GeometryManager::device_update_mesh with an optimization of moving the copy inside the if. I'll remove these and rely on the device_memory instead.

@ -630,3 +630,3 @@
}
double Session::get_estimated_remaining_time() const
double Session::get_estimated_remaining_time() //const

Add back const? The implementation did not change.

Author
Member

Done

@ -7,0 +11,4 @@
#else
#define MUTEX thread_mutex
#define SCOPED_LOCK(mutex) thread_scoped_lock scopedlock(mutex)
#endif

I'd rather not use macros, especially not with such generic names. If a spin lock is faster we might as well just use that.

If there's really a need for both to be switchable, I prefer something like this inside the class:

using progress_mutex = thread_spin_lock;
using progress_lock = thread_scoped_spin_lock;
Author
Member

I removed this.

William Leeson added 1 commit 2023-04-17 12:26:43 +02:00
William Leeson added 1 commit 2023-04-17 14:17:55 +02:00
William Leeson added 2 commits 2023-04-17 18:14:19 +02:00
William Leeson added 1 commit 2023-04-18 10:34:07 +02:00
William Leeson added 2 commits 2023-04-19 13:46:35 +02:00
William Leeson added 1 commit 2023-04-19 15:36:03 +02:00
William Leeson added 1 commit 2023-04-20 12:27:21 +02:00
William Leeson added 1 commit 2023-04-20 17:55:52 +02:00
William Leeson added 2 commits 2023-04-21 15:37:53 +02:00
Uses the unique_ptr to deallocate the BVH2 structure in the
BVH postprocess and also assigns the bvh root indices in the device
specific areas.
William Leeson added 1 commit 2023-04-21 15:56:35 +02:00
William Leeson added 1 commit 2023-04-21 17:29:54 +02:00
William Leeson added 1 commit 2023-04-24 10:38:35 +02:00
William Leeson added 4 commits 2023-04-26 12:23:25 +02:00
Some CUDA commands were used before the context was set up. Also,
since the BVHs are built in order, there is no need for the lock
to prevent 2 being built at the same time to save memory.
William Leeson added 1 commit 2023-04-26 15:00:37 +02:00
William Leeson added 3 commits 2023-04-27 11:16:31 +02:00
For the displacement and shadow texture generation the textures
were originally only uploaded to 1 device. However, this resulted
in some textures that were needed across all devices not being
uploaded. This fixes that by uploading them to all devices.
To reduce the amount of code and manage the life cycles of the
DeviceScenes they are now stored as unique_ptrs
William Leeson added 1 commit 2023-04-27 14:45:36 +02:00
William Leeson added 2 commits 2023-04-28 10:33:24 +02:00
Re-instates the task pool used to build the BVHs for the objects.
Also the displacement timing was being reported twice and this is
fixed.
William Leeson added 1 commit 2023-04-28 14:30:36 +02:00
William Leeson requested review from Brecht Van Lommel 2023-04-28 14:45:53 +02:00
William Leeson added 1 commit 2023-04-28 15:29:35 +02:00
Move CUDA context into a smaller scope
Some checks failed
buildbot/vexp-code-patch-coordinator Build done.
429f953d6c
Author
Member

@blender-bot package

Member

Package build started. [Download here](https://builder.blender.org/download/patch/PR105403) when ready.
William Leeson added 1 commit 2023-05-01 10:05:18 +02:00
Merge branch 'upstream_main' into geometry_update
All checks were successful
buildbot/vexp-code-patch-coordinator Build done.
de58c2ab8e
Author
Member

I have updated the scene, geometry, object BVH and scene BVH update times graphs and updated the results in the [spreadsheet](https://docs.google.com/spreadsheets/d/1OYF3g988jZQXQG1pVopzIMQTRVo39V1znrLx8c92Rhs/edit?usp=sharing).
Author
Member

@blender-bot package

Member

Package build started. [Download here](https://builder.blender.org/download/patch/PR105403) when ready.
William Leeson added 1 commit 2023-05-03 11:22:11 +02:00
William Leeson added 1 commit 2023-05-04 13:49:39 +02:00
William Leeson added 2 commits 2023-05-08 14:23:28 +02:00
If the device was the MultiDevice it previously returned the first
BVH. This was incorrect; it should return the MultiBVH. Also, if
it cannot find the device then it should return NULL.
William Leeson added 2 commits 2023-05-08 16:49:26 +02:00
This code was duplicated in MultiDevice and geometry_bvh.cpp, so
this removes that duplication.
William Leeson added 1 commit 2023-05-09 13:57:22 +02:00
This fixes an issue when using the BVH indirectly through the
MultiDevice. Setting the bvh_layout to none causes the ray queries
to be disabled. Also this does some refactoring to clean up the
use of the device_data_xfer_and_bvh_update method.
William Leeson added 1 commit 2023-05-10 15:33:48 +02:00
William Leeson added 1 commit 2023-05-12 10:16:40 +02:00
William Leeson added 1 commit 2023-05-15 16:12:58 +02:00
William Leeson added 1 commit 2023-05-16 10:11:25 +02:00
William Leeson added 1 commit 2023-05-17 10:54:47 +02:00
William Leeson added 1 commit 2023-05-17 15:38:15 +02:00
William Leeson added 1 commit 2023-05-17 18:05:19 +02:00
William Leeson added 2 commits 2023-05-19 08:51:12 +02:00
William Leeson added 1 commit 2023-05-22 10:49:55 +02:00
William Leeson added 1 commit 2023-05-25 13:35:29 +02:00