WIP: Interleaved slices for better work distribution with a Multi-GPU setup #110348

Draft
William Leeson wants to merge 82 commits from leesonw/blender-cluster:work_sets_similar into main

82 Commits

Author SHA1 Message Date
William Leeson 13a2a80a6d Merge branch 'upstream_main' into work_sets_similar 2023-08-17 12:14:11 +02:00
William Leeson 7c9f2b82d9 Merge branch 'upstream_main' into work_sets_similar 2023-08-14 15:58:30 +02:00
William Leeson 10efe80d52 Merge branch 'upstream_main' into work_sets_similar 2023-08-09 16:27:36 +02:00
William Leeson 415b4c0487 Merge branch 'upstream_main' into work_sets_similar 2023-08-07 11:40:21 +02:00
William Leeson 4243b1f1b6 Merge branch 'upstream_main' into work_sets_similar 2023-08-01 12:38:24 +02:00
William Leeson e069449b6d Merge branch 'upstream_main' into work_sets_similar 2023-07-26 12:37:50 +02:00
William Leeson 8c0bb158cc Remove whitespace change 2023-07-26 08:35:13 +02:00
William Leeson c612731844 Remove the SCOPED_MARKER used for profiling 2023-07-25 15:59:41 +02:00
William Leeson d7d5a4127d Unify slice size calculation algorithm
The two branches for interleaved or consecutive slices have been
replaced with a simpler check that determines the size of the
slice to share between the devices and the minimum number of
slices each device should have.
2023-07-25 14:37:48 +02:00
William Leeson 303b112eb3 Add back in the parallel_for copy 2023-07-25 14:26:39 +02:00
William Leeson b582969095 Remove unused reset method 2023-07-25 10:04:38 +02:00
William Leeson 483df40b4c Simple changes to make code match the original version 2023-07-25 10:01:50 +02:00
William Leeson ac246b3601 Remove pinned memory
This change is not required for this to work.
2023-07-25 09:52:26 +02:00
William Leeson 805eba65ca Switch between interleaved and consecutive slices 2023-07-24 17:00:56 +02:00
William Leeson 070febd56b Merge branch 'upstream_main' into work_sets_similar 2023-07-24 09:55:50 +02:00
William Leeson 87a299d0ef Clean up code and remove old debug checks 2023-07-21 16:34:03 +02:00
William Leeson a6e4771b27 FIX: Devices cannot be assigned more rows than scanlines
If the compute difference was huge, devices could be assigned far
too many rows, sometimes more than there are devices. This is now
prevented by ensuring that each device gets at least 1 row.
2023-07-21 16:29:25 +02:00
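The guarantee described in this fix can be illustrated with a small standalone sketch. It is not the code from this branch: `distribute_rows` and its weight vector are hypothetical, and it assumes the image has at least as many scanlines as there are devices. One row is reserved per device, the remaining rows are shared in proportion to the weights, and any rounding leftover goes to the heaviest device so the total always matches the image height.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: split `height` scanlines between devices in proportion
// to `weights`, giving every device at least one row.
std::vector<int> distribute_rows(int height, const std::vector<double> &weights)
{
  const int num_devices = static_cast<int>(weights.size());
  assert(num_devices > 0 && height >= num_devices);

  const double total_weight = std::accumulate(weights.begin(), weights.end(), 0.0);
  assert(total_weight > 0.0);

  // Reserve one row per device, then share the spare rows by weight.
  const int spare = height - num_devices;
  std::vector<int> rows(weights.size());
  int assigned = 0;
  std::size_t heaviest = 0;
  for (std::size_t i = 0; i < weights.size(); i++) {
    rows[i] = 1 + static_cast<int>(spare * (weights[i] / total_weight));
    assigned += rows[i];
    if (weights[i] > weights[heaviest]) {
      heaviest = i;
    }
  }

  // Truncation can leave rows over; hand them to the heaviest device so the
  // row counts add up to `height`.
  rows[heaviest] += height - assigned;
  return rows;
}
```

With weights of 1000 and 1 over 10 scanlines, the slow device still receives a single row and the fast device takes the other nine.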
William Leeson 2d05993da0 Clean up code and remove debug code 2023-07-21 15:21:07 +02:00
William Leeson c703a7c8ac Change device_scale_factor to interleaved_slices
Replace the device_scale_factor with a check box to enable or
disable the interleaved slices.
2023-07-21 13:53:02 +02:00
William Leeson b86a443aaa FIX: padding takes interleaved scanlines into account
Previously padding did not use the interleaved scanlines to pad
the data and instead just wrote to the first n scanlines. Now it
iterates over the correct scanlines updating the correct set
based on the data in the BufferParams.
2023-07-21 13:49:02 +02:00
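To show why writing to "the first n scanlines" is wrong once slices are interleaved, here is a minimal sketch of which image rows a device owns. It assumes a simple round-robin layout with equal slice heights (slice k belongs to device k % num_devices); the real layout in this branch comes from BufferParams and may differ, and `device_scanlines` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical layout: the image is cut into slices of `slice_height` rows and
// slice k is owned by device (k % num_devices).
std::vector<int> device_scanlines(int image_height,
                                  int slice_height,
                                  int num_devices,
                                  int device)
{
  assert(slice_height > 0 && num_devices > 0);
  assert(device >= 0 && device < num_devices);

  std::vector<int> rows;
  const int stride = num_devices * slice_height;
  for (int start = device * slice_height; start < image_height; start += stride) {
    // The last slice may be cut short by the bottom of the image.
    const int end = std::min(start + slice_height, image_height);
    for (int y = start; y < end; y++) {
      rows.push_back(y);
    }
  }
  return rows;
}
```

For a 10 row image with 2 row slices and 2 devices, device 0 owns rows 0, 1, 4, 5, 8, 9 and device 1 owns rows 2, 3, 6, 7, so any per-device operation such as padding has to walk these rows rather than a contiguous range.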
William Leeson 2d5198beb4 Merge branch 'upstream_main' into work_sets_similar 2023-07-21 10:05:16 +02:00
William Leeson 155fd0991f Remove temp fix while work on real solution 2023-07-21 08:14:33 +02:00
William Leeson 514f8a7990 FIX: For Baking don't use interleaved slices
For some reason at the moment baking roughness does not work due
to the interleaved scanlines between the devices. So for now it
reverts to just using 2 big slices.
2023-07-20 17:11:56 +02:00
William Leeson 5e532bb022 FIX: Baking now reads the correct scanlines into the RenderBuffers
The scanlines were just copied serially without taking the slices
into account. This is now corrected.
2023-07-20 17:10:22 +02:00
William Leeson 3e93532b24 Remove device_scale_factor from copy to/from routines. 2023-07-20 08:11:19 +02:00
William Leeson 989dd1d3ef FIX: Correctly account for partial slices
The last slices, which are not full sized, need to take current_y
into account to determine how many scanlines are left.
2023-07-20 08:09:43 +02:00
William Leeson b8215503d5 Use the smallest device weight to choose the slice sizes. 2023-07-19 15:51:17 +02:00
William Leeson fdef5c310c Merge branch 'upstream_main' into work_sets 2023-07-19 09:50:17 +02:00
William Leeson 797e2a7b11 Merge branch 'upstream_main' into work_sets 2023-07-06 12:10:13 +02:00
William Leeson b9a17d2d31 Merge branch 'work_sets' of projects.blender.org:leesonw/blender-cluster into work_sets 2023-07-06 12:09:49 +02:00
William Leeson 3203ca5c19 Merge branch 'upstream_main' into work_sets 2023-07-04 14:10:28 +00:00
William Leeson dcb9476e9c FIX: Stop (get/set)_render_tile_pixels using work_sets
Sets the master_buffers as the effective_buffer_params so they
only iterate once instead of once per buffer, as the
device_pointers cannot be used as regular pointers.
2023-06-30 14:18:39 +02:00
William Leeson 4e424d384f Make PassAssessor and master_buffers_ aware of slice structure
Adds the correct BufferParams to the master_buffers and also
changes the code to the PathAccessors for both CPU and GPU to copy
the images according to the slice structure.
2023-06-29 18:32:21 +02:00
William Leeson 943f9c3e59 Merge branch 'upstream_main' into work_sets 2023-06-29 09:29:37 +02:00
William Leeson 481d1c4423 FIX: MacOS compiler cannot initialise dynamic arrays
Replace initializer with for loop.
2023-06-27 14:24:07 +02:00
William Leeson 295652a1b9 Merge branch 'upstream_main' into work_sets 2023-06-26 11:49:13 +02:00
William Leeson 9ba33c795d Remove workset from denoise update 2023-06-13 15:31:18 +02:00
William Leeson d9442b3969 Adaptive sampling uses only one parallel_for on CPU
Also fixes an issue where the render_samples_impl was getting the
incorrect height.
2023-06-13 14:11:53 +02:00
William Leeson 7f11e5f38d Cryptomatte uses only 1 kernel launch on GPU 2023-06-13 14:11:02 +02:00
William Leeson 7809f5e275 Use a single kernel launch call to perform the adaptive sampling with slices 2023-06-13 13:26:24 +02:00
William Leeson 7c4f3aa0d4 Remove the need for using the WorkSet size()
Adds device_scale_factor_ member variable to PathTraceWork for
iterating over the slices.
2023-06-13 12:46:14 +02:00
William Leeson 4b4fbf6ec0 Clean up code 2023-06-13 11:09:34 +02:00
William Leeson 240cad9fb0 Make Copy from render buffers use slice buffer params 2023-06-13 11:00:15 +02:00
William Leeson 7d0cb956d4 Use slices buffer params to copy data to render buffers 2023-06-13 10:42:06 +02:00
William Leeson d463f4d194 Pre-calculate master buffer size 2023-06-13 10:03:05 +02:00
William Leeson 67116b0844 FIX: Fixes debug build
The debug build was failing because dna_type_offsets.h was not
always generated when building. This also affected the release
build, though more rarely.
2023-06-12 14:14:56 +02:00
William Leeson ebfddd1c1a Merge branch 'upstream_main' into work_sets 2023-06-12 11:06:42 +02:00
William Leeson 7460500a81 FIX: RenderBuffers state now set up correctly for NODE
The parameters for slices added to the RenderBuffers were not set up
correctly for use with Nodes. This adds the necessary setup code.

Also, switched the render buffer to not use pinned memory.
2023-06-12 09:33:51 +02:00
William Leeson cf298a1d7e FIX: Update slice buffer offset into the master buffer on change
The slice buffers offsets into the master buffer were only updated
when the master buffer was reallocated. This ignored the fact that
the resolution scaler could resize the buffer and the slices even
though the master buffer was not reallocated.
2023-06-09 10:39:54 +02:00
William Leeson 7d1379f95d Path rng_hash uses image pixel coordinates
Previously it was using the slice coordinates
2023-06-07 10:36:59 +02:00
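The point of this change is that two devices rendering the same slice-local pixel must not end up with the same random sequence. A rough sketch of the idea follows, assuming the same round-robin layout with equal slice heights as a stand-in for the real one. `SliceLayout`, `image_row` and `pixel_seed` are hypothetical names, and the mixing function is a simple placeholder, not the hash Cycles uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an interleaved layout.
struct SliceLayout {
  int image_width;
  int slice_height;
  int num_devices;
};

// Map a row of a device's packed slice storage back to its image row.
int image_row(const SliceLayout &layout, int device, int local_y)
{
  assert(layout.slice_height > 0 && layout.num_devices > 0);
  assert(device >= 0 && device < layout.num_devices && local_y >= 0);

  const int slice_index = local_y / layout.slice_height;  // n-th slice of this device
  const int row_in_slice = local_y % layout.slice_height;
  return (slice_index * layout.num_devices + device) * layout.slice_height + row_in_slice;
}

// Derive a per-pixel seed from image coordinates rather than slice
// coordinates. The mixing step is a placeholder, not the Cycles hash.
uint32_t pixel_seed(const SliceLayout &layout, int device, int x, int local_y)
{
  const uint32_t pixel_index = static_cast<uint32_t>(image_row(layout, device, local_y)) *
                                   static_cast<uint32_t>(layout.image_width) +
                                   static_cast<uint32_t>(x);
  const uint32_t h = pixel_index * 2654435761u;
  return h ^ (h >> 16);
}
```

Seeding from slice coordinates would give device 0 and device 1 the same value for local pixel (3, 0); seeding from the image row keeps them distinct.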
William Leeson 2bbf552f09 Render all slices in one go
Previously the render_samples iterated over all the WorkSets.
However, this was not ideal due to overheads and was not good at
keeping the GPU busy. Now info is passed in the WorkTile to enable
the GPU to render all the slices in one pass.
2023-06-07 10:11:54 +02:00
William Leeson c836ee5e99 Merge branch 'upstream_main' into work_sets 2023-06-05 11:00:57 +02:00
William Leeson 098353947d Add some markers to visualise the render_pipeline 2023-06-02 10:12:01 +02:00
William Leeson 35335e4932 Adds NVTX markers for viewing program flow and execution 2023-06-02 10:11:07 +02:00
William Leeson 69e04ddeb3 Merge branch 'upstream_main' into work_sets 2023-06-01 12:48:34 +02:00
William Leeson 0e0056707e FIX: Master buffer is now copied directly using its buffer
CPU data was not being copied properly due to the buffer not
having the correct parameters.
2023-05-31 13:31:10 +02:00
William Leeson f283a9beb4 Merge branch 'upstream_main' into work_sets 2023-05-31 10:43:43 +02:00
William Leeson 2ad2d46aeb Removes performance penalty due to rebalance for having slices. 2023-05-30 13:41:16 +02:00
William Leeson ef5d66e1bf FIX: Buffers must always be cleared 2023-05-29 13:54:13 +02:00
William Leeson 6a73c7d345 Merge branch 'upstream_main' into work_sets 2023-05-29 11:17:10 +02:00
William Leeson 0a716b2f68 Put wavefront tracing counters in pinned memory. 2023-05-29 09:01:01 +02:00
William Leeson 76b0775361 Clean up code 2023-05-25 10:24:34 +02:00
William Leeson bd669026e3 Skip over 0 height slices 2023-05-25 09:50:52 +02:00
William Leeson f0185ed234 FIX: Don't copy 0 height slices to display
Attempting to copy a zero height slice resulted in CUDA errors.
Also switches back to using the master buffer when zeroing all
the slices.
2023-05-25 00:16:59 +02:00
William Leeson 7a3135ec98 FIX: Set pinned to false when not allocating pinned memory 2023-05-25 00:15:45 +02:00
William Leeson 47e1cd75cb Merge branch 'upstream_main' into work_sets 2023-05-24 21:00:54 +02:00
William Leeson 6050b49ee4 FIX: Use remaining rows in the last work item
The remaining rows were not added to the last work item because it
was detected incorrectly.
2023-05-24 20:59:35 +02:00
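As an illustration of the behaviour this fix restores, the sketch below splits a row count evenly across work items and adds the remainder to the last one. `split_rows` is a hypothetical helper, not a function from the branch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: divide `total_rows` evenly between `num_items` work
// items; rows that do not divide evenly are added to the last item.
std::vector<int> split_rows(int total_rows, int num_items)
{
  assert(num_items > 0 && total_rows >= 0);

  std::vector<int> rows(num_items, total_rows / num_items);
  rows.back() += total_rows % num_items;
  return rows;
}
```

Splitting 10 rows across 3 work items yields 3, 3 and 4 rows, so no scanline is left unrendered.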
William Leeson 9565aaa6fe FIX: Denoise buffer when more than one work_set
Previously, when there was more than one worker (aka device) and
multiple work_sets, the denoising was skipped.
2023-05-24 14:43:56 +02:00
William Leeson cbdb1379e2 Use a single master buffer to hold all the slices
This replaces n slice buffers with a single master buffer and n
slices which reference into the master buffer. This allows a
single copy to upload or download the data.
2023-05-24 09:45:12 +02:00
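The structure described here, one allocation with slices that reference into it, could look roughly like the host-side sketch below. `MasterBuffer`, `Slice`, `make_master_buffer` and `slice_data` are hypothetical names, and a `std::vector<float>` stands in for device memory; the actual RenderBuffers code in the branch is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A slice is only a view: an offset (in floats) into the master buffer plus
// its height in scanlines.
struct Slice {
  std::size_t offset;
  int height;
};

struct MasterBuffer {
  std::vector<float> data;    // one allocation holding every slice
  std::vector<Slice> slices;  // views into `data`
};

// Lay the slices out back to back and allocate the storage once.
MasterBuffer make_master_buffer(const std::vector<int> &slice_heights,
                                int width,
                                int pass_stride)
{
  assert(width > 0 && pass_stride > 0);

  MasterBuffer master;
  std::size_t offset = 0;
  for (int height : slice_heights) {
    assert(height >= 0);
    master.slices.push_back({offset, height});
    offset += static_cast<std::size_t>(height) * width * pass_stride;
  }
  master.data.assign(offset, 0.0f);
  return master;
}

// Pointer to the first float of a slice inside the master buffer.
float *slice_data(MasterBuffer &master, std::size_t slice_index)
{
  assert(slice_index < master.slices.size());
  return master.data.data() + master.slices[slice_index].offset;
}
```

Because all slices live in `data`, uploading or downloading the whole set is a single copy of `data.size()` floats, and a zero height slice simply shares its offset with the next one.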
William Leeson 8c4db51deb FIX: Don't read kernel data for the KernelLightTreeEmitter if the index is -1 2023-05-24 09:42:54 +02:00
William Leeson d8bd1c1eb8 Merge branch 'upstream_main' into work_sets 2023-05-22 14:23:40 +02:00
William Leeson 4c1d196af8 Merge branch 'upstream_main' into work_sets 2023-05-22 11:30:10 +02:00
William Leeson 9c7f5266a6 Merge branch 'upstream_main' into work_sets 2023-05-19 09:16:53 +02:00
William Leeson 790e12a444 FIX: Adds missing device import for MacOS bvh.mm 2023-05-17 11:00:23 +02:00
William Leeson 5f5595caa7 Allocate render buffers as pinned so as to be able to bg transfer 2023-05-16 14:34:31 +02:00
William Leeson b8fbf9dfcb Allocate path trace counters in pinned memory to allow bg transfer 2023-05-16 14:33:50 +02:00
William Leeson edfb257495 Add the ability to allocate pinned memory 2023-05-16 14:33:22 +02:00
William Leeson d21cd0bc2e Merge branch 'upstream_main' into work_sets 2023-05-16 10:35:30 +02:00
William Leeson 84fa6cfc51 Merge branch 'upstream_main' into work_sets 2023-05-12 10:46:07 +02:00
William Leeson c5da8e6429 Request transfers for all slices then sync only once 2023-05-12 09:08:33 +02:00
William Leeson 4e4d5f3ea0 Only realloc the buffer if it is bigger 2023-05-12 09:07:00 +02:00
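A grow-only policy like the one named in this commit can be sketched as follows. `GrowOnlyBuffer` is a hypothetical class that uses host memory for illustration; it keeps its storage when a smaller size is requested and reallocates only when the request exceeds what is already allocated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical buffer that never shrinks its allocation.
class GrowOnlyBuffer {
 public:
  // Set the logical size. Returns true only if storage had to be reallocated.
  bool resize(std::size_t required)
  {
    size_ = required;
    if (required <= storage_.size()) {
      return false;  // existing allocation is big enough, reuse it
    }
    storage_.resize(required);
    return true;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return storage_.size(); }
  float *data() { return storage_.data(); }

 private:
  std::vector<float> storage_;
  std::size_t size_ = 0;
};
```

This trades some memory for fewer allocations when the requested size fluctuates, for example while a resolution scaler changes the render size.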
William Leeson 127066dd99 Add work_sets for better multi-gpu scaling 2023-05-10 13:58:26 +02:00