Improved Surface Deform performance #116545

Open
Eugene-Kuznetsov wants to merge 3 commits from Eugene-Kuznetsov/blender:ek_surface_deform into main

When changing the target branch, be careful to rebase the branch in your fork to match. See documentation.
Contributor

Open the attached blend file. Go into geonodes trees on both hair objects and click "Bake" in their respective bake nodes. Hit "play".

With unpatched Blender, I see around 1 fps (Threadripper 5955WX).

This patch takes me from 1 fps to 7 fps. (With 'viewport thinning' set to 0.1, the effect is even stronger, from the same 1 fps to 9-10 fps.)

Open the attached blend file. Go into geonodes trees on both hair objects and click "Bake" in their respective bake nodes. Hit "play". With unpatched Blender, I see around 1 fps (Threadripper 5955WX). This patch takes me from 1 fps to 7 fps. (With 'viewport thinning' set to 0.1, the effect is even stronger, from the same 1 fps to 9-10 fps.)
Eugene-Kuznetsov force-pushed ek_surface_deform from 25163d7d78 to 429689b1bf 2023-12-26 05:17:07 +01:00 Compare
Iliya Katushenock added this to the Nodes & Physics project 2023-12-26 14:01:31 +01:00
Iliya Katushenock added the
Interest
Geometry Nodes
label 2023-12-26 14:01:39 +01:00
Iliya Katushenock reviewed 2023-12-26 14:24:15 +01:00
Iliya Katushenock left a comment
Member

Thanks for this path to implement/optimize UV sampler.
Strongly sure this node can be improved much more if reimplement this as parallel logarithmic search tree, instead of current hash map (see: #106713). Or, at least, to built this has map more parallel/exact for strips representation.
Not sure about point of this PR about using 2d array, this looks even worst for some cases with sparse uv data (for memory usage ?).
Would be better to add more different tests with measured numbers and attached .blend files to simplify checking on different machines. Technical details also would be helpful in description.
Just added some code-style comments.

Thanks for this path to implement/optimize UV sampler. Strongly sure this node can be improved much more if reimplement this as parallel logarithmic search tree, instead of current hash map (see: #106713). Or, at least, to built this has map more parallel/exact for strips representation. Not sure about point of this PR about using 2d array, this looks even worst for some cases with sparse uv data (for memory usage ?). Would be better to add more different tests with measured numbers and attached .blend files to simplify checking on different machines. Technical details also would be helpful in description. Just added some code-style comments.
@ -720,3 +720,3 @@
GMutableSpan dst);
void calculate_basis(const float parameter, float4 &r_weights);
inline void calculate_basis(const float parameter, float4 &r_weights)

Weights can be returned value, if this is refactoring.

Weights can be returned value, if this is refactoring.
@ -1326,6 +1326,21 @@ MINLINE bool is_zero_v4_db(const double v[4])
return (v[0] == 0.0 && v[1] == 0.0 && v[2] == 0.0 && v[3] == 0.0);
}
MINLINE bool is_finite_v2(const float v[2])

Inlining is not the best thing to do. You should be able to split this in to separate PR, and measure performance impact (strong) to convince of the need for this.

Inlining is not the best thing to do. You should be able to split this in to separate PR, and measure performance impact (strong) to convince of the need for this.
@ -26,0 +24,4 @@
int2 resolution_;
int2 span_;
std::vector<std::vector<int> > corner_tris_by_cell_;

You should not use std::vector instead of blender::Vector without strong reason.
You should not use Vector<Vector<A>> instead of Array<int> offsets; Array<A> data; GroupedSpan<A> values(offsets, data); without strong reason (see as examples: #110707. #115142, #111081, #114683).

You should not use `std::vector` instead of `blender::Vector` without strong reason. You should not use `Vector<Vector<A>>` instead of `Array<int> offsets; Array<A> data; GroupedSpan<A> values(offsets, data);` without strong reason (see as examples: #110707. #115142, #111081, #114683).
Author
Contributor

I've switched to blender::Vector. Array/GroupedSpan can't be used here.

I've switched to blender::Vector. Array/GroupedSpan can't be used here.
@ -15,2 +14,3 @@
static int2 uv_to_cell_key(const float2 &uv, const float2 &uv_base, const int2 &resolution, int2 span)
{
return int2{uv * resolution};
auto key = int2{int((uv.x-uv_base.x)*resolution.x), int((uv.y-uv_base.y)*resolution.y)};

There is missed default formatting, run make format in root folder of your blender fork.

There is missed default formatting, run `make format` in root folder of your blender fork.
@ -15,2 +15,3 @@
{
return int2{uv * resolution};
auto key = int2{int((uv.x-uv_base.x)*resolution.x), int((uv.y-uv_base.y)*resolution.y)};
return int2{std::max(0,std::min(span.x-1,key.x)), std::max(0,std::min(span.y-1,key.y))};

Magic numbers.

Magic numbers.
@ -23,0 +30,4 @@
min_uv = max_uv = uv_0;
int live_count = 0;
std::vector<uint8_t> mask(corner_tris.size());
`uint8_t` -> `int8_t` (maybe even `bool`?) https://wiki.blender.org/wiki/Style_Guide/C_Cpp#Integer_Types
Eugene-Kuznetsov marked this conversation as resolved
@ -30,3 +57,1 @@
const int2 key_0 = uv_to_cell_key(uv_0, resolution_);
const int2 key_1 = uv_to_cell_key(uv_1, resolution_);
const int2 key_2 = uv_to_cell_key(uv_2, resolution_);
resolution_.x = std::max<int>(1, mult / (max_uv[0]-min_uv[0] + 1e-6));

math::max<int>

`math::max<int>`
Eugene-Kuznetsov marked this conversation as resolved
Author
Contributor

Thanks for this path to implement/optimize UV sampler.
Strongly sure this node can be improved much more if reimplement this as parallel logarithmic search tree, instead of current hash map (see: #106713). Or, at least, to built this has map more parallel/exact for strips representation.
Not sure about point of this PR about using 2d array, this looks even worst for some cases with sparse uv data (for memory usage ?).

The 2d array gives a good balance between construction time and sampling time. Notice that sampling is threaded but construction is not, and the sampler object is constructed every frame (actually, twice per frame for every Surface Deform node). Much of the performance gain comes from replacing the old complicated MultiMap structure with a simple array. The code is now at the point where ReverseUVSampler is not a significant bottleneck, and further optimizations are not going to yield as much. Reusing samplers would be a more promising direction.

I've benchmarked just the changes for barycentric_coords_v2 and is_finite_v2/v3/v4, I estimate that inlining results in a 10% to 15% reduction in execution time of ReverseUVSampler::sample_many.

Here's a blend file with two more test meshes. I see at least 5x speedup with 40k strands per mesh, it gets progressively smaller with more strands, it's as low as 10-20% with 2M strands/mesh. (At that point, the bottleneck is https://projects.blender.org/blender/blender/src/branch/main/source/blender/draw/intern/draw_cache_impl_curves.cc#L432-L436.)

> Thanks for this path to implement/optimize UV sampler. > Strongly sure this node can be improved much more if reimplement this as parallel logarithmic search tree, instead of current hash map (see: #106713). Or, at least, to built this has map more parallel/exact for strips representation. > Not sure about point of this PR about using 2d array, this looks even worst for some cases with sparse uv data (for memory usage ?). The 2d array gives a good balance between construction time and sampling time. Notice that sampling is threaded but construction is not, and the sampler object is constructed every frame (actually, twice per frame for every Surface Deform node). Much of the performance gain comes from replacing the old complicated MultiMap structure with a simple array. The code is now at the point where ReverseUVSampler is not a significant bottleneck, and further optimizations are not going to yield as much. Reusing samplers would be a more promising direction. I've benchmarked just the changes for barycentric_coords_v2 and is_finite_v2/v3/v4, I estimate that inlining results in a 10% to 15% reduction in execution time of ReverseUVSampler::sample_many. Here's a blend file with two more test meshes. I see at least 5x speedup with 40k strands per mesh, it gets progressively smaller with more strands, it's as low as 10-20% with 2M strands/mesh. (At that point, the bottleneck is https://projects.blender.org/blender/blender/src/branch/main/source/blender/draw/intern/draw_cache_impl_curves.cc#L432-L436.)
Eugene-Kuznetsov force-pushed ek_surface_deform from 429689b1bf to c51a967a0a 2023-12-27 09:51:25 +01:00 Compare

It is hard to do benchmarks without some file with just one node. Usually, you should use BLI_timeit.hh header, instead of FPS, to do not measure impact of other system of blender.
And file should be as simple as possible, to do not measure impact of all other nodes in geometry nodes setup.
I have simplify your file (few) by hiding all objects except horses. Here is my measurements:

  • Main:
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 182.04 ms, Min: 182.04 ms, Last: 182.04 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 157.20 ms, Min: 157.20 ms, Last: 157.20 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 400 ns, Min: 400 ns, Last: 400 ns)
    Timer 'ReverseUVSampler::sample_many': (Average: 300 ns, Min: 200 ns, Last: 200 ns)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 168.13 ms, Min: 154.23 ms, Last: 154.23 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 173.34 ms, Min: 157.20 ms, Last: 189.49 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 39.51 ms, Min: 200 ns, Last: 118.54 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 49.56 ms, Min: 200 ns, Last: 79.69 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 243.12 ms, Min: 154.23 ms, Last: 393.11 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 217.38 ms, Min: 157.20 ms, Last: 305.46 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 39.92 ms, Min: 200 ns, Last: 1.36 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 33.52 ms, Min: 200 ns, Last: 1.55 ms)
  • PR:
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 5.07 ms, Min: 5.07 ms, Last: 5.07 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 5.73 ms, Min: 5.73 ms, Last: 5.73 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 400 ns, Min: 400 ns, Last: 400 ns)
    Timer 'ReverseUVSampler::sample_many': (Average: 300 ns, Min: 200 ns, Last: 200 ns)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 2.73 ms, Min: 0.38 ms, Last: 0.38 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 3.11 ms, Min: 0.48 ms, Last: 0.48 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 217.30 ms, Min: 200 ns, Last: 651.90 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 281.37 ms, Min: 200 ns, Last: 473.56 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 1.88 ms, Min: 0.20 ms, Last: 0.20 ms)
    Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 2.12 ms, Min: 0.16 ms, Last: 0.16 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 231.22 ms, Min: 200 ns, Last: 30.63 ms)
    Timer 'ReverseUVSampler::sample_many': (Average: 198.93 ms, Min: 200 ns, Last: 37.49 ms)

As i can see, this PR change speed of both constructor and sample methods.

It is hard to do benchmarks without some file with just one node. Usually, you should use `BLI_timeit.hh` header, instead of FPS, to do not measure impact of other system of blender. And file should be as simple as possible, to do not measure impact of all other nodes in geometry nodes setup. I have simplify your file (few) by hiding all objects except horses. Here is my measurements: - Main: Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 182.04 ms, Min: 182.04 ms, Last: 182.04 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 157.20 ms, Min: 157.20 ms, Last: 157.20 ms) Timer 'ReverseUVSampler::sample_many': (Average: 400 ns, Min: 400 ns, Last: 400 ns) Timer 'ReverseUVSampler::sample_many': (Average: 300 ns, Min: 200 ns, Last: 200 ns) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 168.13 ms, Min: 154.23 ms, Last: 154.23 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 173.34 ms, Min: 157.20 ms, Last: 189.49 ms) Timer 'ReverseUVSampler::sample_many': (Average: 39.51 ms, Min: 200 ns, Last: 118.54 ms) Timer 'ReverseUVSampler::sample_many': (Average: 49.56 ms, Min: 200 ns, Last: 79.69 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 243.12 ms, Min: 154.23 ms, Last: 393.11 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 217.38 ms, Min: 157.20 ms, Last: 305.46 ms) Timer 'ReverseUVSampler::sample_many': (Average: 39.92 ms, Min: 200 ns, Last: 1.36 ms) Timer 'ReverseUVSampler::sample_many': (Average: 33.52 ms, Min: 200 ns, Last: 1.55 ms) - PR: Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 5.07 ms, Min: 5.07 ms, Last: 5.07 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 5.73 ms, Min: 5.73 ms, Last: 5.73 ms) Timer 'ReverseUVSampler::sample_many': (Average: 400 ns, Min: 400 ns, Last: 400 ns) Timer 'ReverseUVSampler::sample_many': (Average: 300 ns, Min: 200 ns, Last: 200 ns) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 2.73 ms, Min: 0.38 ms, Last: 0.38 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 3.11 ms, Min: 0.48 ms, Last: 0.48 ms) Timer 'ReverseUVSampler::sample_many': (Average: 217.30 ms, Min: 200 ns, Last: 651.90 ms) Timer 'ReverseUVSampler::sample_many': (Average: 281.37 ms, Min: 200 ns, Last: 473.56 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 1.88 ms, Min: 0.20 ms, Last: 0.20 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 2.12 ms, Min: 0.16 ms, Last: 0.16 ms) Timer 'ReverseUVSampler::sample_many': (Average: 231.22 ms, Min: 200 ns, Last: 30.63 ms) Timer 'ReverseUVSampler::sample_many': (Average: 198.93 ms, Min: 200 ns, Last: 37.49 ms) As i can see, this PR change speed of both constructor and sample methods.
Author
Contributor

As i can see, this PR change speed of both constructor and sample methods.

sample_many seems to be much slower on your end (what hardware is this?) Here's what I'm getting with your blend file:

main:

Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 184.07 ms, Min: 0.44 ms, Last: 45.51 ms)
Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 182.58 ms, Min: 0.44 ms, Last: 46.06 ms)
Timer 'ReverseUVSampler::sample_many': (Average: 1.06 ms, Min: 20 ns, Last: 60 ns)
Timer 'ReverseUVSampler::sample_many': (Average: 1.05 ms, Min: 20 ns, Last: 30 ns)

PR:

Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 0.80 ms, Min: 94541 ns, Last: 0.12 ms)
Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 0.79 ms, Min: 94541 ns, Last: 0.30 ms)
Timer 'ReverseUVSampler::sample_many': (Average: 5.34 ms, Min: 20 ns, Last: 6.23 ms)
Timer 'ReverseUVSampler::sample_many': (Average: 5.37 ms, Min: 20 ns, Last: 9.69 ms)

There is a magic number 0.25 here c51a967a0a/source/blender/geometry/intern/reverse_uv_sampler.cc (L60) that controls the balance between constructor and sampler. It probably needs to be tuned. Try to set it to 1.0.

I'll try to benchmark on a couple more systems.

> > As i can see, this PR change speed of both constructor and sample methods. sample_many seems to be *much* slower on your end (what hardware is this?) Here's what I'm getting with your blend file: main: Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 184.07 ms, Min: 0.44 ms, Last: 45.51 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 182.58 ms, Min: 0.44 ms, Last: 46.06 ms) Timer 'ReverseUVSampler::sample_many': (Average: 1.06 ms, Min: 20 ns, Last: 60 ns) Timer 'ReverseUVSampler::sample_many': (Average: 1.05 ms, Min: 20 ns, Last: 30 ns) PR: Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 0.80 ms, Min: 94541 ns, Last: 0.12 ms) Timer 'ReverseUVSampler::ReverseUVSampler': (Average: 0.79 ms, Min: 94541 ns, Last: 0.30 ms) Timer 'ReverseUVSampler::sample_many': (Average: 5.34 ms, Min: 20 ns, Last: 6.23 ms) Timer 'ReverseUVSampler::sample_many': (Average: 5.37 ms, Min: 20 ns, Last: 9.69 ms) There is a magic number 0.25 here https://projects.blender.org/blender/blender/src/commit/c51a967a0a7a9ee55694cedd30c36a474e35d53a/source/blender/geometry/intern/reverse_uv_sampler.cc#L60 that controls the balance between constructor and sampler. It probably needs to be tuned. Try to set it to 1.0. I'll try to benchmark on a couple more systems.
Author
Contributor

Also, in that blend file, most objects are disabled in viewport, but ReverseUVSampler still runs for all of them. That's why "Min", "Last" and "Average" times are all different - you're getting the "Average" for all objects, and the "Last" for one of several objects. And it's not even doing sample_all on the horse, because the modifier geonode that generates horse hair is disabled.

To get clean numbers, I had to do this:
38e34e82d6
bde05142c8 (PR branch)
6e1e74ab08 (main branch)

"Exclude from View Layer" all collections except Collection 3
reenable all modifiers on "Horse hair", click "Bake" on the bake node.

main:
Timer 'ReverseUVSampler::ReverseUVSampler(31870)': (Average: 16.88 ms, Min: 15.23 ms, Last: 15.65 ms)
Timer 'ReverseUVSampler::ReverseUVSampler(74282)': (Average: 36.78 ms, Min: 35.06 ms, Last: 37.24 ms)
Timer 'ReverseUVSampler::sample_many(74282, 200145)': (Average: 2.69 ms, Min: 1.77 ms, Last: 3.13 ms)

PR (magic 0.25):
Timer 'ReverseUVSampler::ReverseUVSampler(31870)': (Average: 0.99 ms, Min: 0.75 ms, Last: 0.97 ms)
Timer 'ReverseUVSampler::ReverseUVSampler(74282)': (Average: 1.45 ms, Min: 1.34 ms, Last: 1.51 ms)
Timer 'ReverseUVSampler::sample_many(73685, 200145)': (Average: 7.17 ms, Min: 5.03 ms, Last: 7.39 ms)

PR (magic 1.0):
Timer 'ReverseUVSampler::ReverseUVSampler(31870)': (Average: 2.11 ms, Min: 1.56 ms, Last: 2.02 ms)
Timer 'ReverseUVSampler::ReverseUVSampler(74282)': (Average: 3.59 ms, Min: 2.73 ms, Last: 3.35 ms)
Timer 'ReverseUVSampler::sample_many(73685, 200145)': (Average: 2.61 ms, Min: 1.83 ms, Last: 2.89 ms)
Also, in that blend file, most objects are disabled in viewport, but ReverseUVSampler still runs for all of them. That's why "Min", "Last" and "Average" times are all different - you're getting the "Average" for *all* objects, and the "Last" for one of several objects. And it's not even doing sample_all on the horse, because the modifier geonode that generates horse hair is disabled. To get clean numbers, I had to do this: https://projects.blender.org/Eugene-Kuznetsov/blender/commit/38e34e82d6170b7ee4b88b4e68bcfcaa435eb0f4 https://projects.blender.org/Eugene-Kuznetsov/blender/commit/bde05142c87803ed9c9c57a838800e60eb974976 (PR branch) https://projects.blender.org/Eugene-Kuznetsov/blender/commit/6e1e74ab086b7c05609594c00dfe786b72bef9d3 (main branch) "Exclude from View Layer" all collections except Collection 3 reenable all modifiers on "Horse hair", click "Bake" on the bake node. ``` main: Timer 'ReverseUVSampler::ReverseUVSampler(31870)': (Average: 16.88 ms, Min: 15.23 ms, Last: 15.65 ms) Timer 'ReverseUVSampler::ReverseUVSampler(74282)': (Average: 36.78 ms, Min: 35.06 ms, Last: 37.24 ms) Timer 'ReverseUVSampler::sample_many(74282, 200145)': (Average: 2.69 ms, Min: 1.77 ms, Last: 3.13 ms) PR (magic 0.25): Timer 'ReverseUVSampler::ReverseUVSampler(31870)': (Average: 0.99 ms, Min: 0.75 ms, Last: 0.97 ms) Timer 'ReverseUVSampler::ReverseUVSampler(74282)': (Average: 1.45 ms, Min: 1.34 ms, Last: 1.51 ms) Timer 'ReverseUVSampler::sample_many(73685, 200145)': (Average: 7.17 ms, Min: 5.03 ms, Last: 7.39 ms) PR (magic 1.0): Timer 'ReverseUVSampler::ReverseUVSampler(31870)': (Average: 2.11 ms, Min: 1.56 ms, Last: 2.02 ms) Timer 'ReverseUVSampler::ReverseUVSampler(74282)': (Average: 3.59 ms, Min: 2.73 ms, Last: 3.35 ms) Timer 'ReverseUVSampler::sample_many(73685, 200145)': (Average: 2.61 ms, Min: 1.83 ms, Last: 2.89 ms) ```
Eugene-Kuznetsov changed title from Improved Surface Deform performance to WIP: Improved Surface Deform performance 2023-12-28 23:17:38 +01:00
Eugene-Kuznetsov force-pushed ek_surface_deform from c51a967a0a to de4c975949 2023-12-30 13:23:31 +01:00 Compare
Eugene-Kuznetsov force-pushed ek_surface_deform from de4c975949 to b8d176fb2d 2024-01-06 14:00:04 +01:00 Compare
Eugene-Kuznetsov force-pushed ek_surface_deform from b8d176fb2d to f170c9cc88 2024-01-06 17:15:33 +01:00 Compare
Eugene-Kuznetsov force-pushed ek_surface_deform from f170c9cc88 to 52f5d04fca 2024-01-07 03:00:42 +01:00 Compare
Eugene-Kuznetsov force-pushed ek_surface_deform from 52f5d04fca to 4f91c770ce 2024-01-10 06:13:51 +01:00 Compare
Author
Contributor
Version fps, Threadripper fps, Ryzen
-- Female Horse Male Combined Female Horse Male Combined
main 1.1 2.5 1.0 0.7 0.85 1.90 0.76 0.25
#116545 13.4 3.6 7.7 2.6 11.7 2.5 6.5 1.8
#116545 + #116617 + #112256 + b8a86c3eac + 39b4d939ea 23.5 12.5 11.9 7.8 18.6 8.8 10.0 5.4

Test system 1: AMD Threadripper Pro 5955WX, Radeon RX7900, Linux
Test system 2: AMD Ryzen 7 5800X, Radeon RX6900, Windows 10

Version | fps, Threadripper | | | | fps, Ryzen | | | --- | --- |--- | --- |--- | ---------- |--- |--- | --- -- | Female | Horse | Male | Combined | Female | Horse | Male | Combined main | 1.1 | 2.5 | 1.0 | 0.7 | 0.85 | 1.90 | 0.76 | 0.25 #116545 | 13.4 | 3.6 | 7.7 | 2.6 | 11.7 | 2.5 | 6.5 | 1.8 #116545 + #116617 + #112256 + https://projects.blender.org/Eugene-Kuznetsov/blender/commit/b8a86c3eac530aa82773d1845fda9e70b9eb0f22 + https://projects.blender.org/Eugene-Kuznetsov/blender/commit/39b4d939ea5d02e0a977e18326ef136d1e67b75c | 23.5 | 12.5 | 11.9 | 7.8 | 18.6 | 8.8 | 10.0 | 5.4 Test system 1: AMD Threadripper Pro 5955WX, Radeon RX7900, Linux Test system 2: AMD Ryzen 7 5800X, Radeon RX6900, Windows 10
Member

Would you be interested in moving the preserve arguments for attribute access functions to a separate PR? Those seem easily separated from the other changes here, and I'd be interested how much of a difference they make.

Would you be interested in moving the `preserve` arguments for attribute access functions to a separate PR? Those seem easily separated from the other changes here, and I'd be interested how much of a difference they make.
Author
Contributor

Would you be interested in moving the preserve arguments for attribute access functions to a separate PR? Those seem easily separated from the other changes here, and I'd be interested how much of a difference they make.

Done (#117179)

> Would you be interested in moving the `preserve` arguments for attribute access functions to a separate PR? Those seem easily separated from the other changes here, and I'd be interested how much of a difference they make. Done (#117179)
Eugene-Kuznetsov force-pushed ek_surface_deform from 4f91c770ce to 6c11389b8a 2024-02-06 08:08:59 +01:00 Compare
Eugene-Kuznetsov changed title from WIP: Improved Surface Deform performance to Improved Surface Deform performance 2024-02-06 08:09:12 +01:00
Eugene-Kuznetsov force-pushed ek_surface_deform from 6c11389b8a to 00a785c977 2024-03-04 06:32:49 +01:00 Compare
Author
Contributor

Comparing against 2024/03/04 trunk (which includes the improvements from #118772):

Playback:

-- Female (no subdiv) Female (subdiv) Horse Male
Trunk 32 fps 14.5 fps 13.1 fps 9.2 fps
PR 49 fps 18.9 fps 15.4 fps 10.6 fps

Directly timing DeformCurvesOnSurface (second row #tris - original mesh, #tris - eval mesh, #curves)

Test Female (no subdiv) Female (subdiv) Horse Male
Parameters (21160, 21160, 104648) (21160, 84680, 104648) (74282, 74282, 1199404) (50312, 201248, 200045)
Trunk 15.8 ms 25.3 ms 40.9 ms 15.9 ms
PR 6.0 ms 10.5 ms 9.9 ms 4.8 ms
Comparing against 2024/03/04 trunk (which includes the improvements from https://projects.blender.org/blender/blender/pulls/118772): Playback: -- | Female (no subdiv) | Female (subdiv) | Horse | Male -- | -- | -- | -- | -- Trunk | 32 fps | 14.5 fps | 13.1 fps | 9.2 fps PR | 49 fps | 18.9 fps | 15.4 fps | 10.6 fps Directly timing DeformCurvesOnSurface (second row #tris - original mesh, #tris - eval mesh, #curves) Test | Female (no subdiv) | Female (subdiv) | Horse | Male -- | -- | -- | -- | -- Parameters | (21160, 21160, 104648) | (21160, 84680, 104648) | (74282, 74282, 1199404) | (50312, 201248, 200045) Trunk | 15.8 ms | 25.3 ms | 40.9 ms | 15.9 ms PR | 6.0 ms | 10.5 ms | 9.9 ms | 4.8 ms
Member

Thanks for the updated comparison. It's nice that we can still get a good speedup. Unfortunately, this patch is somewhat hard to act on because it contains so many different changes (at least it feels like it) and the patch description also doesn't really say what is changed specifically.

For the reverse uv sampler specifically, I'd strongly recommend to not put the different phases into a single loop with many if (phase == X) checks. That's very hard to follow.

Thanks for the updated comparison. It's nice that we can still get a good speedup. Unfortunately, this patch is somewhat hard to act on because it contains so many different changes (at least it feels like it) and the patch description also doesn't really say what is changed specifically. For the reverse uv sampler specifically, I'd strongly recommend to not put the different phases into a single loop with many `if (phase == X)` checks. That's very hard to follow.
Eugene-Kuznetsov force-pushed ek_surface_deform from 00a785c977 to aa8cecf53b 2024-03-04 21:16:49 +01:00 Compare
Author
Contributor

Thanks for the updated comparison. It's nice that we can still get a good speedup. Unfortunately, this patch is somewhat hard to act on because it contains so many different changes (at least it feels like it) and the patch description also doesn't really say what is changed specifically.

For the reverse uv sampler specifically, I'd strongly recommend to not put the different phases into a single loop with many if (phase == X) checks. That's very hard to follow.

Thanks. I've addressed the phases loop. One part of the PR is broken out as #117179, I am somewhat reluctant to spend any more time refactoring or documenting it, in view of this PR's age and history.

> Thanks for the updated comparison. It's nice that we can still get a good speedup. Unfortunately, this patch is somewhat hard to act on because it contains so many different changes (at least it feels like it) and the patch description also doesn't really say what is changed specifically. > > For the reverse uv sampler specifically, I'd strongly recommend to not put the different phases into a single loop with many `if (phase == X)` checks. That's very hard to follow. Thanks. I've addressed the phases loop. One part of the PR is broken out as https://projects.blender.org/blender/blender/pulls/117179, I am somewhat reluctant to spend any more time refactoring or documenting it, in view of this PR's age and history.
Member

Yeah fair. I can understand that it's a lot of work. Thanks for working on this. I'm sure some of the optimizations will slowly trickle into Blender over time.

Yeah fair. I can understand that it's a lot of work. Thanks for working on this. I'm sure some of the optimizations will slowly trickle into Blender over time.
Author
Contributor

It's not that it is a lot of work, it's just I don't want to do all the requested changes and then watch the PR sit in limbo for another 3 months.

I am going to break it out further if and when #117179 gets merged.

It's not that it is a lot of work, it's just I don't want to do all the requested changes and then watch the PR sit in limbo for another 3 months. I am going to break it out further if and when #117179 gets merged.

Some changes in this PR already skeptical, one of the reason to select all of them is to be able to discuss about each of them.

Some changes in this PR already skeptical, one of the reason to select all of them is to be able to discuss about each of them.
Eugene-Kuznetsov force-pushed ek_surface_deform from aa8cecf53b to c31d11220a 2024-04-02 07:18:37 +02:00 Compare
Eugene-Kuznetsov force-pushed ek_surface_deform from c31d11220a to 5c58f4374d 2024-05-29 02:56:58 +02:00 Compare
Eugene-Kuznetsov force-pushed ek_surface_deform from 5c58f4374d to f11ee08162 2024-05-29 02:57:44 +02:00 Compare
Merge conflict checking is in progress. Try again in few moments.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u ek_surface_deform:Eugene-Kuznetsov-ek_surface_deform
git checkout Eugene-Kuznetsov-ek_surface_deform
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset System
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Asset Browser Project
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#116545
No description provided.