Anim: run bezier handle calculation in parallel #119388

Merged
Christoph Lendenfeld merged 21 commits from ChrisLend/blender:thread_recalc_handles into main 2024-05-07 10:43:03 +02:00
1 changed files with 10 additions and 6 deletions
Showing only changes of commit c7c3cb13fd - Show all commits

View File

@ -1219,6 +1219,7 @@ static BezTriple *cycle_offset_triple(
void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
{
using namespace blender;
/* Error checking:
* - Need at least two points.
* - Need bezier keys.
@ -1231,13 +1232,13 @@ void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
}
/* If the first modifier is Cycles, smooth the curve through the cycle. */
BezTriple *first = &fcu->bezt[0], *last = &fcu->bezt[fcu->totvert - 1];
BezTriple tmp;
BezTriple *first = &fcu->bezt[0];
BezTriple *last = &fcu->bezt[fcu->totvert - 1];
const bool cycle = BKE_fcurve_is_cyclic(fcu) && BEZT_IS_AUTOH(first) && BEZT_IS_AUTOH(last);

Have you tried different grain sizes in your measurements? I feel like this should be much greater like 4096, but of course, feelings are meaningless when it comes to performance :D

Have you tried different grain sizes in your measurements? I feel like this should be much greater like `4096`, but of course, feelings are meaningless when it comes to performance :D

I did and surprisingly it got slower with 256 and 512.
But then again that's on a sample size of 1. Not sure if there are machine differences. And the difference wasn't huge, ~10% with quite a bit of noise

I did and surprisingly it got slower with 256 and 512. But then again that's on a sample size of 1. Not sure if there are machine differences. And the difference wasn't huge, ~10% with quite a bit of noise

Right, I guess there is more going on in this loop than it seems.

Right, I guess there is more going on in this loop than it seems.

I think 4096 is way too big for this use case. There typically aren't tens of thousands of keyframes, and the grain size needs to be low enough to give meaningful usage of multiple CPU cores.

I think 4096 is way too big for this use case. There typically aren't tens of thousands of keyframes, and the grain size needs to be low enough to give meaningful usage of multiple CPU cores.

where is the number 4096 coming from? hardware limits?

where is the number 4096 coming from? hardware limits?

@HooglyBoogly Well only if the loop is actually doing enough work to justify the parallel loop in the first place. But yes I agree.

@HooglyBoogly Well only if the loop is actually doing enough work to justify the parallel loop in the first place. But yes I agree.

Right, true

Right, true

@ChrisLend No just a higher power of two. Remember that the grain size is essentially "how many iterations are assigned to one thread". So in the case of 4096 it would (essentially) mean one thread is running 4096 iterations.

@ChrisLend No just a higher power of two. Remember that the grain size is essentially "how many iterations are assigned to one thread". So in the case of 4096 it would (essentially) mean one thread is running 4096 iterations.

where is the number 4096 coming from? hardware limits?

It's arbitrary, just a compromise between reducing overhead and more threading. I think I remember finding a while ago that using powers of two can actually give noticeable improvements. Also they just seem nice.

> where is the number 4096 coming from? hardware limits? It's arbitrary, just a compromise between reducing overhead and more threading. I think I remember finding a while ago that using powers of two can actually give noticeable improvements. Also they just seem nice.
blender::IndexRange bezt_range(0, fcu->totvert);
blender::threading::parallel_for(bezt_range, 512, [&](const blender::IndexRange range) {
IndexRange bezt_range(0, fcu->totvert);
threading::parallel_for(bezt_range, 512, [&](const IndexRange range) {
BezTriple tmp;
for (const int i : range) {
BezTriple *bezt = &fcu->bezt[i];
BezTriple *prev = nullptr;

How about pulling the special handling for the first and last keyframes out of the parallel loop, and skipping those two indices? bezt_range.drop_front(1).drop_back(1)

How about pulling the special handling for the first and last keyframes out of the parallel loop, and skipping those two indices? `bezt_range.drop_front(1).drop_back(1)`

To do that without duplicating the code within the loop I'd need to extract the logic into a function. Should I do this within this PR?

To do that without duplicating the code within the loop I'd need to extract the logic into a function. Should I do this within this PR?

Definitely not a big deal either way, it could wait if you preferred!

Definitely not a big deal either way, it could wait if you preferred!
@ -1251,6 +1252,9 @@ void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
if (i < fcu->totvert - 1) {
next = (bezt + 1);
}
else {
next = cycle_offset_triple(cycle, &tmp, &fcu->bezt[1], first, last);
}
/* Clamp timing of handles to be on either side of beztriple. */
if (bezt->vec[0][0] > bezt->vec[1][0]) {
@ -1266,7 +1270,7 @@ void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
/* For automatic ease in and out. */
if (BEZT_IS_AUTOH(bezt) && !cycle) {
/* Only do this on first or last beztriple. */
if (ELEM(fcu->totvert, 0, fcu->totvert - 1)) {
if (ELEM(i, 0, fcu->totvert - 1)) {
/* Set both handles to have same horizontal value as keyframe. */
if (fcu->extend == FCURVE_EXTRAPOLATE_CONSTANT) {
bezt->vec[0][1] = bezt->vec[2][1] = bezt->vec[1][1];