Anim: run bezier handle calculation in parallel #119388
|
@ -1219,6 +1219,7 @@ static BezTriple *cycle_offset_triple(
|
|||
|
||||
void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
|
||||
{
|
||||
using namespace blender;
|
||||
/* Error checking:
|
||||
* - Need at least two points.
|
||||
* - Need bezier keys.
|
||||
|
@ -1231,13 +1232,13 @@ void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
|
|||
}
|
||||
|
||||
/* If the first modifier is Cycles, smooth the curve through the cycle. */
|
||||
BezTriple *first = &fcu->bezt[0], *last = &fcu->bezt[fcu->totvert - 1];
|
||||
BezTriple tmp;
|
||||
|
||||
BezTriple *first = &fcu->bezt[0];
|
||||
BezTriple *last = &fcu->bezt[fcu->totvert - 1];
|
||||
const bool cycle = BKE_fcurve_is_cyclic(fcu) && BEZT_IS_AUTOH(first) && BEZT_IS_AUTOH(last);
|
||||
|
||||
|
||||
blender::IndexRange bezt_range(0, fcu->totvert);
|
||||
blender::threading::parallel_for(bezt_range, 512, [&](const blender::IndexRange range) {
|
||||
IndexRange bezt_range(0, fcu->totvert);
|
||||
threading::parallel_for(bezt_range, 512, [&](const IndexRange range) {
|
||||
BezTriple tmp;
|
||||
for (const int i : range) {
|
||||
BezTriple *bezt = &fcu->bezt[i];
|
||||
BezTriple *prev = nullptr;
|
||||
Hans Goudey
commented
How about pulling the special handling for the first and last keyframes out of the parallel loop, and skipping those two indices? How about pulling the special handling for the first and last keyframes out of the parallel loop, and skipping those two indices? `bezt_range.drop_front(1).drop_back(1)`
To do that without duplicating the code within the loop I'd need to extract the logic into a function. Should I do this within this PR? To do that without duplicating the code within the loop I'd need to extract the logic into a function. Should I do this within this PR?
Hans Goudey
commented
Definitely not a big deal either way, it could wait if you preferred! Definitely not a big deal either way, it could wait if you preferred!
|
||||
|
@ -1251,6 +1252,9 @@ void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
|
|||
if (i < fcu->totvert - 1) {
|
||||
next = (bezt + 1);
|
||||
}
|
||||
else {
|
||||
next = cycle_offset_triple(cycle, &tmp, &fcu->bezt[1], first, last);
|
||||
}
|
||||
|
||||
/* Clamp timing of handles to be on either side of beztriple. */
|
||||
if (bezt->vec[0][0] > bezt->vec[1][0]) {
|
||||
|
@ -1266,7 +1270,7 @@ void BKE_fcurve_handles_recalc_ex(FCurve *fcu, eBezTriple_Flag handle_sel_flag)
|
|||
/* For automatic ease in and out. */
|
||||
if (BEZT_IS_AUTOH(bezt) && !cycle) {
|
||||
/* Only do this on first or last beztriple. */
|
||||
if (ELEM(fcu->totvert, 0, fcu->totvert - 1)) {
|
||||
if (ELEM(i, 0, fcu->totvert - 1)) {
|
||||
/* Set both handles to have same horizontal value as keyframe. */
|
||||
if (fcu->extend == FCURVE_EXTRAPOLATE_CONSTANT) {
|
||||
bezt->vec[0][1] = bezt->vec[2][1] = bezt->vec[1][1];
|
||||
|
|
Loading…
Reference in New Issue
Have you tried different grain sizes in your measurements? I feel like this should be much greater like
4096
, but of course, feelings are meaningless when it comes to performance :DI did and surprisingly it got slower with 256 and 512.
But then again that's on a sample size of 1. Not sure if there are machine differences. And the difference wasn't huge, ~10% with quite a bit of noise
Right, I guess there is more going on in this loop than it seems.
I think 4096 is way too big for this use case. There typically aren't tens of thousands of keyframes, and the grain size needs to be low enough to give meaningful usage of multiple CPU cores.
where is the number 4096 coming from? hardware limits?
@HooglyBoogly Well only if the loop is actually doing enough work to justify the parallel loop in the first place. But yes I agree.
Right, true
@ChrisLend No just a higher power of two. Remember that the grain size is essentially "how many iterations are assigned to one thread". So in the case of 4096 it would (essentially) mean one thread is running 4096 iterations.
It's arbitrary, just a compromise between reducing overhead and more threading. I think I remember finding a while ago that using powers of two can actually give noticeable improvements. Also they just seem nice.