Anim: thread remake_graph_transdata #119497

Christoph Lendenfeld · 2024-03-15T10:20:13+01:00

On animations with high key counts, `remake_graph_transdata` takes most of the compute time when moving keys.
This patch threads the loop over FCurves in that function to speed things up.

Test file with 10,000 keys per F-Curve:

| Operation | Before | After |
| - | - | - |
| Moving 1 key of each FCurve | ~2200ms | ~285ms |
| Moving a single key | ~0.70ms | ~0.72ms |

As the measurements show, this speeds up the case of modifying a lot of data without measurably affecting the case of modifying very little data.
The measurements were taken on an 8c/16t CPU. The higher the thread count, the better the performance gain.
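
The idea can be pictured with a minimal, self-contained C++ sketch. It deliberately does not use Blender's headers: `KeyedCurve`, `sort_keys_by_time`, `parallel_for_sketch`, and `remake_curves_sketch` are hypothetical stand-ins for `FCurve`, the per-curve work, `blender::threading::parallel_for`, and `remake_graph_transdata`. The stand-in simply spawns one `std::thread` per chunk, whereas a real task scheduler would reuse a pool of worker threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for an F-Curve: just a list of key times.
struct KeyedCurve {
  std::vector<float> key_times;
};

// Hypothetical per-curve work: restore time order after keys were moved.
inline void sort_keys_by_time(KeyedCurve &curve)
{
  std::sort(curve.key_times.begin(), curve.key_times.end());
}

// Simplified stand-in for a parallel_for: cut [0, total) into chunks of
// `grain_size` indices and run `fn(begin, end)` for each chunk on its own thread.
inline void parallel_for_sketch(const std::size_t total,
                                const std::size_t grain_size,
                                const std::function<void(std::size_t, std::size_t)> &fn)
{
  const std::size_t step = std::max<std::size_t>(grain_size, 1);
  std::vector<std::thread> workers;
  for (std::size_t begin = 0; begin < total; begin += step) {
    const std::size_t end = std::min(begin + step, total);
    workers.emplace_back([&fn, begin, end]() { fn(begin, end); });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}

// Each curve owns its keys, so chunks never write to the same data.
inline void remake_curves_sketch(std::vector<KeyedCurve> &curves, const std::size_t grain_size)
{
  parallel_for_sketch(curves.size(), grain_size, [&](const std::size_t begin, const std::size_t end) {
    for (std::size_t i = begin; i < end; i++) {
      sort_keys_by_time(curves[i]);
    }
  });
}
```

The sketch is safe without locks only because every chunk touches a disjoint set of curves. That is an assumption of the sketch; whether it holds for every function called per F-Curve is what the review question further down is about.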

The measurements time `remake_graph_transdata` on the following test file:
https://download.blender.org/ftp/sybren/animation-rigging/heavy_mocap_test.blend

For the review:

~~Is it needed to exclude `testhandles_fcurve` from the threading? Is it an issue if we create threads within threads?~~
Answer by @HooglyBoogly on Blender Chat: "No, that's done all the time, and it's a great way to achieve more parallelism. It's a good way to cover both 10000 different FCurves with 4 points each, and 4 FCurves with 10000 points each in the same code."
Moving `testhandles_fcurve` into the threaded loop gave an additional speedup (grain size 1) from 330ms to 275ms.

**Grain Size Tests**
Note: these tests were done before moving `testhandles_fcurve` into the threaded loop.

| Grain Size | Time |
| - | - |
| 1 | ~330ms |
| 4 | ~334ms |
| 16 | ~338ms |
| 64 | ~355ms |
| 256 | ~865ms |
| 1024 | ~1630ms |
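
One way to read the table above is that the grain size caps how many independent chunks of work can exist. The sketch below is a simplified model, not the scheduler's actual splitting logic: `chunk_count` is a hypothetical helper, and the curve count of 500 is made up for illustration because the number of F-Curves in the test file is not stated here.

```cpp
#include <cstddef>

// Number of chunks when `total` items are cut into consecutive chunks of at
// most `grain_size` items. Simplified model; a real scheduler may split differently.
constexpr std::size_t chunk_count(const std::size_t total, const std::size_t grain_size)
{
  return grain_size == 0 ? total : (total + grain_size - 1) / grain_size;
}

// Illustrative numbers only: 500 is a made-up curve count.
static_assert(chunk_count(500, 1) == 500, "one chunk per curve");
static_assert(chunk_count(500, 64) == 8, "still enough chunks to spread over several threads");
static_assert(chunk_count(500, 256) == 2, "only two chunks can run at once");
static_assert(chunk_count(500, 1024) == 1, "effectively single-threaded");
```

Under this model, once the grain size approaches the number of curves, only one or two chunks remain and most of a 16-thread CPU sits idle, which would be consistent with the jump in the table between grain sizes 64 and 256.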

🎉 1 🚀 2

Christoph Lendenfeld added the Module / Animation & Rigging label 2024-03-15 10:20:13 +01:00

Christoph Lendenfeld added 1 commit 2024-03-15 10:20:25 +01:00

threading 5d04b38656

Christoph Lendenfeld added 2 commits 2024-03-21 11:54:04 +01:00

Merge branch 'main' into thread_remake_transdata 69b52d6176

cleanup: remove timer 79ae2eb5a7

Christoph Lendenfeld added 1 commit 2024-03-21 12:02:21 +01:00

pass Span to remake function 057ce3794f

Christoph Lendenfeld added 1 commit 2024-03-26 10:03:46 +01:00

Merge branch 'main' into thread_remake_transdata c97c6750b2

Christoph Lendenfeld added 1 commit 2024-03-26 10:19:04 +01:00

change grain_size to 1 8206a0edec

Christoph Lendenfeld requested review from Sybren A. Stüvel 2024-03-28 14:53:58 +01:00

Christoph Lendenfeld requested review from Hans Goudey 2024-03-28 14:54:05 +01:00

Falk David reviewed 2024-03-28 14:59:40 +01:00

source/blender/editors/transform/transform_convert_graph.cc Outdated

						
    @@ -893,3 +893,1 @@
    -  for (FCurve *fcu : fcurves) {
    -    if (fcu->bezt) {
    -      BeztMap *bezm;
    +  blender::threading::parallel_for(fcurves.index_range(), 1, [&](const blender::IndexRange range) {

Falk David commented

2024-03-28 14:59:40 +01:00

Looks like you don't need the index here, so it might be better to use `parallel_for_each`. Of course, that's only if you actually want a grain size of one.

Iliya Katushenock commented

2024-03-28 15:01:10 +01:00

Even with grain size `1`, `range` can have any size. But I wonder whether it would be better to test `parallel_for_weighted` here.

Hans Goudey commented

2024-03-28 15:07:44 +01:00

`parallel_for_each` uses a different algorithm internally that has more overhead. AFAIK it will always give each element its own thread. I think it's typically only suitable for much more expensive tasks.

dr.sybren marked this conversation as resolved

Christoph Lendenfeld added 2 commits 2024-03-28 15:31:33 +01:00

Merge branch 'main' into thread_remake_transdata 575934aa71

remove timer 5ecdc9983a

Christoph Lendenfeld added 1 commit 2024-03-28 15:35:03 +01:00

put testhandles into threading function 2d17296f72

Hans Goudey approved these changes 2024-03-28 16:07:47 +01:00

Hans Goudey left a comment

How about a grain size of 8?

👍 1

Christoph Lendenfeld added 2 commits 2024-04-02 10:18:14 +02:00

Merge branch 'main' into thread_remake_transdata d7f4d3c165

set grain size to 8 b1cf628715

Sybren A. Stüvel approved these changes 2024-04-09 11:26:53 +02:00

Sybren A. Stüvel left a comment

Nice work! Just two small notes that can be addressed when landing.

source/blender/editors/transform/transform_convert_graph.cc Outdated

						
    @@ -893,3 +892,1 @@
    -  for (FCurve *fcu : fcurves) {
    -    if (fcu->bezt) {
    -      BeztMap *bezm;
    +  blender::threading::parallel_for(fcurves.index_range(), 8, [&](const blender::IndexRange range) {

Sybren A. Stüvel commented

2024-04-09 11:14:44 +02:00

Add a comment that explains how the 8 got here.

source/blender/editors/transform/transform_convert_graph.cc

						
    @@ -899,3 +896,1 @@
    -      bezm = bezt_to_beztmaps(fcu->bezt, fcu->totvert);
    -      sort_time_beztmaps(bezm, fcu->totvert);
    -      beztmap_to_data(t, fcu, bezm, fcu->totvert);
    +      if (fcu->bezt) {

Sybren A. Stüvel commented

2024-04-09 11:15:39 +02:00

`if (!fcu->bezt) { continue; }`

Gitea already doesn't present this as "it's all the same, just indented", so I think it's fine to restructure the code a bit further for readability.
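
The suggested restructuring is the usual early-continue (guard clause) pattern. A minimal sketch of the two forms follows; `GuardedCurve`, `process_nested`, and `process_early_continue` are hypothetical names, and the `has_bezt` flag stands in for the `fcu->bezt` null check.

```cpp
#include <vector>

// Hypothetical stand-in for an F-Curve; `has_bezt` mirrors the `fcu->bezt` null check.
struct GuardedCurve {
  bool has_bezt = false;
  int processed = 0;
};

// Nested form: the whole loop body is wrapped in the condition.
inline void process_nested(std::vector<GuardedCurve> &curves)
{
  for (GuardedCurve &curve : curves) {
    if (curve.has_bezt) {
      curve.processed++;
    }
  }
}

// Early-continue form: same behavior, one level less indentation for the body.
inline void process_early_continue(std::vector<GuardedCurve> &curves)
{
  for (GuardedCurve &curve : curves) {
    if (!curve.has_bezt) {
      continue;
    }
    curve.processed++;
  }
}
```

Both functions skip curves without Bezier data and touch the rest exactly once, so the change affects readability only.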

👍 1

Christoph Lendenfeld added 2 commits 2024-04-09 11:42:37 +02:00

Merge branch 'main' into thread_remake_transdata 5bda46b17c

unindent and add comment bf22bfc53e

Christoph Lendenfeld merged commit 8a79212031 into main

2024-04-09 11:46:17 +02:00

Christoph Lendenfeld referenced this issue from a commit

2024-04-09 11:46:18 +02:00

Anim: thread remake_graph_transdata

Christoph Lendenfeld deleted branch thread_remake_transdata

2024-04-09 11:46:19 +02:00

Raul Fernandez Hernandez referenced this issue from a commit