Subdiv: Use index instead of pointer for grid to face map #111078

Hans Goudey · 2023-08-12T23:00:57+02:00

Hans Goudey commented

2023-08-12 23:00:57 +02:00

Switch from face pointers to indices, with the following benefits:

Halve memory usage required for array (saves 16 MB with 1 million quads).
Allow better code reuse with index-based utilities.
Ease future replacement of SubdivCCGFace with mesh's face offsets.
Allow future replacement of array with existing mesh corner to face cache
(this array is the same as the one created by build_loop_to_face_map).
Potentially lower memory bandwidth required during multires evaluation
Simplify retrieval of index in BKE_subdiv_ccg_grid_to_face_index.

For clarity, gridfaces was renamed to grid_to_face_map.

Switch from face pointers to indices, with the following benefits: - Halve memory usage required for array (saves 16 MB with 1 million quads). - Allow better code reuse with index-based utilities. - Ease future replacement of `SubdivCCGFace` with mesh's face offsets. - Allow future replacement of array with existing mesh corner to face cache (this array is the same as the one created by `build_loop_to_face_map`). - Potentially lower memory bandwidth required during multires evaluation - Simplify retrieval of index in `BKE_subdiv_ccg_grid_to_face_index`. For clarity, `gridfaces` was renamed to `grid_to_face_map`.

Hans Goudey added 1 commit 2023-08-12 23:01:05 +02:00

buildbot/vexp-code-patch-coordinator Build done. Details

d742ead9d8 Subdiv: Use index instead of pointer for grid to face map

Halves the memory usage required for the `grid_faces` array. Using
indices should reduce required memory bandwidth, and allows better code
reuse with common utilities that are index-based. This map is the same
as the one created by `build_loop_to_face_map` for meshes, so the memory
should be shared in the future. The existing pointer array is often used with
subtraction to get the index anyway.

Hans Goudey requested review from Sergey Sharybin 2023-08-12 23:01:46 +02:00

Hans Goudey commented

2023-08-12 23:01:54 +02:00

@blender-bot build

Sergey Sharybin reviewed 2023-08-14 11:48:08 +02:00

Sergey Sharybin left a comment

Lower memory footprint does sound appealing, but to me it is not obvious how the motivational part about lower bandwidth and the index-based common utilities is applicable here.

For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance?

For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued.

Also, what is the practical problem this patch is solving?

Lower memory footprint does sound appealing, but to me it is not obvious how the motivational part about lower bandwidth and the index-based common utilities is applicable here. For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance? For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued. Also, what is the practical problem this patch is solving?

source/blender/blenkernel/intern/pbvh.cc Outdated

						
				@ -2845,6 +2846,7 @@ bool BKE_pbvh_node_frustum_exclude_AABB(PBVHNode *node, void *data)

				void BKE_pbvh_update_normals(PBVH *pbvh, SubdivCCG *subdiv_ccg)

				{

				  using namespace blender;

Sergey Sharybin commented

2023-08-14 11:38:16 +02:00

Not sure if/why this is needed.

HooglyBoogly marked this conversation as resolved

Hans Goudey commented

2023-08-29 05:00:08 +02:00

For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance?

I'm pretty sure the required memory is lower:

Before: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes
After: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes

The face array's base pointer is the same for every face, so I'm not counting that. I didn't run any performance tests because I think there are way worse bottlenecks in multires/sculpt code still, I don't guess this will make an observable difference by itself.

For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued.

I'm pretty sure SubdivCCGFace *faces in SubdivCCG is redundant with the base mesh's Mesh::faces() offset indices. Ideally I'd like to avoid duplicating the mesh face offsets. Or at least we should be able to replace SubdivCCGFace with OffsetIndices as well (since face corners and faces have the same order nowadays). Those data structures interact with indices, not pointers.

Also, what is the practical problem this patch is solving?

I wanted to use an IndexMask to replace arrays of pointers like effected_ccg_faces. The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code.

> For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance? I'm pretty sure the required memory is lower: - **Before**: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes - **After**: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes The face array's base pointer is the same for every face, so I'm not counting that. I didn't run any performance tests because I think there are way worse bottlenecks in multires/sculpt code still, I don't guess this will make an observable difference by itself. > For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued. I'm pretty sure `SubdivCCGFace *faces` in `SubdivCCG` is redundant with the base mesh's `Mesh::faces()` offset indices. Ideally I'd like to avoid duplicating the mesh face offsets. Or at least we should be able to replace `SubdivCCGFace` with `OffsetIndices` as well (since face corners and faces have the same order nowadays). Those data structures interact with indices, not pointers. > Also, what is the practical problem this patch is solving? I wanted to use an `IndexMask` to replace arrays of pointers like `effected_ccg_faces`. The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code.

Hans Goudey added this to the Sculpt, Paint & Texture project 2023-08-29 05:00:59 +02:00

Hans Goudey added 2 commits 2023-08-29 05:01:32 +02:00

4e4d969c0f Merge branch 'main' into cleanup-grid-faces-int

e9d3795403 Remove unnecessary `using namespace blender;`

Sergey Sharybin commented

2023-08-29 10:29:16 +02:00

I'm pretty sure the required memory is lower:

Before: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes

After: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes

I am confused.

Code before: SubdivCCGFace *face = subdiv_ccg->grid_faces[coord->grid_index];
Code after: const SubdivCCGFace *face = &subdiv_ccg->faces[subdiv_ccg->grid_faces[coord->grid_index]];

So from the perspective of getting a SubdivCCGFace *face pointer:

Before: dereference coord to access grid_index, use that to calculate offset from the subdiv_ccg->grid_faces. is it 8+4+8 bytes fetch?
After: dereference coord to access grid_index, use that to calculate offset from the subdiv_ccg->grid_faces, and use that to calculate offset from the subdiv_ccg->faces. Is it 8+4+8+8 bytes fetch?

Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :)

I wanted to use an IndexMask to replace arrays of pointers like effected_ccg_faces.

That is much cleaner motivational part!

Goal of replacing SubdivCCGFace / effected_ccg_faces with something that is used in other places (assuming, without sacrificing performance/bandwidth) makes me more excited about digging into this change than unclear (to me) memory bandwidth optimizations in something that is probably CPU bound!

The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code.

The code operates on grids, not on indices though. And this indirection is not something obviously gooder (see the first part of the message).

P.S. The entirety of the CCG/Subdiv might need to be revisited on a bigger scale, But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level.
P.P.S. Would it help if we move SubdivCCG to a non-trivial C++ class early on?

> I'm pretty sure the required memory is lower: > > * Before: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes > * After: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes I am confused. ``` Code before: SubdivCCGFace *face = subdiv_ccg->grid_faces[coord->grid_index]; Code after: const SubdivCCGFace *face = &subdiv_ccg->faces[subdiv_ccg->grid_faces[coord->grid_index]]; ``` So from the perspective of getting a `SubdivCCGFace *face` pointer: * Before: dereference `coord` to access `grid_index`, use that to calculate offset from the `subdiv_ccg->grid_faces`. is it 8+4+8 bytes fetch? * After: dereference `coord` to access `grid_index`, use that to calculate offset from the `subdiv_ccg->grid_faces`, and use that to calculate offset from the `subdiv_ccg->faces`. Is it 8+4+8+8 bytes fetch? Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :) > I wanted to use an `IndexMask` to replace arrays of pointers like `effected_ccg_faces`. That is much cleaner motivational part! Goal of replacing `SubdivCCGFace` / `effected_ccg_faces` with something that is used in other places (assuming, without sacrificing performance/bandwidth) makes me more excited about digging into this change than unclear (to me) memory bandwidth optimizations in something that is probably CPU bound! > The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code. The code operates on grids, not on indices though. And this indirection is not something obviously gooder (see the first part of the message). P.S. The entirety of the CCG/Subdiv might need to be revisited on a bigger scale, But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level. P.P.S. Would it help if we move `SubdivCCG` to a non-trivial C++ class early on?

Hans Goudey commented

2023-08-29 23:40:35 +02:00

Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :)

Hmm, yeah, unless I'm being very stupid, I think I'm not counting the two pointers subdiv_ccg->grid_faces and subdiv_ccg->faces in the index bandwidth calculation. Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached.

Would it help if we move SubdivCCG to a non-trivial C++ class early on?

Yeah, good idea! I'll try to commit that separately soon.

But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level.

Yeah, I realize my work in this area is pretty random at this point. I'm trying to tackle low hanging fruit while getting to know the area better. It's a good opportunity for us to get aligned on how to talk about this stuff too (example, this PR!). I'm also working in #111675, but I'd imagine working on the topology refiner stuff a bit later, which is where I hope we can avoid some of the overhead of accessing the relatively simple mesh data.

>Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :) Hmm, yeah, unless I'm being very stupid, I think I'm not counting the two pointers `subdiv_ccg->grid_faces` and `subdiv_ccg->faces` in the index bandwidth calculation. Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached. >Would it help if we move SubdivCCG to a non-trivial C++ class early on? Yeah, good idea! I'll try to commit that separately soon. >But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level. Yeah, I realize my work in this area is pretty random at this point. I'm trying to tackle low hanging fruit while getting to know the area better. It's a good opportunity for us to get aligned on how to talk about this stuff too (example, this PR!). I'm also working in #111675, but I'd imagine working on the topology refiner stuff a bit later, which is where I hope we can avoid some of the overhead of accessing the relatively simple mesh data.

Sergey Sharybin commented

2023-08-30 09:44:33 +02:00

Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached.

That's a rabbit hole! But I see your point.
I am not immediately sure how the total throughputs will compare. Not even sure how much time in such calculations we want to be spending.

Can see it from this point of view:

The memory bandwidth in this case is not known to be a bottleneck
The patch reduces the overall size of SubdivCCG, and memory footprint of high-poly subdivides surface is known to be problematic
The patch also avoids untyped void** pointers in the API

So moving forward I think it is more matter of pivoting the presentation/motivation in the PR description (which resembles commit message): don't focus too much on bandwidth but on the API improvements and overall memory footprint reduction. Those are hard to argue with :)

On the code the main concern I have is naming. To me it is important to clearly distinguish face itself and face index. Another relevant aspect is that in a hindsight I don't think name grid_faces is clear enough. Taking these two points into account the better name would be face_index_of_grid. Something like that. It is a bigger number of lines changed though, but I believe it worth it. And, also, I am open for suggestions if you can think of a better name.

> Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached. That's a rabbit hole! But I see your point. I am not immediately sure how the total throughputs will compare. Not even sure how much time in such calculations we want to be spending. Can see it from this point of view: - The memory bandwidth in this case is not known to be a bottleneck - The patch reduces the overall size of `SubdivCCG`, and memory footprint of high-poly subdivides surface is known to be problematic - The patch also avoids untyped `void**` pointers in the API So moving forward I think it is more matter of pivoting the presentation/motivation in the PR description (which resembles commit message): don't focus too much on bandwidth but on the API improvements and overall memory footprint reduction. Those are hard to argue with :) On the code the main concern I have is naming. To me it is important to clearly distinguish face itself and face index. Another relevant aspect is that in a hindsight I don't think name `grid_faces` is clear enough. Taking these two points into account the better name would be `face_index_of_grid`. Something like that. It is a bigger number of lines changed though, but I believe it worth it. And, also, I am open for suggestions if you can think of a better name.

👍 1

Hans Goudey added 3 commits 2023-08-30 17:37:57 +02:00

1d05ac6ca7 Merge branch 'main' into cleanup-grid-faces-int

8570662afd Rename to grid_to_face_map

eae866b46b Merge branch 'main' into cleanup-grid-faces-int

Sergey Sharybin approved these changes 2023-08-31 17:40:29 +02:00

Sergey Sharybin left a comment

Something we briefly covered in the chat is the grid_to_face_map vs. grid_to_face_index_map.

Personally I'd choose the latter, but there are also arguments to use the former (i.e. similar naming convention in the Mesh).

In this particular case I am not too much emotionally attached to either of them, more like a preference.

Thanks for updating the description and taking care of a bigger scope renaming!

Something we briefly covered in the chat is the `grid_to_face_map` vs. `grid_to_face_index_map`. Personally I'd choose the latter, but there are also arguments to use the former (i.e. similar naming convention in the `Mesh`). In this particular case I am not too much emotionally attached to either of them, more like a preference. Thanks for updating the description and taking care of a bigger scope renaming!

Hans Goudey merged commit 2228006d5b into main

2023-08-31 19:40:46 +02:00

Hans Goudey referenced this issue from a commit

2023-08-31 19:40:47 +02:00

Subdiv: Use index instead of pointer for grid to face map

Hans Goudey deleted branch cleanup-grid-faces-int

2023-08-31 19:40:52 +02:00

Stefan Werner referenced this issue from a commit

2023-09-07 15:02:37 +02:00

Subdiv: Use index instead of pointer for grid to face map

Sign in to join this conversation.

No reviewers

No Label

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

Subdiv: Use index instead of pointer for grid to face map #111078