Subdiv: Use index instead of pointer for grid to face map #111078

Merged
Hans Goudey merged 6 commits from HooglyBoogly/blender:cleanup-grid-faces-int into main 2023-08-31 19:40:46 +02:00
Member

Switch from face pointers to indices, with the following benefits:

  • Halve memory usage required for array (saves 16 MB with 1 million quads).
  • Allow better code reuse with index-based utilities.
  • Ease future replacement of SubdivCCGFace with mesh's face offsets.
  • Allow future replacement of array with existing mesh corner to face cache
    (this array is the same as the one created by build_loop_to_face_map).
  • Potentially lower memory bandwidth required during multires evaluation
  • Simplify retrieval of index in BKE_subdiv_ccg_grid_to_face_index.

For clarity, gridfaces was renamed to grid_to_face_map.

Switch from face pointers to indices, with the following benefits: - Halve memory usage required for array (saves 16 MB with 1 million quads). - Allow better code reuse with index-based utilities. - Ease future replacement of `SubdivCCGFace` with mesh's face offsets. - Allow future replacement of array with existing mesh corner to face cache (this array is the same as the one created by `build_loop_to_face_map`). - Potentially lower memory bandwidth required during multires evaluation - Simplify retrieval of index in `BKE_subdiv_ccg_grid_to_face_index`. For clarity, `gridfaces` was renamed to `grid_to_face_map`.
Hans Goudey added 1 commit 2023-08-12 23:01:05 +02:00
buildbot/vexp-code-patch-coordinator Build done. Details
d742ead9d8
Subdiv: Use index instead of pointer for grid to face map
Halves the memory usage required for the `grid_faces` array. Using
indices should reduce required memory bandwidth, and allows better code
reuse with common utilities that are index-based. This map is the same
as the one created by `build_loop_to_face_map` for meshes, so the memory
should be shared in the future. The existing pointer array is often used with
subtraction to get the index anyway.
Hans Goudey requested review from Sergey Sharybin 2023-08-12 23:01:46 +02:00
Author
Member

@blender-bot build

@blender-bot build
Sergey Sharybin reviewed 2023-08-14 11:48:08 +02:00
Sergey Sharybin left a comment
Owner

Lower memory footprint does sound appealing, but to me it is not obvious how the motivational part about lower bandwidth and the index-based common utilities is applicable here.

For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance?

For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued.

Also, what is the practical problem this patch is solving?

Lower memory footprint does sound appealing, but to me it is not obvious how the motivational part about lower bandwidth and the index-based common utilities is applicable here. For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance? For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued. Also, what is the practical problem this patch is solving?
@ -2845,6 +2846,7 @@ bool BKE_pbvh_node_frustum_exclude_AABB(PBVHNode *node, void *data)
void BKE_pbvh_update_normals(PBVH *pbvh, SubdivCCG *subdiv_ccg)
{
using namespace blender;

Not sure if/why this is needed.

Not sure if/why this is needed.
HooglyBoogly marked this conversation as resolved
Author
Member

For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance?

I'm pretty sure the required memory is lower:

  • Before: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes
  • After: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes

The face array's base pointer is the same for every face, so I'm not counting that. I didn't run any performance tests because I think there are way worse bottlenecks in multires/sculpt code still, I don't guess this will make an observable difference by itself.

For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued.

I'm pretty sure SubdivCCGFace *faces in SubdivCCG is redundant with the base mesh's Mesh::faces() offset indices. Ideally I'd like to avoid duplicating the mesh face offsets. Or at least we should be able to replace SubdivCCGFace with OffsetIndices as well (since face corners and faces have the same order nowadays). Those data structures interact with indices, not pointers.

Also, what is the practical problem this patch is solving?

I wanted to use an IndexMask to replace arrays of pointers like effected_ccg_faces. The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code.

> For the former, there is now extra indirection, so the bandwidth is actually higher? (first you fetch index, then you fetch the actual face). Did you run benchamrks to verify the bandwidth is indeed lowered and is translated to user-measurable performance? I'm pretty sure the required memory is lower: - **Before**: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes - **After**: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes The face array's base pointer is the same for every face, so I'm not counting that. I didn't run any performance tests because I think there are way worse bottlenecks in multires/sculpt code still, I don't guess this will make an observable difference by itself. > For the former, I am not sure which index-based utilities you're talking about. In almost all uses both number of girds per face and offset from start are sued. I'm pretty sure `SubdivCCGFace *faces` in `SubdivCCG` is redundant with the base mesh's `Mesh::faces()` offset indices. Ideally I'd like to avoid duplicating the mesh face offsets. Or at least we should be able to replace `SubdivCCGFace` with `OffsetIndices` as well (since face corners and faces have the same order nowadays). Those data structures interact with indices, not pointers. > Also, what is the practical problem this patch is solving? I wanted to use an `IndexMask` to replace arrays of pointers like `effected_ccg_faces`. The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code.
Hans Goudey added this to the Sculpt, Paint & Texture project 2023-08-29 05:00:59 +02:00
Hans Goudey added 2 commits 2023-08-29 05:01:32 +02:00

I'm pretty sure the required memory is lower:

  • Before: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes
  • After: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes

I am confused.

Code before: SubdivCCGFace *face = subdiv_ccg->grid_faces[coord->grid_index];
Code after: const SubdivCCGFace *face = &subdiv_ccg->faces[subdiv_ccg->grid_faces[coord->grid_index]];

So from the perspective of getting a SubdivCCGFace *face pointer:

  • Before: dereference coord to access grid_index, use that to calculate offset from the subdiv_ccg->grid_faces. is it 8+4+8 bytes fetch?
  • After: dereference coord to access grid_index, use that to calculate offset from the subdiv_ccg->grid_faces, and use that to calculate offset from the subdiv_ccg->faces. Is it 8+4+8+8 bytes fetch?

Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :)

I wanted to use an IndexMask to replace arrays of pointers like effected_ccg_faces.

That is much cleaner motivational part!

Goal of replacing SubdivCCGFace / effected_ccg_faces with something that is used in other places (assuming, without sacrificing performance/bandwidth) makes me more excited about digging into this change than unclear (to me) memory bandwidth optimizations in something that is probably CPU bound!

The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code.

The code operates on grids, not on indices though. And this indirection is not something obviously gooder (see the first part of the message).

P.S. The entirety of the CCG/Subdiv might need to be revisited on a bigger scale, But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level.
P.P.S. Would it help if we move SubdivCCG to a non-trivial C++ class early on?

> I'm pretty sure the required memory is lower: > > * Before: The face pointer is loaded: 8 bytes, then de-referenced: 8 bytes > * After: The index is loaded: 4 bytes, then after a pointer offset, the face is de-referenced: 8 bytes I am confused. ``` Code before: SubdivCCGFace *face = subdiv_ccg->grid_faces[coord->grid_index]; Code after: const SubdivCCGFace *face = &subdiv_ccg->faces[subdiv_ccg->grid_faces[coord->grid_index]]; ``` So from the perspective of getting a `SubdivCCGFace *face` pointer: * Before: dereference `coord` to access `grid_index`, use that to calculate offset from the `subdiv_ccg->grid_faces`. is it 8+4+8 bytes fetch? * After: dereference `coord` to access `grid_index`, use that to calculate offset from the `subdiv_ccg->grid_faces`, and use that to calculate offset from the `subdiv_ccg->faces`. Is it 8+4+8+8 bytes fetch? Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :) > I wanted to use an `IndexMask` to replace arrays of pointers like `effected_ccg_faces`. That is much cleaner motivational part! Goal of replacing `SubdivCCGFace` / `effected_ccg_faces` with something that is used in other places (assuming, without sacrificing performance/bandwidth) makes me more excited about digging into this change than unclear (to me) memory bandwidth optimizations in something that is probably CPU bound! > The easiest/best way to interact with that data structure is indices. It keeps the selections potentially much smaller in memory, and makes them more generic-- easier to operate with from more code. The code operates on grids, not on indices though. And this indirection is not something obviously gooder (see the first part of the message). P.S. The entirety of the CCG/Subdiv might need to be revisited on a bigger scale, But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level. P.P.S. Would it help if we move `SubdivCCG` to a non-trivial C++ class early on?
Author
Member

Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :)

Hmm, yeah, unless I'm being very stupid, I think I'm not counting the two pointers subdiv_ccg->grid_faces and subdiv_ccg->faces in the index bandwidth calculation. Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached.

Would it help if we move SubdivCCG to a non-trivial C++ class early on?

Yeah, good idea! I'll try to commit that separately soon.

But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level.

Yeah, I realize my work in this area is pretty random at this point. I'm trying to tackle low hanging fruit while getting to know the area better. It's a good opportunity for us to get aligned on how to talk about this stuff too (example, this PR!). I'm also working in #111675, but I'd imagine working on the topology refiner stuff a bit later, which is where I hope we can avoid some of the overhead of accessing the relatively simple mesh data.

>Sounds like we are counting things differently, and would be nice to align in that so that we measure the same thing :) Hmm, yeah, unless I'm being very stupid, I think I'm not counting the two pointers `subdiv_ccg->grid_faces` and `subdiv_ccg->faces` in the index bandwidth calculation. Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached. >Would it help if we move SubdivCCG to a non-trivial C++ class early on? Yeah, good idea! I'll try to commit that separately soon. >But i guess it is still good you digging the code to understand its limits/weaknesses before possibly revising it on a bigger API level. Yeah, I realize my work in this area is pretty random at this point. I'm trying to tackle low hanging fruit while getting to know the area better. It's a good opportunity for us to get aligned on how to talk about this stuff too (example, this PR!). I'm also working in #111675, but I'd imagine working on the topology refiner stuff a bit later, which is where I hope we can avoid some of the overhead of accessing the relatively simple mesh data.

Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached.

That's a rabbit hole! But I see your point.
I am not immediately sure how the total throughputs will compare. Not even sure how much time in such calculations we want to be spending.

Can see it from this point of view:

  • The memory bandwidth in this case is not known to be a bottleneck
  • The patch reduces the overall size of SubdivCCG, and memory footprint of high-poly subdivides surface is known to be problematic
  • The patch also avoids untyped void** pointers in the API

So moving forward I think it is more matter of pivoting the presentation/motivation in the PR description (which resembles commit message): don't focus too much on bandwidth but on the API improvements and overall memory footprint reduction. Those are hard to argue with :)

On the code the main concern I have is naming. To me it is important to clearly distinguish face itself and face index. Another relevant aspect is that in a hindsight I don't think name grid_faces is clear enough. Taking these two points into account the better name would be face_index_of_grid. Something like that. It is a bigger number of lines changed though, but I believe it worth it. And, also, I am open for suggestions if you can think of a better name.

> Reasoning is that those pointers are the same for every grid and face, so they will already be loaded and cached. That's a rabbit hole! But I see your point. I am not immediately sure how the total throughputs will compare. Not even sure how much time in such calculations we want to be spending. Can see it from this point of view: - The memory bandwidth in this case is not known to be a bottleneck - The patch reduces the overall size of `SubdivCCG`, and memory footprint of high-poly subdivides surface is known to be problematic - The patch also avoids untyped `void**` pointers in the API So moving forward I think it is more matter of pivoting the presentation/motivation in the PR description (which resembles commit message): don't focus too much on bandwidth but on the API improvements and overall memory footprint reduction. Those are hard to argue with :) On the code the main concern I have is naming. To me it is important to clearly distinguish face itself and face index. Another relevant aspect is that in a hindsight I don't think name `grid_faces` is clear enough. Taking these two points into account the better name would be `face_index_of_grid`. Something like that. It is a bigger number of lines changed though, but I believe it worth it. And, also, I am open for suggestions if you can think of a better name.
Hans Goudey added 3 commits 2023-08-30 17:37:57 +02:00
Sergey Sharybin approved these changes 2023-08-31 17:40:29 +02:00
Sergey Sharybin left a comment
Owner

Something we briefly covered in the chat is the grid_to_face_map vs. grid_to_face_index_map.

Personally I'd choose the latter, but there are also arguments to use the former (i.e. similar naming convention in the Mesh).

In this particular case I am not too much emotionally attached to either of them, more like a preference.

Thanks for updating the description and taking care of a bigger scope renaming!

Something we briefly covered in the chat is the `grid_to_face_map` vs. `grid_to_face_index_map`. Personally I'd choose the latter, but there are also arguments to use the former (i.e. similar naming convention in the `Mesh`). In this particular case I am not too much emotionally attached to either of them, more like a preference. Thanks for updating the description and taking care of a bigger scope renaming!
Hans Goudey merged commit 2228006d5b into main 2023-08-31 19:40:46 +02:00
Hans Goudey deleted branch cleanup-grid-faces-int 2023-08-31 19:40:52 +02:00
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#111078
No description provided.