Mesh Struct of Arrays Refactor #95965

Closed
opened 2022-02-22 19:52:26 +01:00 by Hans Goudey · 8 comments
Member

Background

Currently, Blender uses an array of structs format to store some specific mesh data. Instead of storing each data layer in a separate array, arbitrary data (mostly flags and data that was added years ago) is grouped together in specific structs.

Switching to a struct of arrays format can provide significant performance improvements and code simplification. Besides memory usage improvements and general improvements to the efficiency of accessing data, sometimes processing data can be avoided completely. Here is a more complete list of the benefits:

  • In many cases, specific data like bevel weights aren't even used, so avoiding storing them can reduce memory usage.
  • Accessing just the data for any attribute becomes faster, since less memory has to be loaded.
  • Code becomes clearer by separating concerns about where things are stored. Each type of data is conceptually separated.
  • It's possible to use SIMD operations directly on data stored contiguously.
  • Sharing generic algorithms that work on basic data types is much simpler.
  • Interfacing with exporters, libraries, the GPU, etc. that expect a contiguous array of basic data types is much simpler.
  • Other areas in Blender (point clouds, hair, attributes, etc.) are moving to the SoA format, so it's consistent with elsewhere. In other words, it's the established best practice.
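
To make the memory-access argument above concrete, here is a minimal sketch contrasting the two layouts. The types `VertAoS` and `MeshSoA` and both functions are hypothetical names invented for illustration, not Blender's actual structs or API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array-of-structs vertex: the position is interleaved with
// data that most algorithms never read.
struct VertAoS {
  std::array<float, 3> position;
  char flag;
  char bevel_weight;
};

// Hypothetical struct-of-arrays mesh: one contiguous array per layer.
// An optional layer that is unused stays empty and costs no memory.
struct MeshSoA {
  std::vector<std::array<float, 3>> positions;
  std::vector<float> bevel_weights;  // Empty unless the layer exists.
};

// AoS: every iteration also pulls the unused flag and weight into cache.
inline float sum_x_aos(const std::vector<VertAoS> &verts)
{
  float sum = 0.0f;
  for (const VertAoS &vert : verts) {
    sum += vert.position[0];
  }
  return sum;
}

// SoA: the loop only touches the position array.
inline float sum_x_soa(const MeshSoA &mesh)
{
  // An existing optional layer must match the vertex count.
  assert(mesh.bevel_weights.empty() || mesh.bevel_weights.size() == mesh.positions.size());
  float sum = 0.0f;
  for (const std::array<float, 3> &position : mesh.positions) {
    sum += position[0];
  }
  return sum;
}
```

Both functions compute the same result; the difference is only in how much unrelated memory is loaded along the way, and in whether an unused layer has to be stored at all.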

The Goal

  • Vertices

#93602

Vertex data should become float3, with all other data moving to separate places.

  • Edges

#95966

struct MEdge {
  int v1;
  int v2;
};

Edges store a lot of flags currently, which should be moved to runtime data or separate boolean attributes.
The final goal should be that the edge layer is just two vertex indices.
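
As a rough sketch of what moving a flag out of the edge struct could look like, the following converts a legacy-style edge array into the two-index `MEdge` plus a separate boolean layer. `LegacyEdge`, `LEGACY_EDGE_SEAM` (including its bit value), `EdgeLayers`, and `split_legacy_edges` are assumptions made up for this example and do not reflect Blender's real DNA or versioning code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical legacy edge with a packed flag field.
struct LegacyEdge {
  int v1;
  int v2;
  std::uint16_t flag;
};
// Made-up bit value, for this sketch only.
constexpr std::uint16_t LEGACY_EDGE_SEAM = 1 << 0;

// The goal layout from the task: just two vertex indices.
struct MEdge {
  int v1;
  int v2;
};

// Hypothetical container for the converted data.
struct EdgeLayers {
  std::vector<MEdge> edges;
  std::vector<bool> seam;  // Left empty when no edge is a seam.
};

inline EdgeLayers split_legacy_edges(const std::vector<LegacyEdge> &legacy)
{
  EdgeLayers result;
  result.edges.reserve(legacy.size());
  bool any_seam = false;
  for (const LegacyEdge &edge : legacy) {
    result.edges.push_back({edge.v1, edge.v2});
    any_seam = any_seam || (edge.flag & LEGACY_EDGE_SEAM) != 0;
  }
  // Only allocate the boolean layer when at least one edge uses it.
  if (any_seam) {
    result.seam.resize(legacy.size());
    for (std::size_t i = 0; i < legacy.size(); i++) {
      result.seam[i] = (legacy[i].flag & LEGACY_EDGE_SEAM) != 0;
    }
  }
  assert(result.seam.empty() || result.seam.size() == result.edges.size());
  return result;
}
```

The point of the sketch is that a mesh without any seams never pays for a seam array, while the edge array itself shrinks to two integers per edge.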

  • Polygons

#95967

OffsetIndices<int> polys;

Material indices should move to a separate generic integer attribute, as should the smooth flag. The size of each face is redundant with the offset stored in the next polygon, so we should use the same offsets system as curves to halve the memory usage.
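
The redundancy described above can be sketched with plain `std::vector`s. This is only an illustration of the offsets idea, not Blender's `OffsetIndices` class; `FaceRange`, `offsets_from_sizes`, and `face_range` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What a legacy-style face stores explicitly: a start and a size.
struct FaceRange {
  int start;
  int size;
};

// Build the offsets array: one entry per face, plus a final entry that
// holds the total number of corners.
inline std::vector<int> offsets_from_sizes(const std::vector<int> &sizes)
{
  std::vector<int> offsets(sizes.size() + 1);
  int total = 0;
  for (std::size_t i = 0; i < sizes.size(); i++) {
    assert(sizes[i] >= 0);
    offsets[i] = total;
    total += sizes[i];
  }
  offsets[sizes.size()] = total;
  return offsets;
}

// Recover the start and size of a face from two neighboring offsets.
inline FaceRange face_range(const std::vector<int> &offsets, const int face)
{
  assert(face >= 0 && static_cast<std::size_t>(face) + 1 < offsets.size());
  const int start = offsets[static_cast<std::size_t>(face)];
  return {start, offsets[static_cast<std::size_t>(face) + 1] - start};
}
```

For N faces this stores N + 1 integers instead of 2N, which is where the roughly halved memory usage comes from.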

  • Face Corners / Loops

#102359

Span<int> corner_verts;
Span<int> corner_edges;

Face corners should be split into separate arrays, to prioritize performance in hot loops when only edges or only vertices are accessed, and to make the data access more generic.
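
A minimal sketch of that split, using `std::vector<int>` in place of `Span<int>`. `LegacyCorner`, `CornerArrays`, `split_corners`, and `count_corners_per_vert` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical legacy-style corner with vertex and edge indices interleaved.
struct LegacyCorner {
  int vert;
  int edge;
};

struct CornerArrays {
  std::vector<int> corner_verts;
  std::vector<int> corner_edges;
};

// Split the interleaved corners into one array per kind of index.
inline CornerArrays split_corners(const std::vector<LegacyCorner> &corners)
{
  CornerArrays result;
  result.corner_verts.reserve(corners.size());
  result.corner_edges.reserve(corners.size());
  for (const LegacyCorner &corner : corners) {
    result.corner_verts.push_back(corner.vert);
    result.corner_edges.push_back(corner.edge);
  }
  return result;
}

// A hot loop that only needs vertex indices never loads the edge indices.
inline std::vector<int> count_corners_per_vert(const std::vector<int> &corner_verts,
                                               const int verts_num)
{
  assert(verts_num >= 0);
  std::vector<int> counts(static_cast<std::size_t>(verts_num), 0);
  for (const int vert : corner_verts) {
    assert(vert >= 0 && vert < verts_num);
    counts[static_cast<std::size_t>(vert)]++;
  }
  return counts;
}
```

Because `count_corners_per_vert` takes a plain array of integers, the same function works on any index array, which is the "more generic" access mentioned above.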

  • UVs

This can be handled separately, but I wanted to note that the flag should be removed from MLoopUV, which would become float2, just like mesh vertex positions.

Open Questions

Backward Compatibility

Generally this is easy: we just need to add versioning code that adds the new custom data layers.
The hard part about backward compatibility is using the same name for the DNA structs after the refactoring, because we don't want to keep the deprecated members in the structs.
This could be implemented as a special case in file reading code, only for MPoly and MEdge.

Forward Compatibility

See comment for current plan: #95965#1393317

Forward compatibility is already a bit harder. As these refactors land, more and more mesh data would not load correctly in earlier versions.
For runtime data like normals, a fix can be a bit simpler, like tagging normals dirty when reading from the future version (see add07576a0).
But that obviously isn't a solution for most things.

There are a few possibilities that I know of:

  1. Accept the forward compatibility loss, if we decide it isn't worth developers' time to maintain.
    • I'm not sure how people would take this, but it is a pragmatic approach that's a bit tempting.
  2. Rewrite mesh writing and reading to convert to and from the old format, for all files.
    • For one or two attributes, this seems feasible, but as the changes become larger, this would become quite convoluted.
    • The reading and writing code wouldn't benefit from any of the performance improvements.
    • We would lose information about whether the optional layers exist. For example, all material indices being zero would look the same as the material index layer not existing.
  3. Add a "Legacy Mesh Format" option to file saving, and maintain it for a few versions.
    • This existed as an option in the 2.7 series, see https://docs.blender.org/manual/en/2.79/data_system/files/save.html
    • While the convoluted code would still exist, it would be grouped together, and it would be possible to eventually remove it.

Because of the reasoning above, I strongly prefer option 3.

Edit-Data Attributes

Some data like selections should not be propagated like other attributes. Additionally, their addressing needs a bit of design.

See #97452 for the latest proposal.

Wouldn't it be worth storing selection & hidden state in a single flag array? As both are used a lot for tool & drawing code, I don't see much advantage in storing them separately.

There could be an advantage if the arrays were created & removed as needed, so de-selecting all could free the selection array, revealing all could free the hidden array, although this has pros/cons (having to always account for NULL layer anywhere selection checks are needed complicates access).

If both are always allocated though, storing them separately doesn't seem necessary.

Author
Member

I don't think combining multiple flags into a single attribute would be the right choice, for a few reasons:

  1. Storing multiple attributes in one layer would negate the benefits for code simplicity
    • When storing each attribute as a separate layer, operations can be done on the layer as a generic data type.
    • That means any algorithm written for an array of booleans (or whatever other storage we use) can be reused.
    • It's hard to overstate how much this can simplify existing code. It means things like selecting faces based on a point selection is as simple as doing a generic domain interpolation.
  2. Storing multiple flags in one layer would complicate potential future performance improvements
    • There are other data structures we could use to store boolean attributes-- vectors of indices, bitmaps, trees, etc.
    • Those data structures can easily represent a single boolean attribute, but not multiple
  3. Having an attribute correspond to a single conceptual idea means we can assign meaning to the case when the attribute doesn't exist
    • When there are no .hide layers, nothing is hidden-- all processing related to hide status can be skipped (this means that with well designed code you don't need to worry about NULL layers really).
    • When there are no .selection layers, everything is selected (or nothing, depending on what we decide), this can also simplify code.
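
The points in the list above can be sketched as follows. Here `BoolLayer`, `count_visible`, and `point_to_face` are hypothetical names, `std::optional` stands in for "the layer may not exist", and the rule that a face is true only when all of its vertices are true is an assumption made for this sketch rather than a statement of how Blender's domain interpolation is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A boolean attribute that may not exist at all.
using BoolLayer = std::optional<std::vector<bool>>;

// A missing hide layer means nothing is hidden, so the loop is skipped.
inline int count_visible(const BoolLayer &hide, const int elems_num)
{
  if (!hide) {
    return elems_num;
  }
  assert(hide->size() == static_cast<std::size_t>(elems_num));
  int visible = 0;
  for (const bool hidden : *hide) {
    visible += hidden ? 0 : 1;
  }
  return visible;
}

// Generic point-to-face interpolation for any boolean layer. In this sketch
// a face is true only when every vertex it uses is true.
inline std::vector<bool> point_to_face(const std::vector<bool> &point_values,
                                       const std::vector<int> &face_offsets,
                                       const std::vector<int> &corner_verts)
{
  assert(!face_offsets.empty());
  const std::size_t faces_num = face_offsets.size() - 1;
  std::vector<bool> face_values(faces_num, true);
  for (std::size_t face = 0; face < faces_num; face++) {
    for (int corner = face_offsets[face]; corner < face_offsets[face + 1]; corner++) {
      const int vert = corner_verts[static_cast<std::size_t>(corner)];
      if (!point_values[static_cast<std::size_t>(vert)]) {
        face_values[face] = false;
        break;
      }
    }
  }
  return face_values;
}
```

Nothing in `point_to_face` knows whether its input is a selection, a hide flag, or some other boolean, which is the reuse that a combined flag array would prevent.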

> the arrays were created & removed as needed

Yes, this is exactly what I want to do. I think it's where the potential memory usage improvements come in, for common cases when these layers aren't used.


Besides that, I think there is a misconception that because multiple data sources are used in the same algorithm they have to be stored together.
I see it as a benefit to the algorithm if it can have generic inputs that come from different sources instead.


To add my 2c: in the case of UV maps (storing 3 bools), storing them separately is cheaper memory-wise than it used to be. It used to be a 32-bit int, versus 3 bools now. The bools are currently stored as chars, but we could optimize that if needed.

So in any case, memory is not an argument for stuffing multiple flags into a single variable here, imo.

Author
Member

Here is an update on the forward compatibility topic based on discussion in D14583:

  • We don't want to take breaking forward compatibility lightly, but we also need to be realistic and not try to maintain forward compatibility forever, in order to benefit from the simplification and performance improvements.
  • We will write with the legacy mesh format (unchanged from how it is now) until 4.0.
  • As part of 4.0, we will switch to writing with the new mesh format (generic attribute layers, etc).
  • This gives us time to communicate properly and finish the changes mentioned here, so all the compatibility breakage happens at once.

This issue was referenced by 2480b55f21

This issue was referenced by 12becbf0df

This issue was referenced by 6c774feba2

Hans Goudey added this to the Module: Nodes & Physics project 2023-02-27 15:47:05 +01:00
Hans Goudey added this to the 4.0 milestone 2023-02-27 15:48:07 +01:00
Author
Member

I have one more PR that's slightly related for the cached triangulation (#106774), but that doesn't affect the file format so it can be done any time. Otherwise, this task is done!

Blender Bot added the Status: Archived label and removed the Status: Confirmed label 2023-04-17 13:50:41 +02:00
Reference: blender/blender#95965