Mesh Struct of Arrays Refactor #95965

Hans Goudey · 2022-02-22T19:52:26+01:00

Hans Goudey commented

2022-02-22 19:52:26 +01:00

Background

Currently, Blender uses an array of structs format to store some specific mesh data. Instead of storing each data layer in a separate array, arbitrary data (mostly flags and data that was added years ago) is grouped together in specific structs.

Switching to a struct of arrays format can provide significant performance improvements and code simplification. Besides memory usage improvements and general improvements to the efficiency of accessing data, sometimes processing data can be avoided completely. Here is a more complete list of the benefits:

- In many cases, specific data like bevel weights aren't even used, so avoiding storing them can reduce memory usage.
- Accessing just the data for any attribute becomes faster, since less memory has to be loaded.
- Code becomes clearer by separating concerns about where things are stored. Each type of data is conceptually separated.
- It's possible to use SIMD operations directly on data stored contiguously.
- Sharing generic algorithms that work on basic data types is much simpler.
- Interfacing with exporters, libraries, the GPU, etc. that expect a contiguous array of basic data types is much simpler.
- Other areas in Blender (point clouds, hair, attributes, etc.) are moving to the SoA format, so it's consistent with elsewhere. In other words, it's the established best practice.
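To make the difference between the two layouts concrete, here is a minimal sketch. `LegacyVert`, `SoAMesh`, and the helper functions are hypothetical names made up for illustration; they are not Blender's actual types. The sketch assumes a legacy vertex that interleaves a position with a flag and a bevel weight, and shows how splitting it lets the optional array stay empty when unused.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical legacy-style vertex: position interleaved with rarely used data.
struct LegacyVert {
  float co[3];
  char flag;
  unsigned char bevel_weight;
};

// Hypothetical struct-of-arrays mesh: each kind of data lives in its own array.
struct SoAMesh {
  std::vector<std::array<float, 3>> positions;
  std::vector<float> bevel_weights;  // Empty means no bevel weights are stored.
};

// Array of structs: every iteration also strides over the flag and weight bytes.
inline float sum_x_aos(const std::vector<LegacyVert> &verts)
{
  float sum = 0.0f;
  for (const LegacyVert &vert : verts) {
    sum += vert.co[0];
  }
  return sum;
}

// Struct of arrays: only the contiguous position array is read.
inline float sum_x_soa(const SoAMesh &mesh)
{
  float sum = 0.0f;
  for (const std::array<float, 3> &position : mesh.positions) {
    sum += position[0];
  }
  return sum;
}

// Split the legacy array into separate arrays. The bevel weight array is only
// allocated when at least one vertex has a non-zero weight.
inline SoAMesh to_soa(const std::vector<LegacyVert> &verts)
{
  SoAMesh mesh;
  mesh.positions.reserve(verts.size());
  bool any_weight = false;
  for (const LegacyVert &vert : verts) {
    mesh.positions.push_back({vert.co[0], vert.co[1], vert.co[2]});
    any_weight = any_weight || vert.bevel_weight != 0;
  }
  if (any_weight) {
    mesh.bevel_weights.reserve(verts.size());
    for (const LegacyVert &vert : verts) {
      mesh.bevel_weights.push_back(float(vert.bevel_weight) / 255.0f);
    }
  }
  assert(mesh.bevel_weights.empty() || mesh.bevel_weights.size() == mesh.positions.size());
  return mesh;
}
```

Both sum functions return the same value; the difference is only in how much memory each loop has to touch and in whether unused data is stored at all.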

The Goal

Vertices

#93602

Vertex data should become float3, with all other data moving to separate places.

Edges

#95966

struct MEdge {
  int v1;
  int v2;
};

Edges store a lot of flags currently, which should be moved to runtime data or separate boolean attributes.
The final goal should be that the edge layer is just two vertex indices.

Polygons

#95967

OffsetIndices<int> polys;

Material indices should move to a separate generic integer attribute, as should the smooth flag. The size of each face is redundant with the offset stored in the next polygon, so we should use the same offsets system as curves to halve the memory usage.
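The offsets idea can be sketched as follows. `FaceOffsets` and `offsets_from_sizes` are hypothetical stand-ins written for this example, not the real `OffsetIndices` API. The assumption is that face `i` owns the corners in the half-open range `[offsets[i], offsets[i + 1])`, so `N` faces need `N + 1` integers rather than a start and a size per face.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an offsets array (not the real OffsetIndices type).
// Face i owns the corners in [offsets[i], offsets[i + 1]).
struct FaceOffsets {
  std::vector<int> offsets;

  int faces_num() const
  {
    return offsets.empty() ? 0 : int(offsets.size()) - 1;
  }

  int start(const int face) const
  {
    assert(face >= 0 && face < faces_num());
    return offsets[face];
  }

  // The size is not stored; it is the difference to the next face's start.
  int size(const int face) const
  {
    assert(face >= 0 && face < faces_num());
    return offsets[face + 1] - offsets[face];
  }
};

// Build the offsets from per-face sizes by accumulating a running total.
inline FaceOffsets offsets_from_sizes(const std::vector<int> &sizes)
{
  FaceOffsets result;
  result.offsets.reserve(sizes.size() + 1);
  int total = 0;
  result.offsets.push_back(total);
  for (const int size : sizes) {
    assert(size >= 0);
    total += size;
    result.offsets.push_back(total);
  }
  return result;
}
```

For a quad, a triangle, and a pentagon, the stored offsets would be `0, 4, 7, 12`: the last entry is the total corner count, and each face size is recovered by subtraction.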

Face Corners / Loops

#102359

Span<int> corner_verts;
Span<int> corner_edges;

Face corners should be split into separate arrays, to prioritize performance in hot loops when only edges or vertices are accessed and to make the data access more generic.
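As a rough illustration of why the split helps, the sketch below assumes a hypothetical interleaved `LegacyCorner` struct and splits it into two index arrays. A loop that only needs vertex indices, such as counting how many corners reference each vertex, then never touches the edge indices. All names here are invented for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical legacy-style corner: vertex and edge index interleaved.
struct LegacyCorner {
  int v;
  int e;
};

// Split interleaved corners into two separate index arrays.
inline void split_corners(const std::vector<LegacyCorner> &corners,
                          std::vector<int> &corner_verts,
                          std::vector<int> &corner_edges)
{
  corner_verts.clear();
  corner_edges.clear();
  corner_verts.reserve(corners.size());
  corner_edges.reserve(corners.size());
  for (const LegacyCorner &corner : corners) {
    corner_verts.push_back(corner.v);
    corner_edges.push_back(corner.e);
  }
}

// A loop that only needs vertex indices reads one contiguous int array.
inline std::vector<int> count_corners_per_vert(const std::vector<int> &corner_verts,
                                               const int verts_num)
{
  std::vector<int> counts(verts_num, 0);
  for (const int vert : corner_verts) {
    assert(vert >= 0 && vert < verts_num);
    counts[vert]++;
  }
  return counts;
}
```

Because `count_corners_per_vert` takes a plain array of integers, the same function would work for any other integer index array of the same shape.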

UVs

This can be handled separately, but I wanted to note that the flag should be removed from MLoopUV, which would become float2, just like mesh vertex positions.

Open Questions

Backward Compatibility

Generally this is easy: we just need to add versioning code that adds the new custom data layers.
The hard part about backwards compatibility is using the same name for the DNA structs after the refactoring, because we don't want to keep the deprecated members in the structs.
This could be implemented as a special case in file reading code, only for MPoly and MEdge.

Forward Compatibility

See comment for current plan: #95965#1393317

Forward compatibility is already a bit harder. As these refactors land, more and more mesh data would not load correctly in earlier versions.
For runtime data like normals, a fix can be a bit simpler, like tagging normals dirty when reading from the future version (see add07576a0).
But that obviously isn't a solution for most things.

There are a few possibilities that I know of:

1. Accept the forward compatibility loss, if we decide it isn't worth developers' time to maintain.
   - I'm not sure how people would take this, but it is a pragmatic approach that's a bit tempting.
2. Rewrite mesh writing and reading to convert to and from the old format, for all files.
   - For one or two attributes, this seems feasible, but as the changes become larger, this would become quite convoluted.
   - The reading and writing code wouldn't benefit from any of the performance improvements.
   - We would lose information about whether the optional layers exist. For example, all material indices being zero would look the same as the material index layer not existing.
3. Add a "Legacy Mesh Format" option to file saving, and maintain it for a few versions.
   - This existed as an option in the 2.7 series, see https://docs.blender.org/manual/en/2.79/data_system/files/save.html
   - While the convoluted code would still exist, it would be grouped together, and it would be possible to eventually remove it.

Because of the reasoning above, I strongly prefer option 3.
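The information loss mentioned for the second option (always converting to and from the old format) can be shown with a small sketch. The function names and the `MaterialLayer` alias are hypothetical. The assumption is that the legacy format always stores one material index per face, while the new format may omit the layer entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical new-format layer: absent when no material indices are stored.
using MaterialLayer = std::optional<std::vector<int>>;

// Writing to the legacy layout always produces one value per face, so a
// missing layer is written as all zeros.
inline std::vector<short> write_legacy_materials(const MaterialLayer &layer, const int faces_num)
{
  std::vector<short> legacy(faces_num, 0);
  if (layer) {
    assert(int(layer->size()) == faces_num);
    for (int i = 0; i < faces_num; i++) {
      legacy[i] = short((*layer)[i]);
    }
  }
  return legacy;
}

// Reading back can only guess: all zeros is treated as "no layer", which
// cannot be told apart from an explicit layer that happened to be all zeros.
inline MaterialLayer read_legacy_materials(const std::vector<short> &legacy)
{
  const bool all_zero = std::all_of(
      legacy.begin(), legacy.end(), [](const short index) { return index == 0; });
  if (all_zero) {
    return std::nullopt;
  }
  return std::vector<int>(legacy.begin(), legacy.end());
}
```

A missing layer and an explicit all-zero layer produce identical legacy data, so the round trip cannot restore which of the two the file originally had.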

Edit-Data Attributes

Some data like selections should not be propagated like other attributes. Additionally, their addressing needs a bit of design.

See #97452 for the latest proposal.


🎉 1

Campbell Barton commented

2022-06-02 08:08:23 +02:00

Wouldn't it be worth storing selection & hidden state in a single flag array? As both are used a lot for tool & drawing code, I don't see much advantage in storing them separately.

There could be an advantage if the arrays were created & removed as needed, so de-selecting all could free the selection array, revealing all could free the hidden array, although this has pros/cons (having to always account for NULL layer anywhere selection checks are needed complicates access).

If both are always allocated though, storing them separately doesn't seem necessary.

Hans Goudey commented

2022-06-02 10:31:16 +02:00

I don't think combining multiple flags into a single attribute would be the right choice, for a few reasons:

1. Storing multiple attributes in one layer would negate the benefits for code simplicity
   - When storing each attribute as a separate layer, operations can be done on the layer as a generic data type.
   - That means any algorithm written for an array of booleans (or whatever other storage we use) can be reused.
   - It's hard to overstate how much this can simplify existing code. It means things like selecting faces based on a point selection is as simple as doing a generic domain interpolation.
2. Storing multiple flags in one layer would complicate potential future performance improvements
   - There are other data structures we could use to store boolean attributes-- vectors of indices, bitmaps, trees, etc.
   - Those data structures can easily represent a single boolean attribute, but not multiple
3. Having an attribute correspond to a single conceptual idea means we can assign meaning to the case when the attribute doesn't exist
   - When there are no .hide layers, nothing is hidden-- all processing related to hide status can be skipped (this means that with well designed code you don't need to worry about NULL layers really).
   - When there are no .selection layers, everything is selected (or nothing, depending on what we decide), this can also simplify code.
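The point about giving meaning to a missing layer can be sketched like this. `count_visible` and `reveal_all` are hypothetical helpers, and `std::optional` stands in for "the layer may not exist"; none of this is Blender's real attribute API.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical optional hide layer: no value means nothing is hidden.
using HideLayer = std::optional<std::vector<bool>>;

// Count visible vertices. When the layer does not exist, all per-element
// processing is skipped and every vertex counts as visible.
inline int count_visible(const int verts_num, const HideLayer &hide)
{
  if (!hide.has_value()) {
    return verts_num;
  }
  assert(int(hide->size()) == verts_num);
  int visible = 0;
  for (const bool hidden : *hide) {
    if (!hidden) {
      visible++;
    }
  }
  return visible;
}

// Revealing everything can simply free the layer instead of clearing it.
inline void reveal_all(HideLayer &hide)
{
  hide.reset();
}
```

The missing-layer check lives in one place at the top of the algorithm, so the inner loop never has to test for a null layer.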

> the arrays were created & removed as needed

Yes, this is exactly what I want to do, I think it's where the potential memory usage improvements come in, for common cases when these layers aren't used.

Besides that, I think there is a misconception that because multiple data sources are used in the same algorithm they have to be stored together.
I see it as a benefit to the algorithm if it can have generic inputs that come from different sources instead.
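As an illustration of an algorithm with generic inputs from different sources, here is a minimal sketch of deriving a face selection from a point selection. The function name is hypothetical, and the rule used (a face is selected only if all of its corner vertices are selected) is an assumption for the example, not necessarily the rule Blender's domain interpolation uses.

```cpp
#include <cassert>
#include <vector>

// Derive a per-face boolean from a per-point boolean. The three inputs are
// independent arrays; the function does not care where they are stored.
// Assumed rule: a face is selected only if all of its corner vertices are.
inline std::vector<bool> face_selection_from_points(const std::vector<bool> &point_selection,
                                                    const std::vector<int> &face_offsets,
                                                    const std::vector<int> &corner_verts)
{
  assert(!face_offsets.empty());
  assert(face_offsets.back() == int(corner_verts.size()));
  const int faces_num = int(face_offsets.size()) - 1;
  std::vector<bool> face_selection(faces_num, true);
  for (int face = 0; face < faces_num; face++) {
    // Face `face` owns the corners in [face_offsets[face], face_offsets[face + 1]).
    for (int corner = face_offsets[face]; corner < face_offsets[face + 1]; corner++) {
      if (!point_selection[corner_verts[corner]]) {
        face_selection[face] = false;
        break;
      }
    }
  }
  return face_selection;
}
```

Nothing in the function is specific to selection, so the same code could propagate a hide status or any other boolean from points to faces.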

Martijn Versteegh commented

2022-06-16 12:59:13 +02:00

To add my 2c: In the case of uvmaps (storing 3 bools), storing them separately is cheaper memory-wise than it used to be. It used to be a 32-bit int, versus 3 bools now. The bools are currently stored as chars, but we could optimize that if needed.

So in any case memory is not an argument to stuff multiple flags in a single variable in this case, imo.
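The arithmetic behind this comparison can be written out as a small sketch. The function names are made up for the example, and the bitmap variant is only one possible form of the optimization mentioned above, assuming one bit per element per flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 32-bit integer of packed flags per element.
inline std::size_t bytes_packed_int(const std::size_t elements)
{
  return elements * sizeof(std::int32_t);
}

// One char-sized boolean per element for each separate flag.
inline std::size_t bytes_char_bools(const std::size_t elements, const std::size_t flags_num)
{
  return elements * flags_num * sizeof(char);
}

// Hypothetical further optimization: one bit per element for each flag,
// rounded up to whole bytes per flag.
inline std::size_t bytes_bitmaps(const std::size_t elements, const std::size_t flags_num)
{
  return flags_num * ((elements + 7) / 8);
}
```

For 1000 elements and 3 flags this gives 4000 bytes for the packed integer, 3000 bytes for char booleans, and 375 bytes for bitmaps.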

Hans Goudey commented

2022-07-20 17:41:16 +02:00

Here is an update on the forward compatibility topic based on discussion in D14583:

- We don't want to take breaking forward compatibility lightly, but we also need to be realistic and not try to maintain forward compatibility forever, in order to benefit from the simplification and performance improvements.
- We will write with the legacy mesh format (unchanged from how it is now) until 4.0.
- As part of 4.0, we will switch to writing with the new mesh format (generic attribute layers, etc).
- This gives us time to communicate properly and finish the changes mentioned here, so all the compatibility breakage happens at once.

blender-admin commented

2022-08-11 18:54:24 +02:00

This issue was referenced by 2480b55f21

blender-admin commented

2022-09-23 16:38:37 +02:00

This issue was referenced by 12becbf0df

blender-admin commented

2023-01-10 06:47:04 +01:00

This issue was referenced by 6c774feba2

Hans Goudey referenced this issue

2023-02-07 19:36:24 +01:00

Fix #100957: Dyntopo shows false positive data loss warnings #104423

Hans Goudey referenced this issue

2023-02-09 19:00:17 +01:00

Fix #100957: Dyntopo shows false positive data loss warnings #104535

Hans Goudey added this to the Nodes & Physics project 2023-02-27 15:47:05 +01:00

Hans Goudey added this to the 4.0 milestone 2023-02-27 15:48:07 +01:00

Hans Goudey referenced this issue

2023-03-07 15:22:39 +01:00

Breaking Mesh API changes for 4.0 #100153

Hans Goudey referenced this issue

2023-04-06 20:55:27 +02:00

Struct of Arrays Refactor for Mesh Edges #95966

Hans Goudey referenced this issue

2023-04-06 21:16:43 +02:00

Mesh: Move edges to a generic attribute #106638

Hans Goudey referenced this issue from a commit

2023-04-17 13:47:53 +02:00

Mesh: Move edges to a generic attribute

Hans Goudey commented

2023-04-17 13:50:37 +02:00

I have one more PR that's slightly related for the cached triangulation (#106774), but that doesn't affect the file format so it can be done any time. Otherwise, this task is done!

🎉 3

Hans Goudey closed this issue

2023-04-17 13:50:40 +02:00

Blender Bot added and removed labels 2023-04-17 13:50:41 +02:00
