WIP: Core: use generic copy-on-write system to avoid redundant copies #104478

Closed
Jacques Lucke wants to merge 66 commits from JacquesLucke/blender:temp-copy-on-write into main

When changing the target branch, be careful to rebase the branch in your fork to match. See documentation.
Member

This patch adds copy-on-write support in a few places. The main goal is to avoid copying data unnecessarily, improving memory usage and performance. For more details, see #95845. This patch is only the culmination of a series of patches mostly @HooglyBoogly worked on. We had to make accessing data in CustomData const-correct by introducing the _for_write functions when accessing layers. Furthermore, the embedded pointers to some special layers like Mesh->mvert had to be deprecated.

This patch includes:

  • CustomData refactor that allows sharing attribute layers. To properly implement this, the API had to be changed a bit as well.
  • Integration with the undo system, which fixes #98574.
  • GeometryComponent uses the same copy-on-write system, instead of what it used before.
  • AnonymousAttributeID also uses the system, but doesn't really have to (because the data is immutable anyway). It was just easier to use the same system for now.

Possible future uses:

  • Take ownership of attribute arrays in cycles instead of copying them. This might required changes to the Attribute class in cycles.
  • Avoid copying arrays to create a GPUVertBuf.

Notes:

  • This removes BLO_memfile_write_file because the entire file is not serialized for an undo step anymore. Serializing the rest on demand is not really trivial (e.g. for vertex groups). I think it was generally a nice thing that an undo step could be written to disk, but it does not feel absolutely necessary if it gets in the way of better run-time performance. The consequence is that quit.blend is now written using normal file-save and that USE_WRITE_CRASH_BLEND has been removed (never knew it existed before, was someone using that?).
  • The changes I did in editmesh_undo.cc work right now, but are a bit brittle because they make assumptions about where the data comes from. It should be possible to improve the situation there a bit when layers are not partially freed anymore (currently MEM_freeN(layer->data); only frees the pointer array of e.g. vertex groups).

Performance:

  • Compared to before, performance should generally be the same or better, depending on what the bottleneck is. I didn't measure a performance regression in the production files I tested.
  • It is relatively straight forward to build an example file that shows an arbitrarily large speedup, so putting numbers to it are somewhat meaningless. For reference I attached one .blend file that shows that copy-on-write is behaving as expected. The Object Info node in geometry nodes retrieves the geometry from another object and modifies only the position. All other attributes are shared. The modifier evaluation on object B takes about 30ms in main and 3ms with this patch applied.
This patch adds copy-on-write support in a few places. The main goal is to avoid copying data unnecessarily, improving memory usage and performance. For more details, see #95845. This patch is only the culmination of a series of patches mostly @HooglyBoogly worked on. We had to make accessing data in `CustomData` const-correct by introducing the `_for_write` functions when accessing layers. Furthermore, the embedded pointers to some special layers like `Mesh->mvert` had to be deprecated. This patch includes: * `CustomData` refactor that allows sharing attribute layers. To properly implement this, the API had to be changed a bit as well. * Integration with the undo system, which fixes #98574. * `GeometryComponent` uses the same copy-on-write system, instead of what it used before. * `AnonymousAttributeID` also uses the system, but doesn't really have to (because the data is immutable anyway). It was just easier to use the same system for now. Possible future uses: * Take ownership of attribute arrays in cycles instead of copying them. This might required changes to the `Attribute` class in cycles. * Avoid copying arrays to create a `GPUVertBuf`. Notes: * This removes `BLO_memfile_write_file` because the entire file is not serialized for an undo step anymore. Serializing the rest on demand is not really trivial (e.g. for vertex groups). I think it was generally a nice thing that an undo step could be written to disk, but it does not feel absolutely necessary if it gets in the way of better run-time performance. The consequence is that `quit.blend` is now written using normal file-save and that `USE_WRITE_CRASH_BLEND` has been removed (never knew it existed before, was someone using that?). * The changes I did in `editmesh_undo.cc` work right now, but are a bit brittle because they make assumptions about where the data comes from. It should be possible to improve the situation there a bit when layers are not partially freed anymore (currently `MEM_freeN(layer->data);` only frees the pointer array of e.g. vertex groups). Performance: * Compared to before, performance should generally be the same or better, depending on what the bottleneck is. I didn't measure a performance regression in the production files I tested. * It is relatively straight forward to build an example file that shows an arbitrarily large speedup, so putting numbers to it are somewhat meaningless. For reference I attached one .blend file that shows that copy-on-write is behaving as expected. The `Object Info` node in geometry nodes retrieves the geometry from another object and modifies only the position. All other attributes are shared. The modifier evaluation on object `B` takes about 30ms in `main` and 3ms with this patch applied.
Jacques Lucke added 50 commits 2023-02-08 18:50:37 +01:00
Jacques Lucke added 41 commits 2023-02-10 17:01:33 +01:00
buildbot/vdev-code-daily-coordinator Build done. Details
a1282ab015
Fix Cycles debug build error after host falback changes
Introduced in dcfb6df9ce6.

Co-authored-by: Lucas Tadeu Teixeira <lucas@lucastadeu.com>

Pull Request #104454
buildbot/vdev-code-daily-coordinator Build done. Details
0ab3ac7a41
BLI: Math: Fix vector operator * with `MutableMatView`
This was caused by operator priority trying to use
`friend VecBase operator*(const VecBase &a, FactorT b)`.

Adding tests as these were not covered.
buildbot/vdev-code-daily-coordinator Build done. Details
a0f5240089
EEVEE-Next: Virtual Shadow Map initial implementation
Implements virtual shadow mapping for EEVEE-Next primary shadow solution.
This technique aims to deliver really high precision shadowing for many
lights while keeping a relatively low cost.

The technique works by splitting each shadows in tiles that are only
allocated & updated on demand by visible surfaces and volumes.
Local lights use cubemap projection with mipmap level of detail to adapt
the resolution to the receiver distance.
Sun lights use clipmap distribution or cascade distribution (depending on
which is better) for selecting the level of detail with the distance to
the camera.

Current maximum shadow precision for local light is about 1 pixel per 0.01
degrees.
For sun light, the maximum resolution is based on the camera far clip
distance which sets the most coarse clipmap.

## Limitation:
Alpha Blended surfaces might not get correct shadowing in some corner
casses. This is to be fixed in another commit.
While resolution is greatly increase, it is still finite. It is virtually
equivalent to one 8K shadow per shadow cube face and per clipmap level.
There is no filtering present for now.

## Parameters:
Shadow Pool Size: In bytes, amount of GPU memory to dedicate to the
shadow pool (is allocated per viewport).
Shadow Scaling: Scale the shadow resolution. Base resolution should
target subpixel accuracy (within the limitation of the technique).

Related to #93220
Related to #104472
buildbot/vdev-code-daily-coordinator Build done. Details
9c03a1c92f
Fix Cycles link error with debug/asan builds after recent bugfix
Pull Request #104487
buildbot/vdev-code-daily-coordinator Build done. Details
9103978952
EEVEE-Next: Shadow: Fix issue with last merge
The merge with master updated the code to use the new matrix API. This
introduce some regressions.

For sunlights make sure there is enough tilemaps in orthographic mode
to cover the depth range and fix the level offset in perspective.
buildbot/vdev-code-daily-coordinator Build done. Details
94d280fc3f
EEVEE-Next: Shadows: Add global switch
This allow to bypass all cost associated with shadow mapping.

This can be useful in certain situation, such as opening a scene on a
lower end system or just to gain performance in some situation (lookdev).
9fd71d470e PyAPI: minor change to rna_manual_reference loading
- Use bpy.utils.execfile instead of importing then deleting from
  sys.modules.
- Add a note for why keeping this cached in memory isn't necessary.

This has the advantage of not interfering with any scripts that import
`rna_manual_reference` as a module.
buildbot/vdev-code-daily-coordinator Build done. Details
0381fe7bfe
Cleanup: update username in code-comments: campbellbarton -> ideasman42
Gitea migration changed my username, update code-comments.
buildbot/vdev-code-daily-coordinator Build done. Details
3c8f7b1a64
Cleanup: Remove unused/redundant includes from BKE_curves.hh
Avoid including headers that are obviously redundant, and don't
include BLI_task.hh in the header file, since it isn't really related.
buildbot/vdev-code-daily-coordinator Build done. Details
f3d7de709f
Cycles: update Intel Graphics Compiler to 1.0.13064.7 on Linux
Linux side of 8afcecdf1f.

Reviewed by: LazyDodo, sergey, campbellbarton
Ref !104458, 16984
buildbot/vdev-code-daily-coordinator Build done. Details
7effc6ffc4
Cleanup: solve compiler warnings.
Classes were predefined as structs.
buildbot/vdev-code-daily-coordinator Build done. Details
8b35db914e
GPU: Fix assert when using light gizmo.
Blender was reporting that the GPU_TEXTURE_USAGE_HOST_READ wasn't set.
This is used to indicate that the textures needs to be read back to
CPU. Textures that don't need to be read back can be optimized by the
GPU backend.

Found during investigation of #104282.
buildbot/vdev-code-daily-coordinator Build done. Details
f222fe6a3a
Build: enable Python optimizations (PGO & LTO) on Linux
This is used for most Python release builds and has been reported to
give a modest 5-10% speedup (depending on the workload).

This could be enabled on macOS too but needs to be tested.
buildbot/vdev-code-daily-coordinator Build done. Details
0e196bab76
Build: disable LTO for Python builds
LTO compiled libpython3.10.a failed to link with GCC 12.0,
disable since these libraries are intended for developers to link
against.
buildbot/vdev-code-daily-coordinator Build done. Details
ca183993a5
Fix freeing uninitialized pointer in GHOST/Wayland + X11 fallback
Freeing the timer manager didn't account for Wayland being partially
initialized.
buildbot/vdev-code-daily-coordinator Build done. Details
666c2ea012
Refactor: remove yscale from bAnimContext
`bAnimContext` had a float property called `yscale_fac` that was used to define the height of the keyframe channels.

However the property was never set, only read so there really is no need to have it in the struct.

Moreover it complicated getting the channel height because `bAnimContext` had to be passed in.

Speaking of getting the channel height. This was done with macros. I ripped them all out and replaced them with function calls.

Originally it was introduced in this patch: https://developer.blender.org/rB095c8dbe6919857ea322b213a1e240161cd7c843

Co-authored-by: Christoph Lendenfeld <chris.lenden@gmail.com>
Pull Request #104500
22edf04458 I18n: use format strings for Cycles version error messages
The required version numbers for various devices was hardcoded in the
UI messages. The result was that every time one of these versions was
bumped, every language team had to update the message in question.

Instead, the version numbers can be extracted, and injected into the
error messages using string formatting so that translation updates
need happen less frequently.

Pull Request #104488
3bed78ff59 Curves: Add box selection
This adds a `select_box` function for the `Curves` object. It is used in the `view3d_box_select` operator.

It also adds the basic selection tools in the toolbar of Edit Mode.

Authored-by: Falk David <falkdavid@gmx.de>
Pull Request #104411
7ca651d182 Mesh: Remove unnecessary edge draw flag
As described in #95966, replace the `ME_EDGEDRAW` flag with a bit
vector in mesh runtime data. Currently the the flag is only ever set
to false for the "optimal display" feature of the subdivision surface
modifier. When creating an "original" mesh in the main data-base,
the flag is always supposed to be true.

The bit vector is now created by the modifier only as necessary, and
is cleared for topology-changing operations. This fixes incorrect
interpolation of the flag as noted in #104376. Generally it isn't
possible to interpolate it through topology-changing operations.

After this, only the seam status needs to be removed from edges before
we can replace them with the generic `int2` type (or something similar)
and reduce memory usage by 1/3.

Related:
- 10131a6f62
- 145839aa42

In the future `BM_ELEM_DRAW` could be removed as well. Currently it is
used and aliased by other defines in some non-obvious ways though.

Pull Request #104417
b8e15a4a84 Fix: Add missing "-" in logic to get the channel height
This was missed when doing the refactoring in #104500
It didn't seem to have any effect until I worked on clamping the view
buildbot/vdev-code-daily-coordinator Build done. Details
bfa7f9db0e
Assets: Implement viewport drag and drop for geometry nodes
Currently there's no way to assign a geometry node group from the asset
browser to an object as a modifier without first appending/linking it
manually. This patch adds a drag and drop operator that adds a new
modifier and assigns the dragged tree.

Pull Request #104430
buildbot/vdev-code-daily-coordinator Build done. Details
50dfd5f501
Geometry Nodes: Edges to Face Groups Node
Add a new node that groups faces inside of boundary edge regions.
This is the opposite action as the existing "Face Group Boundaries"
node. It's also the same as some of the "Initialize Face Sets"
options in sculpt mode.

Discussion in #102962 has favored "Group" for a name for these
sockets rather than "Set", so that is used here.

Pull Request #104428
buildbot/vdev-code-daily-coordinator Build done. Details
1649921791
Fix: Sequencer "Pan" label using incorrect keyword 'heading_ctxt'
Oversight in db87e2a638

Reviewed By: ISS
Differential Revision: https://archive.blender.org/developer/D17213
buildbot/vdev-code-daily-coordinator Build done. Details
50918d44fb
Cleanup: Fix const correctness warning in recent commit
bc0d3c91b1 Fix #104435: Fix rna_NlaStrip_new add strip logic to be correct boolean expression
Fixed #104435: Use correct conditional logic when testing if a new NLA strip can be added in the rna_NlaStrip_new method
buildbot/vdev-code-daily-coordinator Build done. Details
2cfc4d7644
Fix #104383: don't update declaration for clipboard copy
When nodes are copied to the clipboard, they don't need their declaration.
For nodes with dynamic declaration that might depend on the node tree itself,
the declaration could not be build anyway, because the node-clipboard does
not have a node tree.

Pull Request #104432
buildbot/vdev-code-daily-coordinator Build done. Details
5c8edbd99b
Cleanup: Move 6 sculpt-session-related files and header to C++
To allow further mesh data structure refactoring. See #103343

Pull Request #104540
buildbot/vdev-code-daily-coordinator Build done. Details
7e0e07657c
GPU: Cleanup GPU_batch.h documentation and some of the API for consistency
Documented all functions, adding use case and side effects.

Also replace the use of shortened argument name by more meaningful ones.

Renamed `GPU_batch_instbuf_add_ex` and `GPU_batch_vertbuf_add_ex` to remove
the `ex` suffix as they are the main version used (removed the few usage
of the other version).

Renamed `GPU_batch_draw_instanced` to `GPU_batch_draw_instance_range` and
make it consistent with `GPU_batch_draw_range`.
2ee9c12a23 PyAPI: add bpy.utils.manual_locale_code()
Move the function for getting the language code associated with the
user manual into a utility function (from the generated
rna_manual_reference.py).

This allows other parts of Blender to create a manual URL based on the
current locale preferences and environment.

Ref !104494
8ac3096e24 Fix add-on & manual link in Help menu ignoring the current language
Use bpy.utils.manual_language_code() create manual URL's instead of
assuming English.
48d9363fa7 Cleanup: quiet clang compiler warnings
- undeclared variable warning.
- unreachable-code-return warnings.
- array-parameter, mismatch bound.
- 'requires' is a keyword in C++20, (rename to requires_flag).
buildbot/vdev-code-daily-coordinator Build done. Details
4cbe0bff34
Cleanup: spelling in comments
a8d951abdd Docs: remove malformed patterns for RNA mapping
The generator now skips these with a warning, they will need to be
corrected in the user manual.

This caused tests/python/bl_rna_manual_reference.py to fail looking
up URL's.
buildbot/vdev-code-daily-coordinator Build done. Details
c2c62c3618
RNA: return a dummy language value when WITH_INTERNATIONAL=OFF
Without this, every access to "language" would warn that the enum
value didn't match a value in the enum items.

This made the bl_rna_manual_reference.py test output practically
unusable.
buildbot/vdev-code-daily-coordinator Build done. Details
b77c82e2bb
Tests: minor updates to make bl_rna_manual_reference more useful
- Avoid flooding the output with every match that succeeds.
- Report patterns listed in the manual that don't match anything in
  Blender.
- Disable external URL lookups, this is too slow.
  Instead use a LOCAL_PREFIX (a local build of the manual)
  or skip the test.
buildbot/vdev-code-daily-coordinator Build done. Details
01480229b1
Cycles: Fix MetalRT checkbox not hooked up to device on AMD
(Follow on from D17043)
On AMD Navi2 devices the MetalRT checkbox was not hooked up properly and had no effect. This patch fixes it.

Co-authored-by: Michael Jones <michael_p_jones@apple.com>
Pull Request #104520
Jacques Lucke changed title from WIP: use generic copy-on-write system to avoid redundant copies to WIP: Core: use generic copy-on-write system to avoid redundant copies 2023-03-08 14:12:51 +01:00
Author
Member

@blender-bot build

@blender-bot build
Jacques Lucke force-pushed temp-copy-on-write from c358271b03 to 0f9e2af3b0 2023-03-14 17:57:06 +01:00 Compare
Author
Member

@blender-bot build

@blender-bot build
Jacques Lucke added 5 commits 2023-03-14 19:35:04 +01:00
Jacques Lucke added 1 commit 2023-03-14 20:11:04 +01:00
buildbot/vexp-code-patch-coordinator Build done. Details
3400e55ba9
cleanup name
Author
Member

@blender-bot build

@blender-bot build
Jacques Lucke added 1 commit 2023-03-14 20:17:38 +01:00
buildbot/vexp-code-patch-coordinator Build done. Details
ff134afff8
Merge branch 'main' into temp-copy-on-write
Author
Member

@blender-bot build

@blender-bot build
Jacques Lucke added 1 commit 2023-03-14 20:36:11 +01:00
buildbot/vexp-code-patch-coordinator Build done. Details
2318d0a508
fix
Fixed this in the past at some point, but a merge conflict reverted it..
Author
Member

@blender-bot build

@blender-bot build
Jacques Lucke changed title from WIP: Core: use generic copy-on-write system to avoid redundant copies to Core: use generic copy-on-write system to avoid redundant copies 2023-03-14 21:00:22 +01:00
Jacques Lucke requested review from Brecht Van Lommel 2023-03-14 21:00:45 +01:00
Jacques Lucke requested review from Sergey Sharybin 2023-03-14 21:00:45 +01:00
Jacques Lucke requested review from Bastien Montagne 2023-03-14 21:00:45 +01:00
Jacques Lucke requested review from Hans Goudey 2023-03-14 21:00:45 +01:00
Hans Goudey added this to the Core project 2023-03-14 21:09:19 +01:00
Hans Goudey requested changes 2023-03-15 00:25:32 +01:00
Hans Goudey left a comment
Member

Looks very good! I have some comments about naming and comments and a couple other small things.

I did find a crash though. To reproduce, you can just use the grow/shrink brush on the "Random" curves from the add menu.

Looks very good! I have some comments about naming and comments and a couple other small things. I did find a crash though. To reproduce, you can just use the grow/shrink brush on the "Random" curves from the add menu.
@ -170,0 +161,4 @@
* Initializes a CustomData object with the same layers as source. The data is not copied from the
* source. Instead, the new layers are initialized using the given `alloctype`.
*/
void CustomData_copy_new(const struct CustomData *source,
Member

What about CustomData_copy_construct to make the naming consistent with typical RAII patterns? It would tell the reader that the dest customdata struct doesn't have to be initialized a bit better.

What about `CustomData_copy_construct` to make the naming consistent with typical RAII patterns? It would tell the reader that the `dest` customdata struct doesn't have to be initialized a bit better.
@ -2234,1 +2259,3 @@
if ((maxnumber != -1) && (number >= maxnumber)) {
if ((max_current_type_layer_count != -1) &&
(current_type_layer_count >= max_current_type_layer_count)) {
/* Don't merge this layer because the maximum amount of layers of this type is reached. */
Member

Wow, great comments here :D

Wow, great comments here :D
@ -2315,0 +2349,4 @@
* A #bCopyOnWrite that knows how to free the entire referenced custom data layer (including
* potentially separately allocated chunks like for vertex groups).
*/
class CustomDataLayerCOW : public bCopyOnWrite {
Member

I think it wasn't obvious why this was necessary when you explained it to me at first. A sentence explaining that like "because layer data might be passed outside of the custom data system" would be helpful.

I think it wasn't obvious why this was necessary when you explained it to me at first. A sentence explaining that like "because layer data might be passed outside of the custom data system" would be helpful.
@ -638,3 +638,3 @@
CustomData_free(&pointcloud->pdata, pointcloud->totpoint);
pointcloud->totpoint = me->totvert;
CustomData_merge(&me->vdata, &pointcloud->pdata, CD_MASK_PROP_ALL, CD_DUPLICATE, me->totvert);
Member

No need to add a blank line here IMO

No need to add a blank line here IMO
@ -0,0 +14,4 @@
/**
* #bCopyOnWrite allows implementing copy-on-write behavior, i.e. it allows sharing read-only data
* between multiple independend systems (e.g. meshes). The data is only copied when it is shared
Member

Typo: independend -> independent

Typo: `independend` -> `independent`
@ -0,0 +22,4 @@
* data only has a single owner and is mutable. If it is larger than 1, it is shared and must be
* logically const.
*
* On top of containing a reference count, #bCopyOnWrite also knows how to destruct the referenced
Member

On top of containing a -> In addition to containing the

Just a bit clearer (there's only one reference count per item) and reads a bit more naturally

`On top of containing a` -> `In addition to containing the` Just a bit clearer (there's only one reference count per item) and reads a bit more naturally
@ -0,0 +32,4 @@
* arrays (e.g. for mesh attributes).
* - It can be embedded into another struct. For that it's best to use #bCopyOnWriteMixin.
*/
struct bCopyOnWrite : blender::NonCopyable, blender::NonMovable {
Member

This is a C++ header now, and only used in C++ code, so I think it would make sense to:

  • Use the blender:: namespace like other blenlib classes
  • Remove the b prefix which becomes redundant with the namespace.
This is a C++ header now, and only used in C++ code, so I think it would make sense to: - Use the `blender::` namespace like other blenlib classes - Remove the `b` prefix which becomes redundant with the namespace.
@ -0,0 +10,4 @@
namespace blender {
template<typename T> class COWUser {
Member

I guess COWUser should keep the old comment above UserCounter and maybe say "this adds RAII semantics on top of the copy-on-write logic from #bCopyOnWrite."

I guess `COWUser` should keep the old comment above `UserCounter` and maybe say "this adds RAII semantics on top of the copy-on-write logic from #bCopyOnWrite."
@ -1689,0 +1694,4 @@
return;
}
MemFile &memfile = *writer->wd->mem.written_memfile;
if (memfile.cow_storage == nullptr) {
Member

Maybe it doesn't matter, but maybe it would be reasonable to just create this once at the start of the writing process? It's likely to be used, and it would save the check for every write call. OTOH I'm sure this isn't a bottleneck!

Maybe it doesn't matter, but maybe it would be reasonable to just create this once at the start of the writing process? It's likely to be used, and it would save the check for every write call. OTOH I'm sure this isn't a bottleneck!
Sergey Sharybin requested changes 2023-03-15 12:17:41 +01:00
Sergey Sharybin left a comment
Owner

The goal sounds great!

There are some topics, without really particular order.

Performance

In the goal you states memory usage and performance improvements. But in the conclusion part where you show the improvements you are only focusing on the timing information.

How about memory usage?
It is quite weird to not have such metric measured in the context of copy-on-write (which is a resource-sharing technique, with a goal to save memory by avoiding creation of unneeded copies).

In the context topic raised in the #95845 one would expect actually measurable memory improvement. From the dependency graph point of view, the memory consumption is about 2x for meshes which do not have any modifiers. This is because the dependency graph "de-couples" the evaluated mesh from the original, so that the render thread can access it and modify, while artist can modify the original mesh in viewport, without causing any conflicts. Sharing the heavy CustomData arrays between the evaluated state and the original until they gets modified should give very good memory save.

Measuring the difference could be a bit more challenging, and might not be 2x when comparing against 2.79, but is still should be measurable improvement.

The biggest saving is expected to have on a scene with a lot of meshes with modifiers disabled. This is how the layout and some animation files are set up for the performance reasons. For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers).

Naming

The copy-on-write is a technique, which takes care of efficient copy/sharing of a modifiable resource. I do not think having it stated in types and variables really showing intent in the best possible way.

To me the more correct name for bCopyOnWriteMixin seems could be bShareableMixing, bImplicitlyShareableMixin, bShareableResourceMixin. It more directly showing the intent of it.

I know in a lot of places there is already cow all over the place. But this is already confusing, and even violating the initial design where the proper naming is to use orig_ and eval_.

API

I did not have time to check things in details yet, and was mainly looking into the BKE_customdata.h, as it is something that many developers interface with, and it was already hard to follow.

Maybe some of the points belong to an inlined comment, but some of them affect multiple functions, so seems to make sense to come to a general consensus.

The new suffix is something that confuses me. Not sure how it is expected to behave if CD_ASSIGN is passes as an alloctype, and maybe naming could be better.

If the CD_ASSIGN is purely internal-only use, maybe there are ways to make it more explicit like so. Maybe it can be moved away from header to an implementation file? Maybe renamed to CD_ASSIGN_INTERNAL?

For the new suffix maybe something like construct_alike? To kinda show it explicitly in the name that it is only "topology" of layers is considered, and not the content.

Also not really sure the difference between CustomData_merge_new and CustomData_copy_new. Some comment/information seems to be missing. Is it than the latter considers the dest to be completely un-initialized?

Another point is the const struct bCopyOnWrite *cow. Either the naming is to be changed, or the definition of the structure.

In a more ideal world, you'd imagine that you pass buffer as a single argument, so that you are not risking "stripping" user-counter away by accident. In the terms of C this will be something like

struct bShareableHeader {
  int64_t user_counter;
};

struct CustomDataLayerData {
  bShareableHeader shareable_header;

  void *data_ptr;
};

and somehow pass CustomDataLayerData as a single argument to the _with_data functions.

File IO and Undo

It would indeed eventually be nice to improve memory usage of undo, doing such intrusive changed (as per the PR description) seems a bit odd to do so early on, and will only make the review even longer and more complicated.

What are the consequences of removing BLO_memfile_write_file on the auto-save?

This is something that is not trivial to guess, and there is potential to cause huge usability problems.

On a quick investigation, it seems that before this change the memfile will always be used, even when in the sculpt mode (the use_memfile depends on the user setting, which does not change in the sculpt mode, and there is no warning printed in the console either).

The benefit of this approach is that the auto-save is fast, and does not block the UI, disturbing artists. The downside is that the sculpt changes done in the current sculpt session are not saved in the autosave.

After this change, the ED_editors_flush_edits is forced to happen. This is not cheap operation when you sculpt on a high-res object. Locking UI for a few seconds at regular time intervals is not something artists will appreciate.


Is there a way to split this change into smaller achivable and verifyable steps? For example, is it possible to get the memory optimization of the initial Mesh copy for the depsgraph's CoW without opening the story of the undo? Maybe even smaller pieces are possible (you mentioned making the CustomData const-correct, i.e.)?

Currently the scope seems to be too big, and there are already big topics which seems to be neglected (at least as per the patch description).

The goal sounds great! There are some topics, without really particular order. **Performance** In the goal you states memory usage and performance improvements. But in the conclusion part where you show the improvements you are only focusing on the timing information. How about memory usage? It is quite weird to not have such metric measured in the context of copy-on-write (which is a resource-sharing technique, with a goal to save memory by avoiding creation of unneeded copies). In the context topic raised in the #95845 one would expect actually measurable memory improvement. From the dependency graph point of view, the memory consumption is about 2x for meshes which do not have any modifiers. This is because the dependency graph "de-couples" the evaluated mesh from the original, so that the render thread can access it and modify, while artist can modify the original mesh in viewport, without causing any conflicts. Sharing the heavy CustomData arrays between the evaluated state and the original until they gets modified should give very good memory save. Measuring the difference could be a bit more challenging, and might not be 2x when comparing against 2.79, but is still should be measurable improvement. The biggest saving is expected to have on a scene with a lot of meshes with modifiers disabled. This is how the layout and some animation files are set up for the performance reasons. For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers). **Naming** The copy-on-write is a technique, which takes care of efficient copy/sharing of a modifiable resource. I do not think having it stated in types and variables really showing intent in the best possible way. To me the more correct name for `bCopyOnWriteMixin` seems could be `bShareableMixing`, `bImplicitlyShareableMixin`, `bShareableResourceMixin`. It more directly showing the intent of it. I know in a lot of places there is already `cow` all over the place. But this is already confusing, and even violating the initial design where the proper naming is to use `orig_` and `eval_`. **API** I did not have time to check things in details yet, and was mainly looking into the `BKE_customdata.h`, as it is something that many developers interface with, and it was already hard to follow. Maybe some of the points belong to an inlined comment, but some of them affect multiple functions, so seems to make sense to come to a general consensus. The `new` suffix is something that confuses me. Not sure how it is expected to behave if `CD_ASSIGN` is passes as an `alloctype`, and maybe naming could be better. If the `CD_ASSIGN` is purely internal-only use, maybe there are ways to make it more explicit like so. Maybe it can be moved away from header to an implementation file? Maybe renamed to `CD_ASSIGN_INTERNAL`? For the `new` suffix maybe something like `construct_alike`? To kinda show it explicitly in the name that it is only "topology" of layers is considered, and not the content. Also not really sure the difference between `CustomData_merge_new` and `CustomData_copy_new`. Some comment/information seems to be missing. Is it than the latter considers the `dest` to be completely un-initialized? Another point is the `const struct bCopyOnWrite *cow`. Either the naming is to be changed, or the definition of the structure. In a more ideal world, you'd imagine that you pass buffer as a single argument, so that you are not risking "stripping" user-counter away by accident. In the terms of C this will be something like ``` struct bShareableHeader { int64_t user_counter; }; struct CustomDataLayerData { bShareableHeader shareable_header; void *data_ptr; }; ``` and somehow pass `CustomDataLayerData` as a single argument to the `_with_data` functions. **File IO and Undo** It would indeed eventually be nice to improve memory usage of undo, doing such intrusive changed (as per the PR description) seems a bit odd to do so early on, and will only make the review even longer and more complicated. What are the consequences of removing `BLO_memfile_write_file` on the auto-save? This is something that is not trivial to guess, and there is potential to cause huge usability problems. On a quick investigation, it seems that before this change the memfile will always be used, even when in the sculpt mode (the `use_memfile` depends on the user setting, which does not change in the sculpt mode, and there is no warning printed in the console either). The benefit of this approach is that the auto-save is fast, and does not block the UI, disturbing artists. The downside is that the sculpt changes done in the current sculpt session are not saved in the autosave. After this change, the `ED_editors_flush_edits` is forced to happen. This is not cheap operation when you sculpt on a high-res object. Locking UI for a few seconds at regular time intervals is not something artists will appreciate. ---- Is there a way to split this change into smaller achivable and verifyable steps? For example, is it possible to get the memory optimization of the initial Mesh copy for the depsgraph's CoW without opening the story of the undo? Maybe even smaller pieces are possible (you mentioned making the CustomData const-correct, i.e.)? Currently the scope seems to be too big, and there are already big topics which seems to be neglected (at least as per the patch description).
Author
Member

Thanks for the feedback!

You're totally right that this should be split up a bit more. The patch started out as an experiment to see where copy-on-write can be used, so I added everything at once and just didn't think of splitting it up later on.

I suggest the following steps:

  1. Blenlib changes + changes specific to geometry nodes (use the new type in GeometryComponent and AnonymousAttributeID).
  2. CustomData integration.
  3. Undo integration.

Does that sound reasonable to you? If yes, then I'll split things up this way. Will respond to your more specific comments in the respective pull requests then.


To me the more correct name for bCopyOnWriteMixin seems could be bShareableMixing, bImplicitlyShareableMixin, bShareableResourceMixin. It more directly showing the intent of it.

Renaming bCopyOnWrite to ShareableResource is fine with me (same for the Mixin type). Maybe something like SharedResourceInfo or so could work as well, the name makes it a bit more clear that the type is not necessarily the resource itself, but tracks a potentially separately allocated resource (like an attribute array).

Thanks for the feedback! You're totally right that this should be split up a bit more. The patch started out as an experiment to see where copy-on-write can be used, so I added everything at once and just didn't think of splitting it up later on. I suggest the following steps: 1. Blenlib changes + changes specific to geometry nodes (use the new type in `GeometryComponent` and `AnonymousAttributeID`). 2. `CustomData` integration. 3. Undo integration. Does that sound reasonable to you? If yes, then I'll split things up this way. Will respond to your more specific comments in the respective pull requests then. ----- > To me the more correct name for `bCopyOnWriteMixin` seems could be `bShareableMixing`, `bImplicitlyShareableMixin`, `bShareableResourceMixin`. It more directly showing the intent of it. Renaming `bCopyOnWrite` to `ShareableResource` is fine with me (same for the `Mixin` type). Maybe something like `SharedResourceInfo` or so could work as well, the name makes it a bit more clear that the type is not necessarily the resource itself, but tracks a potentially separately allocated resource (like an attribute array).
Author
Member

For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers).

As a side note, I just tested this with a heavily subdivided cube. In main the memory usage is 2.5 GB and in the branch 1.13 GB as shown below (also approximately matches what htop shows me).

image

> For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers). As a side note, I just tested this with a heavily subdivided cube. In `main` the memory usage is 2.5 GB and in the branch 1.13 GB as shown below (also approximately matches what `htop` shows me). ![image](/attachments/3365b87c-b686-4b01-ab49-4d3749599ce6)
132 KiB
Brecht Van Lommel requested changes 2023-03-15 19:00:27 +01:00
Brecht Van Lommel left a comment
Owner

Splitting this up would indeed help reviewing, I haven't looked closely since it's a bit difficult to get the big picture and I agree with the points Sergey makes.

The quit.blend saving changes could also be reviewed separately from the other undo changes.

For naming, I think this should be called either "copy on write" or "implicit sharing" since that's the standard terminology. Just shared / shareable seems not distinguished enough for shared_ptr.

Splitting this up would indeed help reviewing, I haven't looked closely since it's a bit difficult to get the big picture and I agree with the points Sergey makes. The quit.blend saving changes could also be reviewed separately from the other undo changes. For naming, I think this should be called either "copy on write" or "implicit sharing" since that's the standard terminology. Just shared / shareable seems not distinguished enough for `shared_ptr`.
@ -0,0 +10,4 @@
namespace blender {
template<typename T> class COWUser {

I would not abbreviate to COW but use CopyOnWrite, or whichever name we end up choosing.

I would not abbreviate to COW but use CopyOnWrite, or whichever name we end up choosing.

@JacquesLucke The plan sounds good.

The memory saving is fantastic!

@JacquesLucke @HooglyBoogly Any objections of sticking to implicit sharing?

To me it seems to be the most clear from both showing intent, disambiguating it from the name of a technique, and solves the possible confusion with shared_ptr.

P.S. The latter one I personally don't really have issues with. To me the bCopyOnWrite (how it is currently called and intended to be used anyway) is the same as shared_ptr when it comes to the life management and assignment. It "just" adds some extra call to "un-share" the resource. But it is important to keep such fundamental data types and APIs as clear as possible and intuitively understood by everyone.

@JacquesLucke The plan sounds good. The memory saving is fantastic! @JacquesLucke @HooglyBoogly Any objections of sticking to `implicit sharing`? To me it seems to be the most clear from both showing intent, disambiguating it from the name of a technique, and solves the possible confusion with `shared_ptr`. P.S. The latter one I personally don't really have issues with. To me the `bCopyOnWrite` (how it is currently called and intended to be used anyway) is the same as `shared_ptr` when it comes to the life management and assignment. It "just" adds some extra call to "un-share" the resource. But it is important to keep such fundamental data types and APIs as clear as possible and intuitively understood by everyone.
Author
Member

I didn't use the term implicit sharing before and didn't come across it much, but can get used to it. Based on Wikipedia the term indeed seems to mean what we want, so I'm fine with that.

I didn't use the term `implicit sharing` before and didn't come across it much, but can get used to it. Based on [Wikipedia](https://en.wikipedia.org/wiki/Copy-on-write) the term indeed seems to mean what we want, so I'm fine with that.
Jacques Lucke changed title from Core: use generic copy-on-write system to avoid redundant copies to WIP: Core: use generic copy-on-write system to avoid redundant copies 2023-03-20 09:54:44 +01:00
Brecht Van Lommel requested changes 2023-03-24 13:04:40 +01:00
@ -0,0 +63,4 @@
void remove_user_and_delete_if_last() const
{
const int old_user_count = users_.fetch_sub(1, std::memory_order_relaxed);

My understanding is that user count decrement requires std::memory_order_acq_rel, while increment can use std::memory_order_relaxed.

For example libc++ does this for shared_ptr. More details:
https://stackoverflow.com/questions/48124031/stdmemory-order-relaxed-atomicity-with-respect-to-the-same-atomic-variable

My understanding is that user count decrement requires `std::memory_order_acq_rel`, while increment can use `std::memory_order_relaxed`. For example libc++ does this for `shared_ptr`. More details: https://stackoverflow.com/questions/48124031/stdmemory-order-relaxed-atomicity-with-respect-to-the-same-atomic-variable
brecht marked this conversation as resolved
Brecht Van Lommel reviewed 2023-03-24 13:06:26 +01:00
@ -0,0 +63,4 @@
void remove_user_and_delete_if_last() const
{
const int old_user_count = users_.fetch_sub(1, std::memory_order_relaxed);

Oops, this was meant to go to #105994.

Oops, this was meant to go to #105994.
brecht marked this conversation as resolved
Author
Member

A patch that only contains the CustomData integration is available here: #106228.

A patch that only contains the `CustomData` integration is available here: #106228.
Jacques Lucke closed this pull request 2023-04-08 13:43:34 +02:00
Bastien Montagne removed this from the Core project 2023-07-03 12:49:36 +02:00
Some checks failed
buildbot/vexp-code-patch-coordinator Build done.

Pull request closed

Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#104478
No description provided.