WIP: Core: use generic copy-on-write system to avoid redundant copies #104478

Jacques Lucke · 2023-02-08T18:50:35+01:00

Jacques Lucke commented

2023-02-08 18:50:35 +01:00

This patch adds copy-on-write support in a few places. The main goal is to avoid copying data unnecessarily, improving memory usage and performance. For more details, see #95845. This patch is only the culmination of a series of patches mostly @HooglyBoogly worked on. We had to make accessing data in CustomData const-correct by introducing the _for_write functions when accessing layers. Furthermore, the embedded pointers to some special layers like Mesh->mvert had to be deprecated.

This patch includes:

CustomData refactor that allows sharing attribute layers. To properly implement this, the API had to be changed a bit as well.
Integration with the undo system, which fixes #98574.
GeometryComponent uses the same copy-on-write system, instead of what it used before.
AnonymousAttributeID also uses the system, but doesn't really have to (because the data is immutable anyway). It was just easier to use the same system for now.

Possible future uses:

Take ownership of attribute arrays in cycles instead of copying them. This might required changes to the Attribute class in cycles.
Avoid copying arrays to create a GPUVertBuf.

Notes:

This removes BLO_memfile_write_file because the entire file is not serialized for an undo step anymore. Serializing the rest on demand is not really trivial (e.g. for vertex groups). I think it was generally a nice thing that an undo step could be written to disk, but it does not feel absolutely necessary if it gets in the way of better run-time performance. The consequence is that quit.blend is now written using normal file-save and that USE_WRITE_CRASH_BLEND has been removed (never knew it existed before, was someone using that?).
The changes I did in editmesh_undo.cc work right now, but are a bit brittle because they make assumptions about where the data comes from. It should be possible to improve the situation there a bit when layers are not partially freed anymore (currently MEM_freeN(layer->data); only frees the pointer array of e.g. vertex groups).

Performance:

Compared to before, performance should generally be the same or better, depending on what the bottleneck is. I didn't measure a performance regression in the production files I tested.
It is relatively straight forward to build an example file that shows an arbitrarily large speedup, so putting numbers to it are somewhat meaningless. For reference I attached one .blend file that shows that copy-on-write is behaving as expected. The Object Info node in geometry nodes retrieves the geometry from another object and modifies only the position. All other attributes are shared. The modifier evaluation on object B takes about 30ms in main and 3ms with this patch applied.

This patch adds copy-on-write support in a few places. The main goal is to avoid copying data unnecessarily, improving memory usage and performance. For more details, see #95845. This patch is only the culmination of a series of patches mostly @HooglyBoogly worked on. We had to make accessing data in `CustomData` const-correct by introducing the `_for_write` functions when accessing layers. Furthermore, the embedded pointers to some special layers like `Mesh->mvert` had to be deprecated. This patch includes: * `CustomData` refactor that allows sharing attribute layers. To properly implement this, the API had to be changed a bit as well. * Integration with the undo system, which fixes #98574. * `GeometryComponent` uses the same copy-on-write system, instead of what it used before. * `AnonymousAttributeID` also uses the system, but doesn't really have to (because the data is immutable anyway). It was just easier to use the same system for now. Possible future uses: * Take ownership of attribute arrays in cycles instead of copying them. This might required changes to the `Attribute` class in cycles. * Avoid copying arrays to create a `GPUVertBuf`. Notes: * This removes `BLO_memfile_write_file` because the entire file is not serialized for an undo step anymore. Serializing the rest on demand is not really trivial (e.g. for vertex groups). I think it was generally a nice thing that an undo step could be written to disk, but it does not feel absolutely necessary if it gets in the way of better run-time performance. The consequence is that `quit.blend` is now written using normal file-save and that `USE_WRITE_CRASH_BLEND` has been removed (never knew it existed before, was someone using that?). * The changes I did in `editmesh_undo.cc` work right now, but are a bit brittle because they make assumptions about where the data comes from. It should be possible to improve the situation there a bit when layers are not partially freed anymore (currently `MEM_freeN(layer->data);` only frees the pointer array of e.g. vertex groups). Performance: * Compared to before, performance should generally be the same or better, depending on what the bottleneck is. I didn't measure a performance regression in the production files I tested. * It is relatively straight forward to build an example file that shows an arbitrarily large speedup, so putting numbers to it are somewhat meaningless. For reference I attached one .blend file that shows that copy-on-write is behaving as expected. The `Object Info` node in geometry nodes retrieves the geometry from another object and modifies only the position. All other attributes are shared. The modifier evaluation on object `B` takes about 30ms in `main` and 3ms with this patch applied.

cow_test.blend

827 KiB

❤️ 2 🎉 4 🚀 9

Jacques Lucke added 50 commits 2023-02-08 18:50:37 +01:00

e0ac932156 initial data structure

1713a780c5 progress

c5b55a84f5 cleanup

3f3adef4a0 Merge branch 'master' into temp-copy-on-write

ac439c7eed Use new copy-on-write system for sharing geometry components.

Differential Revision: https://developer.blender.org/D14139

dd8af7b46a Merge branch 'master' into temp-copy-on-write

6b1028d492 fix

d76e6862f0 fix

0b7155b5c5 fix

5a3c8a1b12 change custom data api for cow

ced0021498 Merge branch 'master' into temp-copy-on-write

51388d401c Merge branch 'temp-copy-on-write' into temp-copy-on-write-customdata

5329d5803e progress

3ed5d00f7e fix

e3779d12d5 progress

c4c9a43072 progress

3f22933534 progress

c1d1665a79 progress

3530c63611 progress

be0fe0433d cleanup

b592d76a8e remove CD_FLAG_NOFREE

b45b522d22 progress

8cff18da91 progress

66b4840d90 fix freeing

ba33242d9a progress

d4ec435d4f compile fixes

2b71cf4b31 Merge branch 'master' into temp-copy-on-write-customdata

ef1f42865e Merge branch 'master' into temp-copy-on-write-customdata

097c085a4e Support copy-on-write in undo system

c678fe09e4 cleanup

c9477aac6a Merge branch 'master' into temp-copy-on-write-customdata

8f6b51ac15 cleanup

ee675db8f6 store destructor in cow type

53178b94e0 initial rna integration

6065a378cf use virtual class

a16a60f600 cleanup

60612b84dc cleanup

b703fc76c8 cleanup

6f4772d170 cleanup file structure

b4ac85084c improve rna api

2e9fcdc600 disable array store for editmesh undo

4d67393e7e Merge branch 'master' into temp-copy-on-write-customdata

a59b848992 fix

1284bdc24c removing support for saving undo step as actual file

9c51e093a3 partial fix

ff6f1a0a70 simplify embedding cow

e71021db06 Merge branch 'master' into temp-copy-on-write-customdata

ce46579108 Merge branch 'main' into temp-copy-on-write-customdata

82ffc03449 fix

06e1b3f9f6 Merge branch 'main' into temp-copy-on-write-customdata

Jacques Lucke added 41 commits 2023-02-10 17:01:33 +01:00

buildbot/vdev-code-daily-coordinator Build done. Details

a1282ab015 Fix Cycles debug build error after host falback changes

Introduced in dcfb6df9ce6.

Co-authored-by: Lucas Tadeu Teixeira <lucas@lucastadeu.com>

Pull Request #104454

buildbot/vdev-code-daily-coordinator Build done. Details

0ab3ac7a41 BLI: Math: Fix vector operator * with `MutableMatView`

This was caused by operator priority trying to use
`friend VecBase operator*(const VecBase &a, FactorT b)`.

Adding tests as these were not covered.

buildbot/vdev-code-daily-coordinator Build done. Details

a0f5240089 EEVEE-Next: Virtual Shadow Map initial implementation

Implements virtual shadow mapping for EEVEE-Next primary shadow solution.
This technique aims to deliver really high precision shadowing for many
lights while keeping a relatively low cost.

The technique works by splitting each shadows in tiles that are only
allocated & updated on demand by visible surfaces and volumes.
Local lights use cubemap projection with mipmap level of detail to adapt
the resolution to the receiver distance.
Sun lights use clipmap distribution or cascade distribution (depending on
which is better) for selecting the level of detail with the distance to
the camera.

Current maximum shadow precision for local light is about 1 pixel per 0.01
degrees.
For sun light, the maximum resolution is based on the camera far clip
distance which sets the most coarse clipmap.

## Limitation:
Alpha Blended surfaces might not get correct shadowing in some corner
casses. This is to be fixed in another commit.
While resolution is greatly increase, it is still finite. It is virtually
equivalent to one 8K shadow per shadow cube face and per clipmap level.
There is no filtering present for now.

## Parameters:
Shadow Pool Size: In bytes, amount of GPU memory to dedicate to the
shadow pool (is allocated per viewport).
Shadow Scaling: Scale the shadow resolution. Base resolution should
target subpixel accuracy (within the limitation of the technique).

Related to #93220
Related to #104472

buildbot/vdev-code-daily-coordinator Build done. Details

9c03a1c92f Fix Cycles link error with debug/asan builds after recent bugfix

Pull Request #104487

buildbot/vdev-code-daily-coordinator Build done. Details

9103978952 EEVEE-Next: Shadow: Fix issue with last merge

The merge with master updated the code to use the new matrix API. This
introduce some regressions.

For sunlights make sure there is enough tilemaps in orthographic mode
to cover the depth range and fix the level offset in perspective.

buildbot/vdev-code-daily-coordinator Build done. Details

94d280fc3f EEVEE-Next: Shadows: Add global switch

This allow to bypass all cost associated with shadow mapping.

This can be useful in certain situation, such as opening a scene on a
lower end system or just to gain performance in some situation (lookdev).

9fd71d470e PyAPI: minor change to rna_manual_reference loading

- Use bpy.utils.execfile instead of importing then deleting from
  sys.modules.
- Add a note for why keeping this cached in memory isn't necessary.

This has the advantage of not interfering with any scripts that import
`rna_manual_reference` as a module.

5b110548eb Cleanup: enum conversion compiler warnings

5f842ef336 Cleanup: spelling in comments

buildbot/vdev-code-daily-coordinator Build done. Details

0381fe7bfe Cleanup: update username in code-comments: campbellbarton -> ideasman42

Gitea migration changed my username, update code-comments.

buildbot/vdev-code-daily-coordinator Build done. Details

3c8f7b1a64 Cleanup: Remove unused/redundant includes from BKE_curves.hh

Avoid including headers that are obviously redundant, and don't
include BLI_task.hh in the header file, since it isn't really related.

buildbot/vdev-code-daily-coordinator Build done. Details

f3d7de709f Cycles: update Intel Graphics Compiler to 1.0.13064.7 on Linux

Linux side of 8afcecdf1f.

Reviewed by: LazyDodo, sergey, campbellbarton
Ref !104458, 16984

buildbot/vdev-code-daily-coordinator Build done. Details

7effc6ffc4 Cleanup: solve compiler warnings.

Classes were predefined as structs.

1883e782cb Spelling: Assert message in GPU_texture_read.

buildbot/vdev-code-daily-coordinator Build done. Details

8b35db914e GPU: Fix assert when using light gizmo.

Blender was reporting that the GPU_TEXTURE_USAGE_HOST_READ wasn't set.
This is used to indicate that the textures needs to be read back to
CPU. Textures that don't need to be read back can be optimized by the
GPU backend.

Found during investigation of #104282.

buildbot/vdev-code-daily-coordinator Build done. Details

f222fe6a3a Build: enable Python optimizations (PGO & LTO) on Linux

This is used for most Python release builds and has been reported to
give a modest 5-10% speedup (depending on the workload).

This could be enabled on macOS too but needs to be tested.

buildbot/vdev-code-daily-coordinator Build done. Details

0e196bab76 Build: disable LTO for Python builds

LTO compiled libpython3.10.a failed to link with GCC 12.0,
disable since these libraries are intended for developers to link
against.

buildbot/vdev-code-daily-coordinator Build done. Details

ca183993a5 Fix freeing uninitialized pointer in GHOST/Wayland + X11 fallback

Freeing the timer manager didn't account for Wayland being partially
initialized.

buildbot/vdev-code-daily-coordinator Build done. Details

666c2ea012 Refactor: remove yscale from bAnimContext

`bAnimContext` had a float property called `yscale_fac` that was used to define the height of the keyframe channels.

However the property was never set, only read so there really is no need to have it in the struct.

Moreover it complicated getting the channel height because `bAnimContext` had to be passed in.

Speaking of getting the channel height. This was done with macros. I ripped them all out and replaced them with function calls.

Originally it was introduced in this patch: https://developer.blender.org/rB095c8dbe6919857ea322b213a1e240161cd7c843

Co-authored-by: Christoph Lendenfeld <chris.lenden@gmail.com>
Pull Request #104500

22edf04458 I18n: use format strings for Cycles version error messages

The required version numbers for various devices was hardcoded in the
UI messages. The result was that every time one of these versions was
bumped, every language team had to update the message in question.

Instead, the version numbers can be extracted, and injected into the
error messages using string formatting so that translation updates
need happen less frequently.

Pull Request #104488

3bed78ff59 Curves: Add box selection

This adds a `select_box` function for the `Curves` object. It is used in the `view3d_box_select` operator.

It also adds the basic selection tools in the toolbar of Edit Mode.

Authored-by: Falk David <falkdavid@gmx.de>
Pull Request #104411

7ca651d182 Mesh: Remove unnecessary edge draw flag

As described in #95966, replace the `ME_EDGEDRAW` flag with a bit
vector in mesh runtime data. Currently the the flag is only ever set
to false for the "optimal display" feature of the subdivision surface
modifier. When creating an "original" mesh in the main data-base,
the flag is always supposed to be true.

The bit vector is now created by the modifier only as necessary, and
is cleared for topology-changing operations. This fixes incorrect
interpolation of the flag as noted in #104376. Generally it isn't
possible to interpolate it through topology-changing operations.

After this, only the seam status needs to be removed from edges before
we can replace them with the generic `int2` type (or something similar)
and reduce memory usage by 1/3.

Related:
- 10131a6f62
- 145839aa42

In the future `BM_ELEM_DRAW` could be removed as well. Currently it is
used and aliased by other defines in some non-obvious ways though.

Pull Request #104417

b8e15a4a84 Fix: Add missing "-" in logic to get the channel height

This was missed when doing the refactoring in #104500
It didn't seem to have any effect until I worked on clamping the view

buildbot/vdev-code-daily-coordinator Build done. Details

bfa7f9db0e Assets: Implement viewport drag and drop for geometry nodes

Currently there's no way to assign a geometry node group from the asset
browser to an object as a modifier without first appending/linking it
manually. This patch adds a drag and drop operator that adds a new
modifier and assigns the dragged tree.

Pull Request #104430

buildbot/vdev-code-daily-coordinator Build done. Details

50dfd5f501 Geometry Nodes: Edges to Face Groups Node

Add a new node that groups faces inside of boundary edge regions.
This is the opposite action as the existing "Face Group Boundaries"
node. It's also the same as some of the "Initialize Face Sets"
options in sculpt mode.

Discussion in #102962 has favored "Group" for a name for these
sockets rather than "Set", so that is used here.

Pull Request #104428

buildbot/vdev-code-daily-coordinator Build done. Details

1649921791 Fix: Sequencer "Pan" label using incorrect keyword 'heading_ctxt'

Oversight in db87e2a638

Reviewed By: ISS
Differential Revision: https://archive.blender.org/developer/D17213

buildbot/vdev-code-daily-coordinator Build done. Details

50918d44fb Cleanup: Fix const correctness warning in recent commit

bc0d3c91b1 Fix #104435 : Fix rna_NlaStrip_new add strip logic to be correct boolean expression

Fixed #104435: Use correct conditional logic when testing if a new NLA strip can be added in the rna_NlaStrip_new method

buildbot/vdev-code-daily-coordinator Build done. Details

2cfc4d7644 Fix #104383 : don't update declaration for clipboard copy

When nodes are copied to the clipboard, they don't need their declaration.
For nodes with dynamic declaration that might depend on the node tree itself,
the declaration could not be build anyway, because the node-clipboard does
not have a node tree.

Pull Request #104432

buildbot/vdev-code-daily-coordinator Build done. Details

5c8edbd99b Cleanup: Move 6 sculpt-session-related files and header to C++

To allow further mesh data structure refactoring. See #103343

Pull Request #104540

buildbot/vdev-code-daily-coordinator Build done. Details

7e0e07657c GPU: Cleanup GPU_batch.h documentation and some of the API for consistency

Documented all functions, adding use case and side effects.

Also replace the use of shortened argument name by more meaningful ones.

Renamed `GPU_batch_instbuf_add_ex` and `GPU_batch_vertbuf_add_ex` to remove
the `ex` suffix as they are the main version used (removed the few usage
of the other version).

Renamed `GPU_batch_draw_instanced` to `GPU_batch_draw_instance_range` and
make it consistent with `GPU_batch_draw_range`.

2d351e9ee3 Cleanup: format

2ee9c12a23 PyAPI: add bpy.utils.manual_locale_code()

Move the function for getting the language code associated with the
user manual into a utility function (from the generated
rna_manual_reference.py).

This allows other parts of Blender to create a manual URL based on the
current locale preferences and environment.

Ref !104494

8ac3096e24 Fix add-on & manual link in Help menu ignoring the current language

Use bpy.utils.manual_language_code() create manual URL's instead of
assuming English.

48d9363fa7 Cleanup: quiet clang compiler warnings

- undeclared variable warning.
- unreachable-code-return warnings.
- array-parameter, mismatch bound.
- 'requires' is a keyword in C++20, (rename to requires_flag).

buildbot/vdev-code-daily-coordinator Build done. Details

4cbe0bff34 Cleanup: spelling in comments

a8d951abdd Docs: remove malformed patterns for RNA mapping

The generator now skips these with a warning, they will need to be
corrected in the user manual.

This caused tests/python/bl_rna_manual_reference.py to fail looking
up URL's.

buildbot/vdev-code-daily-coordinator Build done. Details

c2c62c3618 RNA: return a dummy language value when WITH_INTERNATIONAL=OFF

Without this, every access to "language" would warn that the enum
value didn't match a value in the enum items.

This made the bl_rna_manual_reference.py test output practically
unusable.

buildbot/vdev-code-daily-coordinator Build done. Details

b77c82e2bb Tests: minor updates to make bl_rna_manual_reference more useful

- Avoid flooding the output with every match that succeeds.
- Report patterns listed in the manual that don't match anything in
  Blender.
- Disable external URL lookups, this is too slow.
  Instead use a LOCAL_PREFIX (a local build of the manual)
  or skip the test.

buildbot/vdev-code-daily-coordinator Build done. Details

01480229b1 Cycles: Fix MetalRT checkbox not hooked up to device on AMD

(Follow on from D17043)
On AMD Navi2 devices the MetalRT checkbox was not hooked up properly and had no effect. This patch fixes it.

Co-authored-by: Michael Jones <michael_p_jones@apple.com>
Pull Request #104520

80f34eaa4d Merge branch 'main' into temp-copy-on-write

Jacques Lucke changed title from ~~WIP: use generic copy-on-write system to avoid redundant copies~~ to WIP: Core: use generic copy-on-write system to avoid redundant copies

2023-03-08 14:12:51 +01:00

Jacques Lucke commented

2023-03-08 14:51:13 +01:00

@blender-bot build

Jacques Lucke referenced this pull request

2023-03-13 10:58:34 +01:00

CustomData: add separate function to add layer from existing data #105708

Jacques Lucke referenced this issue from a commit

2023-03-14 15:30:39 +01:00

CustomData: add separate function to add layer from existing data

Jacques Lucke force-pushed temp-copy-on-write from c358271b03 to 0f9e2af3b0

2023-03-14 17:57:06 +01:00

Compare

Jacques Lucke commented

2023-03-14 18:08:33 +01:00

@blender-bot build

Jacques Lucke added 5 commits 2023-03-14 19:35:04 +01:00

39d770fae4 cleanup

27061b704b cleanup

73aa98c62f cleanup

5f4ca0ca2c cleanup

36a18db75a rename

Jacques Lucke added 1 commit 2023-03-14 20:11:04 +01:00

buildbot/vexp-code-patch-coordinator Build done. Details

3400e55ba9 cleanup name

Jacques Lucke commented

2023-03-14 20:12:25 +01:00

@blender-bot build

Jacques Lucke added 1 commit 2023-03-14 20:17:38 +01:00

buildbot/vexp-code-patch-coordinator Build done. Details

ff134afff8 Merge branch 'main' into temp-copy-on-write

Jacques Lucke commented

2023-03-14 20:17:39 +01:00

@blender-bot build

Jacques Lucke added 1 commit 2023-03-14 20:36:11 +01:00

buildbot/vexp-code-patch-coordinator Build done. Details

2318d0a508 fix

Fixed this in the past at some point, but a merge conflict reverted it..

Jacques Lucke commented

2023-03-14 20:38:09 +01:00

@blender-bot build

Jacques Lucke changed title from ~~WIP: Core: use generic copy-on-write system to avoid redundant copies~~ to Core: use generic copy-on-write system to avoid redundant copies

2023-03-14 21:00:22 +01:00

Jacques Lucke requested review from Brecht Van Lommel 2023-03-14 21:00:45 +01:00

Jacques Lucke requested review from Sergey Sharybin 2023-03-14 21:00:45 +01:00

Jacques Lucke requested review from Bastien Montagne 2023-03-14 21:00:45 +01:00

Jacques Lucke requested review from Hans Goudey 2023-03-14 21:00:45 +01:00

Hans Goudey added this to the Core project 2023-03-14 21:09:19 +01:00

Hans Goudey requested changes 2023-03-15 00:25:32 +01:00

Hans Goudey left a comment

Looks very good! I have some comments about naming and comments and a couple other small things.

I did find a crash though. To reproduce, you can just use the grow/shrink brush on the "Random" curves from the add menu.

Looks very good! I have some comments about naming and comments and a couple other small things. I did find a crash though. To reproduce, you can just use the grow/shrink brush on the "Random" curves from the add menu.

source/blender/blenkernel/BKE_customdata.h

						
				@ -170,0 +161,4 @@

				 * Initializes a CustomData object with the same layers as source. The data is not copied from the

				 * source. Instead, the new layers are initialized using the given `alloctype`.

				 */

				void CustomData_copy_new(const struct CustomData *source,

Hans Goudey commented

2023-03-14 23:49:22 +01:00

What about CustomData_copy_construct to make the naming consistent with typical RAII patterns? It would tell the reader that the dest customdata struct doesn't have to be initialized a bit better.

What about `CustomData_copy_construct` to make the naming consistent with typical RAII patterns? It would tell the reader that the `dest` customdata struct doesn't have to be initialized a bit better.

source/blender/blenkernel/intern/customdata.cc

						
				@ -2234,1 +2259,3 @@

				    if ((maxnumber != -1) && (number >= maxnumber)) {

				    if ((max_current_type_layer_count != -1) &&

				        (current_type_layer_count >= max_current_type_layer_count)) {

				      /* Don't merge this layer because the maximum amount of layers of this type is reached. */

Hans Goudey commented

2023-03-15 00:08:29 +01:00

Wow, great comments here :D

source/blender/blenkernel/intern/customdata.cc

						
				@ -2315,0 +2349,4 @@

				 * A #bCopyOnWrite that knows how to free the entire referenced custom data layer (including

				 * potentially separately allocated chunks like for vertex groups).

				 */

				class CustomDataLayerCOW : public bCopyOnWrite {

Hans Goudey commented

2023-03-14 23:40:48 +01:00

I think it wasn't obvious why this was necessary when you explained it to me at first. A sentence explaining that like "because layer data might be passed outside of the custom data system" would be helpful.

source/blender/blenkernel/intern/mesh_convert.cc

						
				@ -638,3 +638,3 @@

				  CustomData_free(&pointcloud->pdata, pointcloud->totpoint);

				  pointcloud->totpoint = me->totvert;

				  CustomData_merge(&me->vdata, &pointcloud->pdata, CD_MASK_PROP_ALL, CD_DUPLICATE, me->totvert);

Hans Goudey commented

2023-03-15 00:07:02 +01:00

No need to add a blank line here IMO

source/blender/blenlib/BLI_copy_on_write.hh

						
				@ -0,0 +14,4 @@

				/**

				 * #bCopyOnWrite allows implementing copy-on-write behavior, i.e. it allows sharing read-only data

				 * between multiple independend systems (e.g. meshes). The data is only copied when it is shared

Hans Goudey commented

2023-03-15 00:04:12 +01:00

Typo: independend -> independent

Typo: `independend` -> `independent`

source/blender/blenlib/BLI_copy_on_write.hh

						
				@ -0,0 +22,4 @@

				 * data only has a single owner and is mutable. If it is larger than 1, it is shared and must be

				 * logically const.

				 *

				 * On top of containing a reference count, #bCopyOnWrite also knows how to destruct the referenced

Hans Goudey commented

2023-03-14 23:59:21 +01:00

On top of containing a -> In addition to containing the

Just a bit clearer (there's only one reference count per item) and reads a bit more naturally

`On top of containing a` -> `In addition to containing the` Just a bit clearer (there's only one reference count per item) and reads a bit more naturally

source/blender/blenlib/BLI_copy_on_write.hh

						
				@ -0,0 +32,4 @@

				 *   arrays (e.g. for mesh attributes).

				 * - It can be embedded into another struct. For that it's best to use #bCopyOnWriteMixin.

				 */

				struct bCopyOnWrite : blender::NonCopyable, blender::NonMovable {

Hans Goudey commented

2023-03-14 23:44:05 +01:00

This is a C++ header now, and only used in C++ code, so I think it would make sense to:

Use the blender:: namespace like other blenlib classes
Remove the b prefix which becomes redundant with the namespace.

This is a C++ header now, and only used in C++ code, so I think it would make sense to: - Use the `blender::` namespace like other blenlib classes - Remove the `b` prefix which becomes redundant with the namespace.

source/blender/blenlib/BLI_copy_on_write_user.hh

						
				@ -0,0 +10,4 @@

				namespace blender {

				template<typename T> class COWUser {

Hans Goudey commented

2023-03-15 00:06:08 +01:00

I guess COWUser should keep the old comment above UserCounter and maybe say "this adds RAII semantics on top of the copy-on-write logic from #bCopyOnWrite."

I guess `COWUser` should keep the old comment above `UserCounter` and maybe say "this adds RAII semantics on top of the copy-on-write logic from #bCopyOnWrite."

source/blender/blenloader/intern/writefile.cc

						
				@ -1689,0 +1694,4 @@

				    return;

				  }

				  MemFile &memfile = *writer->wd->mem.written_memfile;

				  if (memfile.cow_storage == nullptr) {

Hans Goudey commented

2023-03-14 23:54:02 +01:00

Maybe it doesn't matter, but maybe it would be reasonable to just create this once at the start of the writing process? It's likely to be used, and it would save the check for every write call. OTOH I'm sure this isn't a bottleneck!

Sergey Sharybin requested changes 2023-03-15 12:17:41 +01:00

Sergey Sharybin left a comment

The goal sounds great!

There are some topics, without really particular order.

Performance

In the goal you states memory usage and performance improvements. But in the conclusion part where you show the improvements you are only focusing on the timing information.

How about memory usage?
It is quite weird to not have such metric measured in the context of copy-on-write (which is a resource-sharing technique, with a goal to save memory by avoiding creation of unneeded copies).

In the context topic raised in the #95845 one would expect actually measurable memory improvement. From the dependency graph point of view, the memory consumption is about 2x for meshes which do not have any modifiers. This is because the dependency graph "de-couples" the evaluated mesh from the original, so that the render thread can access it and modify, while artist can modify the original mesh in viewport, without causing any conflicts. Sharing the heavy CustomData arrays between the evaluated state and the original until they gets modified should give very good memory save.

Measuring the difference could be a bit more challenging, and might not be 2x when comparing against 2.79, but is still should be measurable improvement.

The biggest saving is expected to have on a scene with a lot of meshes with modifiers disabled. This is how the layout and some animation files are set up for the performance reasons. For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers).

Naming

The copy-on-write is a technique, which takes care of efficient copy/sharing of a modifiable resource. I do not think having it stated in types and variables really showing intent in the best possible way.

To me the more correct name for bCopyOnWriteMixin seems could be bShareableMixing, bImplicitlyShareableMixin, bShareableResourceMixin. It more directly showing the intent of it.

I know in a lot of places there is already cow all over the place. But this is already confusing, and even violating the initial design where the proper naming is to use orig_ and eval_.

API

I did not have time to check things in details yet, and was mainly looking into the BKE_customdata.h, as it is something that many developers interface with, and it was already hard to follow.

Maybe some of the points belong to an inlined comment, but some of them affect multiple functions, so seems to make sense to come to a general consensus.

The new suffix is something that confuses me. Not sure how it is expected to behave if CD_ASSIGN is passes as an alloctype, and maybe naming could be better.

If the CD_ASSIGN is purely internal-only use, maybe there are ways to make it more explicit like so. Maybe it can be moved away from header to an implementation file? Maybe renamed to CD_ASSIGN_INTERNAL?

For the new suffix maybe something like construct_alike? To kinda show it explicitly in the name that it is only "topology" of layers is considered, and not the content.

Also not really sure the difference between CustomData_merge_new and CustomData_copy_new. Some comment/information seems to be missing. Is it than the latter considers the dest to be completely un-initialized?

Another point is the const struct bCopyOnWrite *cow. Either the naming is to be changed, or the definition of the structure.

In a more ideal world, you'd imagine that you pass buffer as a single argument, so that you are not risking "stripping" user-counter away by accident. In the terms of C this will be something like

struct bShareableHeader {
  int64_t user_counter;
};

struct CustomDataLayerData {
  bShareableHeader shareable_header;

  void *data_ptr;
};

and somehow pass CustomDataLayerData as a single argument to the _with_data functions.

File IO and Undo

It would indeed eventually be nice to improve memory usage of undo, doing such intrusive changed (as per the PR description) seems a bit odd to do so early on, and will only make the review even longer and more complicated.

What are the consequences of removing BLO_memfile_write_file on the auto-save?

This is something that is not trivial to guess, and there is potential to cause huge usability problems.

On a quick investigation, it seems that before this change the memfile will always be used, even when in the sculpt mode (the use_memfile depends on the user setting, which does not change in the sculpt mode, and there is no warning printed in the console either).

The benefit of this approach is that the auto-save is fast, and does not block the UI, disturbing artists. The downside is that the sculpt changes done in the current sculpt session are not saved in the autosave.

After this change, the ED_editors_flush_edits is forced to happen. This is not cheap operation when you sculpt on a high-res object. Locking UI for a few seconds at regular time intervals is not something artists will appreciate.

Is there a way to split this change into smaller achivable and verifyable steps? For example, is it possible to get the memory optimization of the initial Mesh copy for the depsgraph's CoW without opening the story of the undo? Maybe even smaller pieces are possible (you mentioned making the CustomData const-correct, i.e.)?

Currently the scope seems to be too big, and there are already big topics which seems to be neglected (at least as per the patch description).

The goal sounds great! There are some topics, without really particular order. **Performance** In the goal you states memory usage and performance improvements. But in the conclusion part where you show the improvements you are only focusing on the timing information. How about memory usage? It is quite weird to not have such metric measured in the context of copy-on-write (which is a resource-sharing technique, with a goal to save memory by avoiding creation of unneeded copies). In the context topic raised in the #95845 one would expect actually measurable memory improvement. From the dependency graph point of view, the memory consumption is about 2x for meshes which do not have any modifiers. This is because the dependency graph "de-couples" the evaluated mesh from the original, so that the render thread can access it and modify, while artist can modify the original mesh in viewport, without causing any conflicts. Sharing the heavy CustomData arrays between the evaluated state and the original until they gets modified should give very good memory save. Measuring the difference could be a bit more challenging, and might not be 2x when comparing against 2.79, but is still should be measurable improvement. The biggest saving is expected to have on a scene with a lot of meshes with modifiers disabled. This is how the layout and some animation files are set up for the performance reasons. For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers). **Naming** The copy-on-write is a technique, which takes care of efficient copy/sharing of a modifiable resource. I do not think having it stated in types and variables really showing intent in the best possible way. To me the more correct name for `bCopyOnWriteMixin` seems could be `bShareableMixing`, `bImplicitlyShareableMixin`, `bShareableResourceMixin`. It more directly showing the intent of it. I know in a lot of places there is already `cow` all over the place. But this is already confusing, and even violating the initial design where the proper naming is to use `orig_` and `eval_`. **API** I did not have time to check things in details yet, and was mainly looking into the `BKE_customdata.h`, as it is something that many developers interface with, and it was already hard to follow. Maybe some of the points belong to an inlined comment, but some of them affect multiple functions, so seems to make sense to come to a general consensus. The `new` suffix is something that confuses me. Not sure how it is expected to behave if `CD_ASSIGN` is passes as an `alloctype`, and maybe naming could be better. If the `CD_ASSIGN` is purely internal-only use, maybe there are ways to make it more explicit like so. Maybe it can be moved away from header to an implementation file? Maybe renamed to `CD_ASSIGN_INTERNAL`? For the `new` suffix maybe something like `construct_alike`? To kinda show it explicitly in the name that it is only "topology" of layers is considered, and not the content. Also not really sure the difference between `CustomData_merge_new` and `CustomData_copy_new`. Some comment/information seems to be missing. Is it than the latter considers the `dest` to be completely un-initialized? Another point is the `const struct bCopyOnWrite *cow`. Either the naming is to be changed, or the definition of the structure. In a more ideal world, you'd imagine that you pass buffer as a single argument, so that you are not risking "stripping" user-counter away by accident. In the terms of C this will be something like ``` struct bShareableHeader { int64_t user_counter; }; struct CustomDataLayerData { bShareableHeader shareable_header; void *data_ptr; }; ``` and somehow pass `CustomDataLayerData` as a single argument to the `_with_data` functions. **File IO and Undo** It would indeed eventually be nice to improve memory usage of undo, doing such intrusive changed (as per the PR description) seems a bit odd to do so early on, and will only make the review even longer and more complicated. What are the consequences of removing `BLO_memfile_write_file` on the auto-save? This is something that is not trivial to guess, and there is potential to cause huge usability problems. On a quick investigation, it seems that before this change the memfile will always be used, even when in the sculpt mode (the `use_memfile` depends on the user setting, which does not change in the sculpt mode, and there is no warning printed in the console either). The benefit of this approach is that the auto-save is fast, and does not block the UI, disturbing artists. The downside is that the sculpt changes done in the current sculpt session are not saved in the autosave. After this change, the `ED_editors_flush_edits` is forced to happen. This is not cheap operation when you sculpt on a high-res object. Locking UI for a few seconds at regular time intervals is not something artists will appreciate. ---- Is there a way to split this change into smaller achivable and verifyable steps? For example, is it possible to get the memory optimization of the initial Mesh copy for the depsgraph's CoW without opening the story of the undo? Maybe even smaller pieces are possible (you mentioned making the CustomData const-correct, i.e.)? Currently the scope seems to be too big, and there are already big topics which seems to be neglected (at least as per the patch description).

Jacques Lucke commented

2023-03-15 12:58:23 +01:00

Thanks for the feedback!

You're totally right that this should be split up a bit more. The patch started out as an experiment to see where copy-on-write can be used, so I added everything at once and just didn't think of splitting it up later on.

I suggest the following steps:

Blenlib changes + changes specific to geometry nodes (use the new type in GeometryComponent and AnonymousAttributeID).
CustomData integration.
Undo integration.

Does that sound reasonable to you? If yes, then I'll split things up this way. Will respond to your more specific comments in the respective pull requests then.

To me the more correct name for bCopyOnWriteMixin seems could be bShareableMixing, bImplicitlyShareableMixin, bShareableResourceMixin. It more directly showing the intent of it.

Renaming bCopyOnWrite to ShareableResource is fine with me (same for the Mixin type). Maybe something like SharedResourceInfo or so could work as well, the name makes it a bit more clear that the type is not necessarily the resource itself, but tracks a potentially separately allocated resource (like an attribute array).

Thanks for the feedback! You're totally right that this should be split up a bit more. The patch started out as an experiment to see where copy-on-write can be used, so I added everything at once and just didn't think of splitting it up later on. I suggest the following steps: 1. Blenlib changes + changes specific to geometry nodes (use the new type in `GeometryComponent` and `AnonymousAttributeID`). 2. `CustomData` integration. 3. Undo integration. Does that sound reasonable to you? If yes, then I'll split things up this way. Will respond to your more specific comments in the respective pull requests then. ----- > To me the more correct name for `bCopyOnWriteMixin` seems could be `bShareableMixing`, `bImplicitlyShareableMixin`, `bShareableResourceMixin`. It more directly showing the intent of it. Renaming `bCopyOnWrite` to `ShareableResource` is fine with me (same for the `Mixin` type). Maybe something like `SharedResourceInfo` or so could work as well, the name makes it a bit more clear that the type is not necessarily the resource itself, but tracks a potentially separately allocated resource (like an attribute array).

Jacques Lucke commented

2023-03-15 13:05:13 +01:00

For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers).

As a side note, I just tested this with a heavily subdivided cube. In main the memory usage is 2.5 GB and in the branch 1.13 GB as shown below (also approximately matches what htop shows me).

> For the simplicity of memory tracking you can fake such setup by creating a mesh with few tens of million of vertices (do it as a base mesh, do not have modifiers). As a side note, I just tested this with a heavily subdivided cube. In `main` the memory usage is 2.5 GB and in the branch 1.13 GB as shown below (also approximately matches what `htop` shows me). ![image](/attachments/3365b87c-b686-4b01-ab49-4d3749599ce6)

image.png

132 KiB

❤️ 1 🚀 4

Brecht Van Lommel requested changes 2023-03-15 19:00:27 +01:00

Brecht Van Lommel left a comment

Splitting this up would indeed help reviewing, I haven't looked closely since it's a bit difficult to get the big picture and I agree with the points Sergey makes.

The quit.blend saving changes could also be reviewed separately from the other undo changes.

For naming, I think this should be called either "copy on write" or "implicit sharing" since that's the standard terminology. Just shared / shareable seems not distinguished enough for shared_ptr.

Splitting this up would indeed help reviewing, I haven't looked closely since it's a bit difficult to get the big picture and I agree with the points Sergey makes. The quit.blend saving changes could also be reviewed separately from the other undo changes. For naming, I think this should be called either "copy on write" or "implicit sharing" since that's the standard terminology. Just shared / shareable seems not distinguished enough for `shared_ptr`.

source/blender/blenlib/BLI_copy_on_write_user.hh

						
				@ -0,0 +10,4 @@

				namespace blender {

				template<typename T> class COWUser {

Brecht Van Lommel commented

2023-03-15 18:55:36 +01:00

I would not abbreviate to COW but use CopyOnWrite, or whichever name we end up choosing.

Sergey Sharybin commented

2023-03-17 10:41:53 +01:00

@JacquesLucke The plan sounds good.

The memory saving is fantastic!

@JacquesLucke @HooglyBoogly Any objections of sticking to implicit sharing?

To me it seems to be the most clear from both showing intent, disambiguating it from the name of a technique, and solves the possible confusion with shared_ptr.

P.S. The latter one I personally don't really have issues with. To me the bCopyOnWrite (how it is currently called and intended to be used anyway) is the same as shared_ptr when it comes to the life management and assignment. It "just" adds some extra call to "un-share" the resource. But it is important to keep such fundamental data types and APIs as clear as possible and intuitively understood by everyone.

@JacquesLucke The plan sounds good. The memory saving is fantastic! @JacquesLucke @HooglyBoogly Any objections of sticking to `implicit sharing`? To me it seems to be the most clear from both showing intent, disambiguating it from the name of a technique, and solves the possible confusion with `shared_ptr`. P.S. The latter one I personally don't really have issues with. To me the `bCopyOnWrite` (how it is currently called and intended to be used anyway) is the same as `shared_ptr` when it comes to the life management and assignment. It "just" adds some extra call to "un-share" the resource. But it is important to keep such fundamental data types and APIs as clear as possible and intuitively understood by everyone.

Jacques Lucke commented

2023-03-17 12:06:28 +01:00

I didn't use the term implicit sharing before and didn't come across it much, but can get used to it. Based on Wikipedia the term indeed seems to mean what we want, so I'm fine with that.

I didn't use the term `implicit sharing` before and didn't come across it much, but can get used to it. Based on [Wikipedia](https://en.wikipedia.org/wiki/Copy-on-write) the term indeed seems to mean what we want, so I'm fine with that.

Jacques Lucke changed title from ~~Core: use generic copy-on-write system to avoid redundant copies~~ to WIP: Core: use generic copy-on-write system to avoid redundant copies

2023-03-20 09:54:44 +01:00

Hans Goudey referenced this pull request

2023-03-20 20:44:36 +01:00

Mesh: Replace MPoly struct with offset indices #105938

Jacques Lucke referenced this pull request

2023-03-22 14:23:12 +01:00

Core: new blenlib library for implicit-sharing #105994

Brecht Van Lommel requested changes 2023-03-24 13:04:40 +01:00

source/blender/blenlib/BLI_copy_on_write.hh

						
				@ -0,0 +63,4 @@

				  void remove_user_and_delete_if_last() const

				  {

				    const int old_user_count = users_.fetch_sub(1, std::memory_order_relaxed);

Brecht Van Lommel commented

2023-03-24 13:04:01 +01:00

My understanding is that user count decrement requires std::memory_order_acq_rel, while increment can use std::memory_order_relaxed.

For example libc++ does this for shared_ptr. More details:
https://stackoverflow.com/questions/48124031/stdmemory-order-relaxed-atomicity-with-respect-to-the-same-atomic-variable

My understanding is that user count decrement requires `std::memory_order_acq_rel`, while increment can use `std::memory_order_relaxed`. For example libc++ does this for `shared_ptr`. More details: https://stackoverflow.com/questions/48124031/stdmemory-order-relaxed-atomicity-with-respect-to-the-same-atomic-variable

brecht marked this conversation as resolved

Brecht Van Lommel reviewed 2023-03-24 13:06:26 +01:00

source/blender/blenlib/BLI_copy_on_write.hh

						
				@ -0,0 +63,4 @@

				  void remove_user_and_delete_if_last() const

				  {

				    const int old_user_count = users_.fetch_sub(1, std::memory_order_relaxed);