Use hash for Collection.gobject lookup, speedup object linking #104553

Closed
Campbell Barton wants to merge 5 commits from ideasman42:pr-collection-object-hash into main

When changing the target branch, be careful to rebase the branch in your fork to match. See documentation.

Add a hash for faster look-ups on collection->gobject,
This avoids a full list lookup for every object added via Python's
CollectionObject.link as well as linking via BKE_collection_object_add_*
functions.

While the speedup is non-linear, linking & unlinking 100k objects from
Python is about 50x faster. Although unlinking all objects in order
(a best-case for linked lists) is approximately the same speed.


Initial tests (see simple_collection_speed_test.py): 100,000 objects, which is significantly faster linking & unlinking.

Before

  • Time to link objects into scene: 100000 -> 5.564 seconds.
  • Time to unlink: 100000 -> 5.597 seconds.

After

  • Time to link objects into scene: 100000 -> 0.115 seconds.
  • Time to unlink: 100000 -> 0.099 seconds.

Notes:

  • This patch was written as a possible alternative to adding a CollectionObjects.link_multiple method, see: D16260, with the intent of resolving the issue in a more general way that benefits object linking everywhere, not just from the Python API.

  • Keeping this hash valid at all times might not be trivial, although after some fixes to the initial patch this is working as it should.

  • If maintaining a valid hash is too error prone, we could support this functionality but explicitly create and destroy the gobject_hash so it doesn't need to be maintained across complex operation such as some object ID's being swapped out (if this proves overly complicated to support).

  • While this speeds up object duplication, there are other bottlenecks which make the overall speedup less significant overall.

Add a hash for faster look-ups on collection->gobject, This avoids a full list lookup for every object added via Python's CollectionObject.link as well as linking via BKE_collection_object_add_* functions. While the speedup is non-linear, linking & unlinking 100k objects from Python is about 50x faster. Although unlinking all objects in order (a best-case for linked lists) is approximately the same speed. --- Initial tests (see `simple_collection_speed_test.py`): 100,000 objects, which is significantly faster linking & unlinking. Before - Time to link objects into scene: 100000 -> 5.564 seconds. - Time to unlink: 100000 -> 5.597 seconds. After - Time to link objects into scene: 100000 -> 0.115 seconds. - Time to unlink: 100000 -> 0.099 seconds. Notes: - This patch was written as a possible alternative to adding a CollectionObjects.link_multiple method, see: D16260, with the intent of resolving the issue in a more general way that benefits object linking everywhere, not just from the Python API. - Keeping this hash valid at all times might not be trivial, although after some fixes to the initial patch this is working as it should. - If maintaining a valid hash is too error prone, we could support this functionality but explicitly create and destroy the gobject_hash so it doesn't need to be maintained across complex operation such as some object ID's being swapped out (if this proves overly complicated to support). - While this speeds up object duplication, there are other bottlenecks which make the overall speedup less significant overall.
Campbell Barton requested review from Sergey Sharybin 2023-02-10 06:40:35 +01:00
Campbell Barton requested review from Bastien Montagne 2023-02-10 06:40:36 +01:00
Brecht Van Lommel added this to the Core project 2023-02-13 09:12:20 +01:00
Sergey Sharybin reviewed 2023-02-13 12:25:45 +01:00
Sergey Sharybin left a comment
Owner

Thanks for addressing the feedback, it is now much nicer!
Still some confusion about the consistency check. See the inlined comment.

Thanks for addressing the feedback, it is now much nicer! Still some confusion about the consistency check. See the inlined comment.
@ -1072,2 +1118,3 @@
CollectionObject *cob = BLI_findptr(&collection->gobject, ob, offsetof(CollectionObject, ob));
collection_gobject_hash_ensure(collection);
BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID(collection);

Why do we need BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID after the collection_gobject_hash_ensure? And why is it needed here but not in the other use of the collection_gobject_hash_ensure some lines below?

Why do we need `BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID` after the `collection_gobject_hash_ensure`? And why is it needed here but not in the other use of the `collection_gobject_hash_ensure` some lines below?

I would actually move this check as part of collection_gobject_hash_ensure itself? Would be much less verbose, and make more sense to me?

I would actually move this check as part of `collection_gobject_hash_ensure` itself? Would be much less verbose, and make more sense to me?
Author
Owner

Makes sense, moved into collection_gobject_hash_ensure.

Makes sense, moved into `collection_gobject_hash_ensure`.
ideasman42 marked this conversation as resolved
Bastien Montagne requested changes 2023-02-13 15:28:00 +01:00
Bastien Montagne left a comment
Owner

Agree with @Sergey, also found some more points to address, as noted below.

Regarding the usage of this new hash in all type of Collection IDs (evaluated, non-main etc.), I don't mind, so if people think it may be useful for non-Main IDs as well am fine with it. It sure does make the code simpler.

Agree with @Sergey, also found some more points to address, as noted below. Regarding the usage of this new hash in all type of Collection IDs (evaluated, non-main etc.), I don't mind, so if people think it may be useful for non-Main IDs as well am fine with it. It sure does make the code simpler.
@ -1074,0 +1119,4 @@
collection_gobject_hash_ensure(collection);
BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID(collection);
CollectionObject *cob = BLI_ghash_lookup(collection->runtime.gobject_hash, ob);
BLI_assert(cob == BLI_findptr(&collection->gobject, ob, offsetof(CollectionObject, ob)));

These asserts remove almost all benefit of using the mapping for debug builds... and are totally redundant if BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID is actually checking things? Same for the others below.

So not sure we should keep them here.

These asserts remove almost all benefit of using the mapping for debug builds... and are totally redundant if `BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID` is actually checking things? Same for the others below. So not sure we should keep them here.
Author
Owner

Agree, at the time of writing I was eager to flag up any issues early, but this adds too much overhead, even for debug builds, removing the BLI_assert.

Agree, at the time of writing I was eager to flag up any issues early, but this adds too much overhead, even for debug builds, removing the `BLI_assert`.
ideasman42 marked this conversation as resolved
@ -1104,0 +1153,4 @@
BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID(collection);
collection_gobject_hash_ensure(collection);
CollectionObject *cob = BLI_ghash_lookup(collection->runtime.gobject_hash, ob);

Couldn't this use instead cob = BLI_ghash_popkey(collection->runtime.gobject_hash, ob, NULL);? Then there is no need for an extra lookup below in the call to collection_gobject_hash_remove_object(collection, ob);

Couldn't this use instead `cob = BLI_ghash_popkey(collection->runtime.gobject_hash, ob, NULL);`? Then there is no need for an extra lookup below in the call to `collection_gobject_hash_remove_object(collection, ob);`
Author
Owner

Yep, good idea.

Yep, good idea.
ideasman42 marked this conversation as resolved
@ -1581,0 +1670,4 @@
}
/**
* Update the collections object hash the old reference to `ob_old` uses `cob->ob` as the key.

This is not a proper English sentence... ;)

This is not a proper English sentence... ;)
ideasman42 marked this conversation as resolved
@ -1581,0 +1685,4 @@
collection_gobject_hash_remove_object(collection, ob_old);
}
/* This may be set to NULL. */
if (ob_new) {

This may leak in case ob_new is NULL but cob is not, depending on calling code...

You could make it clear in doc that this function takes 'ownership' of the given cob if any, and may free it if it cannot use it.

But TBH, since this is addressing such a specific use-case, and with a very specific behavior, I would not even make a separate function, and would just add the code directly into collection_foreach_id. Or at the very least move this function above its caller, and rename it to something like collection_foreach_id_gobject_hash_update. I have a hard time seeing this very specific behavior used anywhere else than in the remapping use-case.

This may leak in case `ob_new` is NULL but `cob` is not, depending on calling code... You could make it clear in doc that this function takes 'ownership' of the given `cob` if any, and may free it if it cannot use it. But TBH, since this is addressing such a specific use-case, and with a very specific behavior, I would not even make a separate function, and would just add the code directly into `collection_foreach_id`. Or at the very least move this function above its caller, and rename it to something like `collection_foreach_id_gobject_hash_update`. I have a hard time seeing this very specific behavior used anywhere else than in the remapping use-case.
Author
Owner

It's awkward either way, Sergey requested to move this into a utility function, while it's nice to split out in some ways - it's rather spesific functionality which is only used in one place, renamed the function and removed the ob_new argument so cob->ob is always used.

It's awkward either way, Sergey requested to move this into a utility function, while it's nice to split out in some ways - it's rather spesific functionality which is only used in one place, renamed the function and removed the `ob_new` argument so `cob->ob` is always used.
ideasman42 marked this conversation as resolved
Author
Owner

Following up on discussion on D17016, where I noted.

The hypothetical worst-case this would avoid is building and throwing away the object hash on frame-change (for e.g.),

Sergey replied:

Why would the frame change trigger the build of the hash followed by throwing it away?

After digging into the issue, here is a bug-report #104798 with a proposed fix !104801.

As far as I can see that issue can be addressed separately from this patch, since unnecessarily copying the collection is an issue irrespective of changes made in this patch (although creating the hash does add some overhead).

Following up on discussion on D17016, where I noted. > > The hypothetical worst-case this would avoid is building and throwing away the object hash on frame-change (for e.g.), Sergey replied: > Why would the frame change trigger the build of the hash followed by throwing it away? After digging into the issue, here is a bug-report #104798 with a proposed fix !104801. As far as I can see that issue can be addressed separately from this patch, since unnecessarily copying the collection is an issue irrespective of changes made in this patch (although creating the hash does add some overhead).
Campbell Barton force-pushed pr-collection-object-hash from 9a1a473ffc to 6432f129a9 2023-02-16 04:40:06 +01:00 Compare
Campbell Barton force-pushed pr-collection-object-hash from 6432f129a9 to ec811ae7b6 2023-02-16 05:06:35 +01:00 Compare
Author
Owner

All issues should be addressed now.

@blender-bot build

All issues should be addressed now. @blender-bot build
Campbell Barton requested review from Bastien Montagne 2023-02-16 06:29:23 +01:00
Bastien Montagne requested changes 2023-02-16 11:04:58 +01:00
Bastien Montagne left a comment
Owner

Found some more possible tweaks and improvements...

Also wondering if it would be worth checking on BKE_collection_has_object_recursive and similar, now that we have a hash for objects in a collection. Not in the scope of this patch though.

Found some more possible tweaks and improvements... Also wondering if it would be worth checking on `BKE_collection_has_object_recursive` and similar, now that we have a hash for objects in a collection. Not in the scope of this patch though.
@ -939,3 +979,3 @@
return false;
}
BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID(collection);

Would say this should also use collection_gobject_hash_ensure? While the overhead of initialization may be significant, this is often called in a collection manipulation context, where other calls (like object adding) will ensure the hash anyway... And then code is simpler, call to BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID can be removed, etc.

Would say this should also use `collection_gobject_hash_ensure`? While the overhead of initialization may be significant, this is often called in a collection manipulation context, where other calls (like object adding) will ensure the hash anyway... And then code is simpler, call to `BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID` can be removed, etc.
ideasman42 marked this conversation as resolved
@ -1062,6 +1105,8 @@ static Collection *collection_parent_editable_find_recursive(const ViewLayer *vi
static bool collection_object_add(
Main *bmain, Collection *collection, Object *ob, int flag, const bool add_us)
{
BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID(collection);

This call can be removed (see comment in collection_gobject_hash_ensure).

This call can be removed (see comment in `collection_gobject_hash_ensure`).
ideasman42 marked this conversation as resolved
@ -1072,2 +1117,3 @@
CollectionObject *cob = BLI_findptr(&collection->gobject, ob, offsetof(CollectionObject, ob));
collection_gobject_hash_ensure(collection);
CollectionObject *cob = BLI_ghash_lookup(collection->runtime.gobject_hash, ob);

You can save a lookup here by using BLI_ghash_ensure_p instead... Since you know you'll always end up with a valid cob for the given ob key in that function.

You can save a lookup here by using `BLI_ghash_ensure_p` instead... Since you know you'll always end up with a valid `cob` for the given `ob` key in that function.
ideasman42 marked this conversation as resolved
@ -1101,3 +1149,3 @@
const bool free_us)
{
CollectionObject *cob = BLI_findptr(&collection->gobject, ob, offsetof(CollectionObject, ob));
BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID(collection);

This call can be removed (see comment in collection_gobject_hash_ensure).

This call can be removed (see comment in `collection_gobject_hash_ensure`).
ideasman42 marked this conversation as resolved
@ -1222,2 +1273,2 @@
CollectionObject *cob = BLI_findptr(
&collection->gobject, ob_old, offsetof(CollectionObject, ob));
BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID(collection);
const bool use_hash_exists = collection->runtime.gobject_hash != NULL;

Not sure why we still have this actually?

Since we agreed to allow this hash for all collections, then we can as well rather call collection_gobject_hash_ensure here, remove the call to BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID, and also use BLI_ghash_ensure_p to only perform one lookup.

Not sure why we still have this actually? Since we agreed to allow this hash for all collections, then we can as well rather call `collection_gobject_hash_ensure` here, remove the call to `BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID`, and also use `BLI_ghash_ensure_p` to only perform one lookup.
Author
Owner

With you on the first two issues, but don't think BLI_ghash_ensure_p helps here as the old pointer used as as a key needs to be removed, the new one needs to be inserted again.

With you on the first two issues, but don't think `BLI_ghash_ensure_p` helps here as the old pointer used as as a key needs to be removed, the new one needs to be inserted again.
@ -1581,0 +1648,4 @@
static void collection_gobject_hash_ensure(Collection *collection)
{
if (collection->runtime.gobject_hash) {
return;

Should also call BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID here.

Should also call `BLI_ASSERT_COLLECION_GOBJECT_HASH_IS_VALID` here.
ideasman42 marked this conversation as resolved
Sergey Sharybin requested changes 2023-02-16 11:12:28 +01:00
@ -153,1 +177,4 @@
BKE_LIB_FOREACHID_PROCESS_IDSUPER(data, cob->ob, IDWALK_CB_USER);
if (collection->runtime.gobject_hash && (cob_ob_old != cob->ob)) {

I would move the checks into the collection_gobject_hash_update_object. At leats the cob_ob_old != cob->ob.

I would move the checks into the `collection_gobject_hash_update_object`. At leats the `cob_ob_old != cob->ob`.
ideasman42 marked this conversation as resolved
@ -519,2 +550,4 @@
}
/* This is being deleted, no need to handle each item.
* NOTE: While it might seem an advantage to use the hash instead of the list-lookup

Something seems to be missing here. The collection_object_remove ensures that the hash exists (the first thing it does is the collection_gobject_hash_ensure(collection)).

Something seems to be missing here. The `collection_object_remove` ensures that the hash exists (the first thing it does is the `collection_gobject_hash_ensure(collection)`).
Author
Owner

That's bad. I've split out the remove function so low level code that already knows the CollectionObject doesn't need to do a lookup or ensure the hash exists.

That's bad. I've split out the remove function so low level code that already knows the `CollectionObject` doesn't need to do a lookup or ensure the hash exists.
ideasman42 marked this conversation as resolved
@ -1581,0 +1681,4 @@
if (ob_old) {
collection_gobject_hash_remove_object(collection, ob_old);
}
/* This may be set to NULL. */

This comment does not reall bring information. The fact that it may be NULL is clear from the NULL-pointer check.

The more interesting for a reader information would be to know why CollectionObject can be pointing to NULL. Intuitively those are malformed and can/should simply be removed. If it is allowed to be NULL, an explanation is needed (together with a comment in the typedef struct CollectionObject, and a lot of checks all over the code base are to be added.

The confusing part is that the cob->ob is checked for NULL in SOME cases, inconsistently.

P.S. Generally, avoid over-commenting, as it actually hurts readability.

This comment does not reall bring information. The fact that it may be NULL is clear from the NULL-pointer check. The more interesting for a reader information would be to know why `CollectionObject` can be pointing to NULL. Intuitively those are malformed and can/should simply be removed. If it is allowed to be NULL, an explanation is needed (together with a comment in the `typedef struct CollectionObject`, and a lot of checks all over the code base are to be added. The confusing part is that the `cob->ob` is checked for NULL in SOME cases, inconsistently. P.S. Generally, avoid over-commenting, as it actually hurts readability.
Author
Owner

Yep, expanded the comment to note this isn't simply a paranoid null check.

  /* The object may be set to NULL if the ID is being cleared from #collection_foreach_id,
   * generally `cob->ob` is not expected to be NULL. */
Yep, expanded the comment to note this isn't simply a paranoid null check. ``` /* The object may be set to NULL if the ID is being cleared from #collection_foreach_id, * generally `cob->ob` is not expected to be NULL. */ ```
ideasman42 marked this conversation as resolved
Campbell Barton requested review from Bastien Montagne 2023-02-16 13:21:01 +01:00
Campbell Barton requested review from Sergey Sharybin 2023-02-16 13:21:12 +01:00
Author
Owner

@blender-bot build linux

@blender-bot build linux
Campbell Barton force-pushed pr-collection-object-hash from 36300def5e to bc06586a34 2023-02-19 05:47:26 +01:00 Compare
Sergey Sharybin requested changes 2023-02-20 16:13:13 +01:00
@ -1102,0 +1145,4 @@
* A version of #collection_object_remove that does not handle `collection->runtime.gobject_hash`,
* Either the caller must have removed the object from the hash or the hash may be NULL.
*
* \note in rare cases `cob->ob != ob`, so both arguments need to be passed in.

This is something very surprising to me. I am not sure why this is the case, but it definitely asks to be written down in the comment here.

From quick look around the collection_object_remove_no_gobject_hash is either called with ob=cob->ob, and in other cases is called with a result of ghash lookup. The latter one is done after collection_gobject_hash_ensure which seems to assert that cob->ob is the same as ob.

In general, always document such cases for the future reader. It saves a lot of time (for both current review, and future you and future developers looking into the code).

This is something very surprising to me. I am not sure why this is the case, but it definitely asks to be written down in the comment here. From quick look around the `collection_object_remove_no_gobject_hash` is either called with ob=cob->ob, and in other cases is called with a result of ghash lookup. The latter one is done after `collection_gobject_hash_ensure` which seems to assert that `cob->ob` is the same as `ob`. In general, always document such cases for the future reader. It saves a lot of time (for both current review, and future you and future developers looking into the code).
Author
Owner

This must have been a use-after-free mistake, double checked with the ob argument removed and the tests now pass.

This must have been a use-after-free mistake, double checked with the `ob` argument removed and the tests now pass.
ideasman42 marked this conversation as resolved
Campbell Barton added 1 commit 2023-02-21 11:44:51 +01:00
buildbot/vexp-code-patch-coordinator Build done. Details
028b6e06a2
Experimental removal of object argument
Author
Owner

@blender-bot build

@blender-bot build
Campbell Barton requested review from Sergey Sharybin 2023-02-21 12:12:30 +01:00
Sergey Sharybin approved these changes 2023-02-28 10:56:00 +01:00
Bastien Montagne approved these changes 2023-02-28 12:44:43 +01:00
Author
Owner

Updated description, this patch no longer causes unlinking in-order to be 18% slower, it's now about the same speed.

Updated description, this patch no longer causes unlinking in-order to be 18% slower, it's now about the same speed.
Author
Owner

Committed ea97bb1641.

Committed ea97bb1641b9fc3424c0000c7c7db9a038ae6148.
Campbell Barton closed this pull request 2023-03-01 02:38:00 +01:00
Bastien Montagne removed this from the Core project 2023-07-03 12:45:57 +02:00
All checks were successful
buildbot/vexp-code-patch-coordinator Build done.

Pull request closed

Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#104553
No description provided.