ID Deduplication on Append - Technical Implementation #90545

Closed
opened 2021-08-09 15:22:59 +02:00 by Bastien Montagne · 9 comments

Problem

  • How to detect that a given ID has already been appended, and re-use this local version of it instead of 'linking and making local' it again.
  • How to define which types of relationships between IDs should force linking/full appending/ID-re-using behaviors.
  • How to control/detect when a previously appended, local ID should not be re-used anymore for future appends.

Make Appended IDs re-usable

Appended IDs should keep a 'weak reference' to their source library and original name.

While we could re-use the existing ID.lib pointer, it would be much easier to add new data to the ID struct, storing the original (library) name and the source library, for at least three reasons:

  • We need to add that extra 'original name' field anyway.
  • While ideally no code should rely anymore on ID.lib == NULL type of checks to handle linked vs. local IDs (and use instead the macros defined in DNA_ID.h), in practice this is not ensured yet. While cleaning this up would be nice, it can also trigger annoying issues and bugs.
  • And it's probably better from an architectural point of view to clearly separate the 'linked' vs. 'local with weak reference to linked data' cases.

NOTE: This is slightly different from the liboverride case, where the local override does keep an explicit strong reference to the linked ID.

Proposal is therefore to store this info in a new dedicated pointer in the ID struct.
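A minimal Python model of this proposal, only to make the data involved concrete. It assumes the weak reference consists of the source .blend file path plus the original ID name; WeakLibraryReference, ID, and find_reusable_local are hypothetical names for illustration, not Blender's DNA structs or Python API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WeakLibraryReference:
    """Hypothetical weak reference: identifies the source data by value, not by pointer."""
    library_filepath: str  # .blend file the ID was appended from
    library_id_name: str   # original name of the ID inside that library


@dataclass
class ID:
    name: str                                        # current (possibly renamed) local name
    lib: Optional[str] = None                        # set only for linked IDs in this model
    weak_ref: Optional[WeakLibraryReference] = None  # set only for appended local IDs


def find_reusable_local(ids: List[ID], filepath: str, id_name: str) -> Optional[ID]:
    """Return the local ID previously appended from (filepath, id_name), or None."""
    key = WeakLibraryReference(filepath, id_name)
    for candidate in ids:
        # Linked IDs are never re-use candidates; only local IDs carry a weak reference.
        if candidate.lib is None and candidate.weak_ref == key:
            return candidate
    return None
```

Storing the original name separately matters because the local copy may have been renamed on append (e.g. to 'Material.001' when a local 'Material' already exists), so it cannot be derived from the current local name.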

Default ID Behavior On Append

The idea is to combine two different kinds of information when deciding which IDs to link, make local, or re-use:

  • By ID usage: Specific behaviors for specific ID usages (e.g. an image used by a material, versus one used by a modifier).
  • By ID types: Generic default behavior to apply to all ID types.

'ID usage' would have priority over 'ID type', when defined.

NOTE: This will require a rewrite of the append code, which is likely a good thing anyway, as it is currently fairly confusing and already brittle/broken in some cases (see also #55629, Append already linked Data). It was written at a time when we did not have access to the fine-grained ID relationship info that the lib_query code now provides. DONE in 3be5ce4aad and related commits.

These flags/behaviors should also be tweakable by the calling code.
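A sketch of how the per-usage and per-type information could be combined, written in Python for brevity. AppendBehavior and resolve_behavior are hypothetical names; giving the calling code's override precedence over both is an assumption of this sketch, since the text only says the behaviors should be tweakable.

```python
from enum import Enum, auto
from typing import Optional


class AppendBehavior(Enum):
    KEEP_LINKED = auto()  # keep the linked ID as-is
    MAKE_LOCAL = auto()   # always make a new local copy
    REUSE_LOCAL = auto()  # re-use a matching local ID if any, else copy


def resolve_behavior(
    type_default: AppendBehavior,
    usage_override: Optional[AppendBehavior] = None,
    caller_override: Optional[AppendBehavior] = None,
) -> AppendBehavior:
    """Pick the behavior for one ID relationship.

    'ID usage' wins over 'ID type' when defined, as proposed. Letting the
    calling code have the final say is an assumption of this sketch.
    """
    if caller_override is not None:
        return caller_override
    if usage_override is not None:
        return usage_override
    return type_default
```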

By ID Usage
NOTE: This is probably not required in the initial MVP stage, and will likely not be implemented at first.

Add new flags to the lib_query 'foreach_id' process, allowing specific behaviors to be defined for specific ID relationships. The appending code can then use this information to decide whether it should just keep the linked ID, make it fully local (creating a new copy of it), or first try to re-use an existing matching local ID.

The benefit of defining it there is that it allows fine-tuning the behavior for each usage. However, ID usages from ID properties (including Python-defined 'RNA data') would share a single generic behavior, unless we extend ID properties to support this as well.

Proposed flags/behaviors:

  • IDWALK_CB_APPEND_KEEP_LINKED - Always link the ID - Maybe for cases like image IDs used by materials?
  • IDWALK_CB_APPEND_MAKE_LOCAL - Always make a new local copy of the linked ID
  • IDWALK_CB_APPEND_REUSE_LOCAL - Attempt to re-use an existing local ID first, and make a new local copy if none is found.

NOTE: Not sure how useful forced linking would actually be in real life.
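A sketch of how the appending code could act on these flags for one ID reached during the foreach_id walk. AppendCB is a hypothetical Python stand-in for the proposed IDWALK_CB_APPEND_* bits and append_action is illustrative only; the lookup of a matching local ID is assumed to have been done beforehand.

```python
from enum import Flag
from typing import Optional, Tuple


class AppendCB(Flag):
    """Hypothetical Python mirror of the proposed IDWALK_CB_APPEND_* callback flags."""
    NONE = 0
    KEEP_LINKED = 1
    MAKE_LOCAL = 2
    REUSE_LOCAL = 4


def append_action(cb_flag: AppendCB, existing_local: Optional[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Decide what to do with one ID reached through a relationship.

    `existing_local` is the name of a matching local ID found through its weak
    reference, or None if there is none.
    """
    if not cb_flag:
        return None  # no usage-specific behavior: the caller falls back to the ID type default
    if cb_flag & AppendCB.KEEP_LINKED:
        return ("link", None)
    if cb_flag & AppendCB.REUSE_LOCAL and existing_local is not None:
        return ("reuse", existing_local)
    # MAKE_LOCAL, or REUSE_LOCAL with nothing to re-use: make a new local copy.
    return ("copy", None)
```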

By ID Type
Define which ID types should always be appended or re-used by default, as a new flag in the IDTypeInfo.
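A sketch of a per-type default stored as a flag in a type-info record. IDTypeFlag.APPEND_IS_REUSABLE and this reduced IDTypeInfo class are hypothetical stand-ins, not Blender's actual definitions.

```python
from dataclasses import dataclass
from enum import Flag


class IDTypeFlag(Flag):
    """Hypothetical per-type flags; only the append-related bit is modeled."""
    NONE = 0
    APPEND_IS_REUSABLE = 1  # appending this type tries to re-use a local ID first


@dataclass(frozen=True)
class IDTypeInfo:  # stand-in for Blender's IDTypeInfo, reduced to what this sketch needs
    name: str
    flags: IDTypeFlag = IDTypeFlag.NONE


def default_append_behavior(type_info: IDTypeInfo) -> str:
    """Per-type default, used when no per-usage behavior is defined."""
    if type_info.flags & IDTypeFlag.APPEND_IS_REUSABLE:
        return "reuse_local"
    return "make_local"
```

For example, a type registered as IDTypeInfo("Material", IDTypeFlag.APPEND_IS_REUSABLE) would default to re-use, while one registered without the flag would always be made local (both registrations are illustrative, not a statement about which types Blender marks as re-usable).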

When an ID ceases to be re-usable

While detecting a modified ID (using e.g. RNA diffing) would be nice, this is probably not easily doable in practice (e.g. efficiently detecting modified meshes would require dedicated binary diffing, and the same goes for images...).

The suggestion for now is to simply remove the 'weak reference' data from an ID to make it non-re-usable. This could be done by the user themselves (e.g. from the outliner and the ID template), and by some operations (like entering Edit or Sculpt mode for obdata? Painting for images? etc.).
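A minimal sketch of invalidation by clearing the weak reference, assuming it is stored as a (library file path, original name) pair. LocalID, clear_weak_reference, and the enter_edit_mode hook are hypothetical names; which operations should actually trigger this is still an open question in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LocalID:
    name: str
    # (library filepath, original name); None means "not re-usable".
    weak_ref: Optional[Tuple[str, str]] = None


def clear_weak_reference(id_: LocalID) -> None:
    """Make the ID non-re-usable for future appends (e.g. a user action in the outliner)."""
    id_.weak_ref = None


def enter_edit_mode(obdata: LocalID) -> None:
    """Hypothetical hook: an editing operation invalidates re-use before modifying data."""
    clear_weak_reference(obdata)
    # The actual mode switch would happen here; it is not modeled.
```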

When a user decides to forcefully make a new duplicate on append (instead of re-using the existing local ID), the existing local ID, if any, should lose its reference to the source library data (so that there is always only one local ID holding a reference to any given library data).
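A sketch of the forced-duplicate case, assuming the same (file path, original name) pair as the weak reference. append_forced_copy is a hypothetical helper; it moves the weak reference from any existing holder to the new copy, which keeps the 'only one local ID per library data' invariant.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WeakRef = Tuple[str, str]  # (library filepath, original ID name)


@dataclass
class LocalID:
    name: str
    weak_ref: Optional[WeakRef] = None


def append_forced_copy(ids: List[LocalID], weak_ref: WeakRef, new_name: str) -> LocalID:
    """Append a new local copy even though a re-usable one may exist.

    The weak reference moves to the new copy, so at most one local ID ever
    refers to a given piece of library data.
    """
    for existing in ids:
        if existing.weak_ref == weak_ref:
            existing.weak_ref = None  # the old copy is no longer a re-use candidate
    new_id = LocalID(new_name, weak_ref)
    ids.append(new_id)
    return new_id
```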

Author
Owner

Changed status from 'Needs Triage' to: 'Confirmed'

Author
Owner

Added subscriber: @mont29

Bastien Montagne changed title from ID Deduplication on Append to ID Deduplication on Append - Technical Implementation 2021-08-09 15:24:37 +02:00

Added subscriber: @dr.sybren


> While ideally no code should rely anymore on ID.lib == NULL type of checks to handle linked vs. local IDs (and use instead the macros defined in DNA_ID.h), in practice this is not ensured yet. While cleaning this up would be nice, it can also trigger annoying issues and bugs.

If/when this changes, it should also be reflected in #blender_asset_tracer. Just wanted to mention it so that we can handle that preemptively, instead of waiting for the Blender Animation Studio render farm dying at the peak of production stress.

> This is slightly different from the liboverride case, where the local override does keep an explicit strong reference to the linked ID.

What is the difference? What does it mean for a reference to be explicit and strong?

> This will require a rewrite of the append code. [...] These flags/behaviors should also be tweakable by the calling code.

In that case, it might be a good idea to move the code design from passing flags (and testing for them all over the place) to a more suitable design pattern (like a strategy pattern, for example). That should disentangle quite a few use cases, and make it easier to follow what's going on in which situation.

> IDWALK_CB_APPEND_LINK - Always link the ID - Maybe for cases like image IDs used by materials?
> IDWALK_CB_APPEND_COPY - Always make a new local copy of the linked ID

Does the "append" in the name dictate that the targeted datablock is to be appended, and the subsequent verb "link" or "copy" indicate what needs to be done with dependencies? If that's not the case, then I'm confused about what "the ID" means, and why there is "append" in the name of the thing that doesn't append.
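A rough Python sketch of the strategy-pattern suggestion from the comment above: each append behavior becomes an object with a single handle() method, so the append code calls the chosen strategy instead of testing flags all over the place. All class and variable names here are hypothetical.

```python
from typing import Dict, Optional, Protocol


class AppendStrategy(Protocol):
    def handle(self, linked_name: str, existing_local: Optional[str]) -> str: ...


class KeepLinked:
    def handle(self, linked_name: str, existing_local: Optional[str]) -> str:
        return f"link {linked_name}"


class MakeLocal:
    def handle(self, linked_name: str, existing_local: Optional[str]) -> str:
        return f"copy {linked_name}"


class ReuseLocal:
    def handle(self, linked_name: str, existing_local: Optional[str]) -> str:
        # Fall back to copying when there is nothing to re-use.
        if existing_local is not None:
            return f"reuse {existing_local}"
        return f"copy {linked_name}"


# The calling code configures behavior by choosing objects, not by combining flags.
STRATEGIES: Dict[str, AppendStrategy] = {
    "keep_linked": KeepLinked(),
    "make_local": MakeLocal(),
    "reuse_local": ReuseLocal(),
}
```

With this shape, "tweakable by the calling code" would mean passing or registering a different strategy object, and each behavior's logic lives in one place.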


This issue was referenced by 794c2828af60af02a38381c2a9a81f9046381074

Added subscriber: @brecht


It would be good to have a plan for how to deal with the case where an ID ceases to be reusable; I think leaving it to the user is not that great. Maybe clearing the weak reference on user edits would work, though we don't have a clear concept of such user edits right now.

In some ways I think deduplication based on an asset UUID makes more sense, but since we can't dynamically add a UUID in the .blend file we are appending from, and not everything that is appended is an asset that we could guarantee to have a UUID, the .blend file path + datablock name is the next best thing.
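A sketch of the deduplication key described here: .blend file path plus datablock name, with a hypothetical asset_uuid taking precedence when one happens to be available. dedup_key is illustrative only; as noted, such a UUID cannot currently be guaranteed for appended data.

```python
from typing import Optional, Tuple


def dedup_key(library_filepath: str, id_name: str, asset_uuid: Optional[str] = None) -> Tuple[str, ...]:
    """Key used to recognize an already-appended ID.

    A stable asset UUID (hypothetical here) is preferred when present;
    otherwise fall back to .blend file path + datablock name.
    """
    if asset_uuid is not None:
        return ("uuid", asset_uuid)
    return ("path+name", library_filepath, id_name)
```

A limitation of the path + name key is that renaming the datablock in the source file, or moving that file, makes previously appended copies unrecognizable.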

Author
Owner

Changed status from 'Confirmed' to: 'Archived'

Author
Owner

Closing, the technical side has been implemented... Will keep the UI/UX part open for now.

Thomas Dinges added this to the 3.0 milestone 2023-02-08 15:59:05 +01:00
Reference: blender/blender#90545