Refactor how we handle versioning during readfile, to allow creation of new IDs in a reliable way. #111932

Open
opened 2023-09-04 17:41:05 +02:00 by Bastien Montagne · 11 comments

This design task aims at enabling creation of new IDs at any stage of the read file process, in a safe and sane way. This covers in particular the placeholders for missing linked data and the IDs created as part of versioning.

Problem

Several recent issues and incidents have shed some light on several weaknesses of our current do_versions code, regarding adding IDs:

  • It is forbidden in after_liblink, with some exceptions for certain ID types (!).
  • It is allowed by code, but not considered good practice, in regular do_version.
  • It is recommended to use BLO_read_do_version_after_setup when adding new IDs. However, this has some serious drawbacks too, and its usage should remain as exceptional as possible.
  • In any case, 'new' IDs are added at read time too, before any do_version code is run: the empty 'placeholders' generated when some directly linked data reference cannot be found anymore.
  • The main issue with creating new IDs before/during versioning is that these new IDs, even though they have been created with current BKE code, will still go through all the versioning code required by the version of the loaded blendfile.
  • Adding new IDs before the lib-linking process also means that they need to be taken into account during the lib-linking process itself, as their addresses are not known by default by the readfile code. Although to my knowledge there was never a reported bug about it, this is a very nasty 'potential' bug in our current code.

Proposal

The general idea is that IDs actually read by the readfile process should be tagged for do_version, so that versioning code processes only them, and skips the others (which are assumed to have been created according to the current data version) entirely.

While tagging read IDs is fairly trivial, the hard part is making the current do_version code skip the untagged ones.

  • Create a new LIB_TAG_DO_VERSIONING ID tag.
  • Tag all IDs read from blendfile with this new tag (except for the placeholders generated for missing linked data).
    • This tag needs to be cleared out at the end of the readfile code, which means that BLO_read_do_version_after_setup code will not be aware of this. This is not expected to be an issue in practice, as code there is supposed to work on high-level data info (like 'is this object a proxy'), not on version-based info.
  • Refactor the do_version code to let the generic readfile code decide whether a given ID needs to be versioned or not.
    • This is the complex part, see below for details.
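The tagging steps listed above could look roughly like the following minimal, self-contained sketch. The `ID` struct, its `is_placeholder` and `versioned` fields, and all function names are hypothetical stand-ins for illustration, not Blender's actual definitions; only the `LIB_TAG_DO_VERSIONING` name comes from this proposal.

```cpp
#include <cassert>
#include <string>
#include <vector>

/* Hypothetical stand-ins, not Blender's real ID struct or tag values. */
enum { LIB_TAG_DO_VERSIONING = 1 << 0 };

struct ID {
  std::string name;
  int tag = 0;
  bool is_placeholder = false; /* Generated for missing linked data. */
  bool versioned = false;      /* Only here to observe what got processed. */
};

/* Tag every ID that was really read from the file; placeholders stay untagged. */
void tag_read_ids(std::vector<ID> &ids)
{
  for (ID &id : ids) {
    if (!id.is_placeholder) {
      id.tag |= LIB_TAG_DO_VERSIONING;
    }
  }
}

/* An ID created by versioning code with current code: never tagged. */
void create_id_during_versioning(std::vector<ID> &ids, const std::string &name)
{
  ids.emplace_back();
  ids.back().name = name;
  assert((ids.back().tag & LIB_TAG_DO_VERSIONING) == 0);
}

/* A versioning pass: untagged IDs are skipped. Returns the processed count. */
int version_tagged_ids(std::vector<ID> &ids)
{
  int processed = 0;
  for (ID &id : ids) {
    if ((id.tag & LIB_TAG_DO_VERSIONING) == 0) {
      continue;
    }
    id.versioned = true;
    processed++;
  }
  return processed;
}

/* End of readfile: the tag must not leak into the rest of the session. */
void clear_versioning_tags(std::vector<ID> &ids)
{
  for (ID &id : ids) {
    id.tag &= ~LIB_TAG_DO_VERSIONING;
  }
}
```

In this sketch, calling `tag_read_ids` once after reading, then any number of versioning passes, then `clear_versioning_tags`, leaves both placeholders and IDs created mid-versioning untouched by every pass.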

do_version Refactor

The proposal is to replace the usages of LISTBASE_FOREACH over ID type listbases in versioning code with a dedicated iterator. This iterator code would perform the generic checks (placeholders, newly created IDs, etc.).

This should also cover slightly more specialized iterators, like e.g. FOREACH_NODETREE, or the generic FOREACH_MAIN_ID.

This change can be implemented in two steps, the first one by proxying current iterators with new defined names. This would be a very noisy commit, but it would be guaranteed to have no effect at all on the behavior of the code.

The proposed new names re-use the existing ones, prefixed with DO_VERSION_.

The second commit will then be the one implementing the new 'filtered' behavior, together with the other aspects of this design.
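The two-step approach described above can be sketched as follows. This is only an illustration under simplifying assumptions: the `ID` and `ListBase` structs and the simplified `LISTBASE_FOREACH` are reduced stand-ins, and the `_STEP1` suffix exists only so both stages can sit side by side in one file (in practice the same macro name would change definition between the two commits).

```cpp
#include <cassert>

/* Hypothetical, reduced stand-ins for illustration only. */
enum { LIB_TAG_DO_VERSIONING = 1 << 0 };

struct ID {
  ID *next = nullptr;
  int tag = 0;
  int value = 0;
};

struct ListBase {
  ID *first = nullptr;
};

/* Simplified list iteration macro. */
#define LISTBASE_FOREACH(type, var, list) \
  for (type var = (list)->first; var != nullptr; var = var->next)

/* Step 1 (noisy rename commit): a pure alias, behavior is unchanged. */
#define DO_VERSION_LISTBASE_FOREACH_STEP1(type, var, list) LISTBASE_FOREACH(type, var, list)

/* Step 2: call sites stay identical, but untagged IDs are now skipped. */
#define DO_VERSION_LISTBASE_FOREACH(type, var, list) \
  LISTBASE_FOREACH(type, var, list) \
  if (((var)->tag & LIB_TAG_DO_VERSIONING) == 0) { \
    continue; \
  } \
  else

/* Visits every ID, like the existing code does. */
int version_ids_unfiltered(ListBase *lb)
{
  assert(lb != nullptr);
  int processed = 0;
  DO_VERSION_LISTBASE_FOREACH_STEP1(ID *, id, lb) {
    (void)id;
    processed++;
  }
  return processed;
}

/* Visits only IDs tagged for versioning. */
int version_ids_filtered(ListBase *lb)
{
  assert(lb != nullptr);
  int processed = 0;
  DO_VERSION_LISTBASE_FOREACH(ID *, id, lb) {
    id->value += 1;
    processed++;
  }
  return processed;
}
```

The point of the split is that the first definition can be reviewed as a mechanical rename, while the behavioral change is confined to the macro definition in the second step.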

Other Ideas

Move versioning code to IDTypes.

The initial idea was to have some sort of IDType structure defined in versioning code, which would gather all versioning for each type. It could maybe even have been added to the actual IDTypeInfo.

But this is likely a fairly complicated change to implement, e.g. when there are interactions between IDs.

Furthermore, defining a code structure that would work nicely with the do_version requirements is challenging (each ID can be processed many times, for each version increment, and it's very likely not a good idea to switch to a model where each ID would be processed in one go over all the required versions).

So for now it feels like a potentially huge, time-consuming task, which does not seem to be worth it.

Add newly created IDs to a temporary separate Main.

While this would avoid the need for a new tag for these IDs, and the change to the versioning code itself to filter them out, this has several drawbacks that are likely harder to address:

  • It requires a new Main with special meanings, and special handling, in the whole readfile code. Probably even one extra main for each library too.
  • It requires specific handling of naming for the added IDs, to avoid name collision with IDs from the 'real' read Mains.

Notes:

  • Somewhat related to #92333.
  • The current 'multi-stage' versioning process causes another type of problem: later versioning code that runs before lib-linking can make earlier versioning code in after_liblink invalid/broken. The same is (even more) true when it comes to code in BLO_read_do_version_after_setup, however this is a known and expected issue, since that one is fairly version-agnostic.
    There is no clear solution to this problem currently, since it does not seem to be possible to process versioning at a single point in readfile code.
Bastien Montagne added the Type: Design, Module: Core, Interest: BlendFile labels 2023-09-04 17:41:21 +02:00
Bastien Montagne changed title from WIP: Refactor how we handle versionning during readfile, to allow creation of new IDs in a reliable way. to Refactor how we handle versionning during readfile, to allow creation of new IDs in a reliable way. 2023-09-12 12:01:29 +02:00
Author
Owner

@brecht @Sergey @JacquesLucke @ideasman42 Think you may be interested in this topic?

FYI this was triggered by recent issues in nodes versioning code @LukasTonne had to fight after adding the node panels. But the problem has been lurking for years.


I am not sure why we need a wrapper function to create IDs from the versioning code. You can flip the meaning of the tag to LIB_TAG_DO_VERSIONING and tag all IDs in the bmain prior to calling the versioning functionality. The newly created IDs will not have this tag set, so they will be naturally ignored, without adding extra wrappers and public flags.

For the dedicated iterators for the IDs during versioning, this sounds useful, but what I am not sure about is whether we can somehow more automatically prevent people from using LISTBASE_FOREACH for the IDs.

Author
Owner

> I am not sure why we need a wrapper function to create IDs from the versioning code. You can flip the meaning of the tag to LIB_TAG_DO_VERSIONING and tag all IDs in the bmain prior to calling the versioning functionality. The newly created IDs will not have this tag set, so they will be naturally ignored, without adding extra wrappers and public flags.

Indeed, that would be more elegant and efficient! Will update the proposal.

> For the dedicated iterators for the IDs during versioning, this sounds useful, but what I am not sure about is whether we can somehow more automatically prevent people from using LISTBASE_FOREACH for the IDs.

We could #undef these macros in versioning code... A somewhat brutal approach, but 'could just work'©


> We could #undef these macros in versioning code... A bit brutal approach, but 'could just work'©

The thing is, LISTBASE_FOREACH is not only used for traversing IDs, but also for things like bones, modifiers, sequences ...

Member

Two possible solutions I can think of:

  • Redefine LISTBASE_FOREACH with a static assert that checks that it is not used for IDs.
  • Remove direct access to bmain. See below.
```cpp
class VersionMain {
 private:
  Main *bmain;

 public:
  Main *get_internal_bmain() const { return this->bmain; }
};

void blo_do_versions_400(FileData *fd, Library * /*lib*/, VersionMain *vmain) {
  if (MAIN_VERSION_LESS_THAN(vmain, 400, 1)) {
    FOREACH_VERSION_ID_BEGIN(Mesh *, mesh, vmain, meshes) {
      /* Do versioning for mesh. */
    }
    FOREACH_VERSION_ID_END;
  }
}
```
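The first solution (a static assert inside `LISTBASE_FOREACH`) could be sketched along these lines. This is an assumption-heavy illustration: the `is_id_type` trait, the `listbase_first_non_id` helper, and the reduced `ID`, `Mesh`, `Bone`, and `ListBase` structs are all hypothetical, and each ID type would have to be registered with the trait by hand.

```cpp
#include <cassert>
#include <type_traits>

/* Hypothetical, reduced stand-ins for illustration only. */
struct ID {
  ID *next;
  ID *prev;
  int tag;
};
struct Mesh {
  ID id; /* ID types embed an ID as their first member. */
  int totvert;
};
struct Bone {
  Bone *next;
  Bone *prev;
};
struct ListBase {
  void *first;
  void *last;
};

/* Hypothetical trait, specialized once per ID type. */
template<typename T> struct is_id_type : std::false_type {};
template<> struct is_id_type<ID> : std::true_type {};
template<> struct is_id_type<Mesh> : std::true_type {};

/* Returns the first list item, refusing to compile for ID types. */
template<typename PtrT> PtrT listbase_first_non_id(const ListBase *list)
{
  static_assert(!is_id_type<typename std::remove_pointer<PtrT>::type>::value,
                "Use the dedicated versioning iterator for lists of IDs");
  return static_cast<PtrT>(list->first);
}

/* Redefined for versioning code only: still usable for bones, modifiers, etc. */
#define LISTBASE_FOREACH(type, var, list) \
  for (type var = listbase_first_non_id<type>(list); var != nullptr; var = var->next)

int count_bones(const ListBase *bones)
{
  assert(bones != nullptr);
  int count = 0;
  LISTBASE_FOREACH(Bone *, bone, bones) {
    count++;
  }
  return count;
}

/* Uncommenting the loop below would trigger the static_assert at compile time:
 * LISTBASE_FOREACH(Mesh *, mesh, meshes) {}
 */
```

Compared with hiding `bmain` behind a wrapper class, this catches only iterations that go through the macro; hand-written loops over `first`/`next` would still slip through.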
Author
Owner

Second solution sounds nice to me, it avoids having to chase down all possible ways to directly iterate over the Main lists of IDs in that massive amount of versioning code.

Member

The change to avoid versioning "new" data-blocks created during versioning makes sense. If possible though, what about avoiding adding another global tag and keep the change local to versioning?

Something like this:

```cpp
class VersionMain {
 private:
  Main *bmain;
  Set<ID *> new_ids; // Skip these in the "for each version ID loops"
...
```

This would avoid the need to set and clear tags, it would keep the change clearly local to versioning, avoid bit flag manipulation, and avoid the proliferation of flags that require modifying memory just to set a default state.
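A self-contained sketch of this set-based idea, using `std::unordered_set` in place of Blender's `Set` and a hypothetical `VersionMain` interface (`add_read_id`, `add_new_id`, `foreach_version_id` are made-up names), might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

/* Hypothetical stand-in; `versioned_count` only exists to observe the result. */
struct ID {
  int versioned_count = 0;
};

class VersionMain {
 private:
  /* unique_ptr keeps ID addresses stable when the vector grows. */
  std::vector<std::unique_ptr<ID>> ids_;
  /* IDs created during versioning: skipped by the iteration below. */
  std::unordered_set<const ID *> new_ids_;

 public:
  /* Register an ID that was read from the file. */
  ID *add_read_id()
  {
    ids_.push_back(std::unique_ptr<ID>(new ID()));
    return ids_.back().get();
  }

  /* Wrapper that versioning code must use to create IDs. */
  ID *add_new_id()
  {
    ID *id = add_read_id();
    new_ids_.insert(id);
    return id;
  }

  /* Calls fn on every ID that was not created during versioning. */
  template<typename Fn> void foreach_version_id(Fn &&fn)
  {
    /* Iterate by index over a fixed count: fn may add IDs while we loop. */
    const std::size_t count = ids_.size();
    for (std::size_t i = 0; i < count; i++) {
      ID *id = ids_[i].get();
      assert(id != nullptr);
      if (new_ids_.count(id) != 0) {
        continue;
      }
      fn(*id);
    }
  }
};
```

Note that only IDs created through `add_new_id` end up in the set, so any ID created by other code paths would need the same wrapper to be recognized.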

--

Also, I do have a comment about making the design task a bit easier to understand. Currently the "problems" section uses language like "It is forbidden," "It is allowed," and "It is recommended." After reading the rest of the task, I think I'm still missing something about these points. It might be clearer if the design skipped that sort of statement and went directly to saying why something is forbidden or allowed.

Author
Owner

This would add a fair amount of complexity and processing cost to versioning - and not cover other potential 'added IDs' cases, like the placeholders generated by readfile code.

Author
Owner

Furthermore, the more we can stay consistent regarding statuses of IDs, the better. Currently, it is always done through tags or flags.

Not to mention that versioning code may encounter such IDs (especially in the 'after liblink' processing), and therefore needs to be able to recognize them too.

Member

> not cover other potential 'added IDs' cases, like the placeholders generated by readfile code.

Ah, good point, it would reintroduce the need for a wrapper for creating IDs during versioning.

> Currently, it is always done through tags or flags.

I do think tags like LIB_TAG_DOIT aren't great from a code quality, performance, const correctness, and thread safety standpoint, but that's a different topic!


With this mechanism, one thing we have to be careful about is that it does not solve the problem of changing default values in datablocks. An earlier Blender version might have had different defaults for that datablock, and skipping versioning entirely does not solve that problem.

Additionally if there is after lib link versioning with a dependency on other datablocks, skipping the versioning code may also not give the correct result.

The only way to really solve that I think would be to initialize datablocks the same as if they had been created in the earlier Blender version with a dedicated allocation function. I'm not sure that's required or ideal in all cases, but in some cases it might be simpler than using LIB_TAG_DO_VERSIONING.

For something like creating collection datablocks from scene layers I would at least consider it.
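To make the defaults concern concrete, here is a small sketch under purely invented assumptions: a `Datablock` with a `size` field whose default was 1.0 before a hypothetical version 300 and is 50.0 in current code, and a versioning step for older files that multiplies `size` by 100. None of these names or numbers come from Blender.

```cpp
#include <cassert>

/* Hypothetical stand-ins and invented defaults, for illustration only. */
enum { LIB_TAG_DO_VERSIONING = 1 << 0 };

struct Datablock {
  int tag;
  float size;
};

/* Regular allocation: current defaults, not tagged, so versioning skips it. */
Datablock datablock_new_current()
{
  Datablock data = {0, 50.0f};
  return data;
}

/* Dedicated allocation: defaults as they were in `version`, tagged so that
 * all later versioning steps still apply to it. */
Datablock datablock_new_as_of(int version)
{
  assert(version > 0);
  Datablock data = {LIB_TAG_DO_VERSIONING, (version < 300) ? 1.0f : 50.0f};
  return data;
}

/* Versioning step for files older than 300: only tagged data is converted. */
void do_versions_300(Datablock &data, int file_version)
{
  if (file_version < 300 && (data.tag & LIB_TAG_DO_VERSIONING) != 0) {
    data.size *= 100.0f;
  }
}
```

With a file from version 280, the datablock allocated "as of 280" ends up with the value an old file saved by that version would get after conversion, while the one allocated with current defaults keeps the current default; whether those two results should match is exactly the question raised above.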

Reference: blender/blender#111932