Fix versioning old 2.6 node groups, causing dangling link pointers #111704

Merged
Lukas Tönne merged 11 commits from LukasTonne/blender:fix-versioning-26-node-groups into main 2023-09-05 12:37:13 +02:00
Member

In 2.6 the old method of using bNodeSocket lists in bNodeTree directly
as group sockets was replaced with new group input/output nodes. This
required versioning to create those input/output nodes and then redirect
links to the new node sockets. Because creating nodes relies heavily on
node typeinfo this versioning was done in the _after_linking section
of the 2.6 versioning code, running after all other versioning
(including for much newer versions!) has already happended.

While typinfo is available at that point, doing such late versioning
causes severe problems when the data structure changes, as is the case
with the recent node panels patch (#111348). The new node group
interface also has versioning code for 4.0, but this runs before the
_after_linking code for 2.6! Versioning for node panels expects
sockets in bNodeTree to not have any links pointing at them, but this is
not true for old 2.6 files which have not yet been fully versioned at
that point, because of the late versioning stage. Subsequently 2.6
_after_linking code crashes when trying to modify node links with
dangling pointers.

The solution here is to move the old versioning code out of the
after_linking stage to restore the expected versioning chain. This
requires creating nodes and node sockets without any typeinfo, but
luckily we only need to create simple known group input/output nodes
which don't have much complicated behavior.

In 2.6 the old method of using bNodeSocket lists in bNodeTree directly as group sockets was replaced with new group input/output nodes. This required versioning to create those input/output nodes and then redirect links to the new node sockets. Because creating nodes relies heavily on node typeinfo this versioning was done in the `_after_linking` section of the 2.6 versioning code, running after _all other versioning_ (including for much newer versions!) has already happended. While typinfo is available at that point, doing such late versioning causes severe problems when the data structure changes, as is the case with the recent node panels patch (#111348). The new node group interface also has versioning code for 4.0, but this runs _before_ the `_after_linking` code for 2.6! Versioning for node panels expects sockets in bNodeTree to not have any links pointing at them, but this is not true for old 2.6 files which have not yet been fully versioned at that point, because of the late versioning stage. Subsequently 2.6 `_after_linking` code crashes when trying to modify node links with dangling pointers. The solution here is to move the old versioning code out of the `after_linking` stage to restore the expected versioning chain. This requires creating nodes and node sockets without any typeinfo, but luckily we only need to create simple known group input/output nodes which don't have much complicated behavior.
Lukas Tönne added 1 commit 2023-08-30 17:37:06 +02:00
a0cb03b5d3 Fix for versioning old 2.6 node groups, causing dangling link pointers.
In 2.6 the old method of using bNodeSocket lists in bNodeTree directly
as group sockets was replaced with new group input/output nodes. This
required versioning to create those input/output nodes and then redirect
links to the new node sockets. Because creating nodes relies heavily on
node typeinfo this versioning was done in the `_after_linking` section
of the 2.6 versioning code, running after _all other versioning_
(including for much newer versions!) has already happended.

While typinfo is available at that point, doing such late versioning
causes severe problems when the data structure changes, as is the case
with the recent node panels patch (#111348). The new node group
interface also has versioning code for 4.0, but this runs _before_ the
`_after_linking` code for 2.6! Versioning for node panels expects
sockets in bNodeTree to not have any links pointing at them, but this is
not true for old 2.6 files which have not yet been fully versioned at
that point, because of the late versioning stage. Subsequently 2.6
`_after_linking` code crashes when trying to modify node links with
dangling pointers.

The solution here is to move the old versioning code out of the
`after_linking` stage to restore the expected versioning chain. This
requires creating nodes and node sockets without any typeinfo, but
luckily we only need to create simple known group input/output nodes
which don't have much complicated behavior.
Lukas Tönne added this to the Nodes & Physics project 2023-08-30 17:37:55 +02:00
Lukas Tönne requested review from Bastien Montagne 2023-08-30 17:38:08 +02:00
Lukas Tönne requested review from Jacques Lucke 2023-08-30 17:38:08 +02:00
Lukas Tönne added 1 commit 2023-08-30 17:38:58 +02:00
Jacques Lucke requested changes 2023-08-31 12:11:46 +02:00
Jacques Lucke left a comment
Member

Generally looks good. I think it would be good if we would construct these nodes directly as dna structs in versioning code more often instead of calling code from blenkernel. That always works fine for now, but is brittle in the long term..
Added some inline comments.

Generally looks good. I think it would be good if we would construct these nodes directly as dna structs in versioning code more often instead of calling code from blenkernel. That always works fine for now, but is brittle in the long term.. Added some inline comments.
@ -2173,0 +2266,4 @@
/* Convert the previously used ntree->inputs/ntree->outputs lists to interface nodes.
* Pre 2.56.2 node trees automatically have all unlinked sockets exposed already,
* see do_versions_after_linking_250.
Member

This comment hints at a potential problem, because now do_versions_after_linking_250 also runs after this versioning code now. I don't even know what node systems we had pre 2.5 tbh.

This comment hints at a potential problem, because now `do_versions_after_linking_250` also runs after this versioning code now. I don't even know what node systems we had pre 2.5 tbh.
Author
Member

I've started adding a simple compatibility test, loading files from 2.5, 2.6, 3.6, etc (trying to cover major transitions in node code). Loading a simple 2.5 file already fails an assert in unrelated versioning code, because a node gets added to the runtime nodes_by_id set twice.
cf12d11de6/source/blender/blenloader/intern/versioning_250.cc (L1899)

I hope there aren't too many other issues and i can actually test the after_linking parts. This should have been done years ago, but better late than never ...

I've started adding a simple compatibility test, loading files from 2.5, 2.6, 3.6, etc (trying to cover major transitions in node code). Loading a simple 2.5 file already fails an assert in unrelated versioning code, because a node gets added to the runtime `nodes_by_id` set twice. https://projects.blender.org/blender/blender/src/commit/cf12d11de6b3dfd9df70f5621659d0ca4b7d1047/source/blender/blenloader/intern/versioning_250.cc#L1899 I hope there aren't too many other issues and i can actually test the `after_linking` parts. This should have been done years ago, but better late than never ...
Author
Member

The offending nodeUniqueName was added by ab4926bcff but it re-generates the identifier for every socket in a node and adds the node to nodes_by_id as many times.

The offending `nodeUniqueName` was added by ab4926bcffee but it re-generates the identifier for every socket in a node and adds the node to `nodes_by_id` as many times.
Author
Member

I find so many broken things with 2.5 nodes versioning, might be preferable to complete this PR for 2.6 compatibility and continue fixing 2.5 nodes separately.

I find so many broken things with 2.5 nodes versioning, might be preferable to complete this PR for 2.6 compatibility and continue fixing 2.5 nodes separately.
Author
Member

Getting back to the original question: "Does this after_linking code still work after regular versioning to 4.0?"

Yes! And in this case running in after_linking is necessary, because the socket identifier update looks at the node tree pointers of group nodes (i.e. other ID data blocks). It reads from bNodeSocket::own_index (deprecated by 2.5.2) and bNodeSocket::identifier (stable, existing socket identifiers never change). So this should be pretty safe to keep in after_linking.

Getting back to the original question: "Does this `after_linking` code still work after regular versioning to 4.0?" Yes! And in this case running in `after_linking` is necessary, because the socket identifier update looks at the node tree pointers of group nodes (i.e. other ID data blocks). It reads from `bNodeSocket::own_index` (deprecated by 2.5.2) and `bNodeSocket::identifier` (stable, existing socket identifiers never change). So this should be pretty safe to keep in `after_linking`.
Member

The question was mostly whether it's a problem that now the order of the versioning is reversed, since the after-linking code runs after the code that is moved in this patch (but it used to run before).

The question was mostly whether it's a problem that now the order of the versioning is reversed, since the after-linking code runs after the code that is moved in this patch (but it used to run before).
Author
Member

In #111800 i'm adding a compatibility regression test for node groups. The 3.6 case already required some minor fixes, i want to add a 2.5 test if feasible.

Took me a long time to unravel the timeline of node changes, here's what i reconstructed so far:

TL;DR: We need to update after_linking code to find group socket matches and preserve external links.

  • Before 2.56.2 all the unconnected internal sockets were exposed. To find the internal socket for a group node input or output one would search for the internal socket whose own_index matches the external to_index. groupnode25.jpg

  • Starting with 2.56.2 and before 2.66.3 this behavior changed and now node groups would only show an explicit set of inputs and outputs (stored in bNodeTree inputs/outputs). This required versioning to

    1. Create a tree socket for every unconnected internal socket to preserve existing node setups.
    2. Add special links with tonode==nullptr or fromnode==nullptr which indicate a connection to the node tree interface.
      code link

    All of this happens inside the same bNodeTree as part of regular versioning. The after_linking versioning is needed for updating the own_index of group nodes (code link):
    own_index is still used to identify sockets at this point (replaced with identifiers later). to_index gets deprecated, the socket in the node tree and its counterpart on a group node instance should simply have the same own_index. In order to version this, however, each group node needs to inspect its linked node group, which is not available during regular versioning.

    Note: Socket identifier strings are already generated by 2.56.2 versioning, but not sure if they would have been used for anything. This might be some apocryphal versioning code added later?

  • In 2.66.2+ there was another set of changes:

    1. own_index mapping is replaced with unique socket identifier strings code link
    2. The "special" interface links with node nullptrs are replaced by group input/output nodes and regular links (no more node nullptrs allowed) code link

    The 2.66.3 versioning code was also running in the after_linking stage, but as demonstrated here this is only for the convenience of bNodeSocketType callbacks and not really necessary.

What could happen if we execute the 2.56.2 after_linking code after all the 4.0 versioning? Inside the node group all the unconnected sockets have been exposed first as ntree->inputs/ntree-outputs with nullptr links, those have then been replaced with group input/output nodes and regular links, and then the ntree->inputs/ntree-outputs have been replaced by tree_interface.ui_items. Even after all this the deprecated own_index of the sockets in the tree that should be exposed to mimick 2.5 behavior still exist, which would make it possible to find internal sockets and update the external socket identifiers.

However, the versioning code also expects there to be special links with node nullptrs now - so the 2.56.2 after_linking versioning code already includes changes from 2.66.3 in order to function correctly. The special links have been replaced by versioning code, so this part is now ineffective and never runs.

The consequence is that socket identifiers for node groups are not fully updated and some links get broken. This is because the identifiers of existing sockets are first generated by 2.66.2 versioning code (code link) but they are only based on names and unique suffixes. Sockets without a matching group tree socket are replaced later and any links discarded. groupnode25.jpg

We need to update do_versions_after_linking_250 to take this into account and search for matching internal socket like so:

  • For any socket inside the node tree whose own_index matches the group socket's to_index:
    • Check if it has a link to a group input/output node
    • Copy the matching group interface socket identifier

There is a lingering question of what might happen if some intermediate versioning depends on correct group socket identifiers? But i think this is as far as we can reasonably support such ancient files.

In #111800 i'm adding a compatibility regression test for node groups. The 3.6 case already required some minor fixes, i want to add a 2.5 test if feasible. Took me a long time to unravel the timeline of node changes, here's what i reconstructed so far: **TL;DR: We need to update after_linking code to find group socket matches and preserve external links.** - Before 2.56.2 all the unconnected internal sockets were exposed. To find the internal socket for a group node input or output one would search for the internal socket whose `own_index` matches the external `to_index`. ![groupnode25.jpg](/attachments/945dc5df-d121-4b34-8e83-f0cac4cf45de) - Starting with 2.56.2 and before 2.66.3 this behavior changed and now node groups would only show an explicit set of inputs and outputs (stored in `bNodeTree` `inputs`/`outputs`). This required versioning to 1. Create a tree socket for every unconnected internal socket to preserve existing node setups. 2. Add special links with `tonode==nullptr` or `fromnode==nullptr` which indicate a connection to the node tree interface. [code link](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenloader/intern/versioning_250.cc#L1837-L1913) All of this happens inside the same `bNodeTree` as part of regular versioning. The `after_linking` versioning is needed for updating the `own_index` of group nodes ([code link](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenloader/intern/versioning_250.cc#L2131-L2156)): `own_index` is still used to identify sockets at this point (replaced with identifiers later). `to_index` gets deprecated, the socket in the node tree and its counterpart on a group node instance should simply have the same `own_index`. In order to version this, however, each group node needs to inspect its linked node group, which is not available during regular versioning. _Note: Socket identifier strings are already generated by 2.56.2 versioning, but not sure if they would have been used for anything. This might be some apocryphal versioning code added later?_ - In 2.66.2+ there was another set of changes: 1. `own_index` mapping is replaced with unique socket identifier strings [code link](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenloader/intern/versioning_260.cc#L950-L990) 2. The "special" interface links with node nullptrs are replaced by group input/output nodes and regular links (no more node `nullptrs` allowed) [code link](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenloader/intern/versioning_260.cc#L2697-L2798) The 2.66.3 versioning code was also running in the `after_linking` stage, but as demonstrated here this is only for the convenience of `bNodeSocketType` callbacks and not really necessary. What could happen if we execute the 2.56.2 `after_linking` code after all the 4.0 versioning? Inside the node group all the unconnected sockets have been exposed first as `ntree->inputs`/`ntree-outputs` with nullptr links, those have then been replaced with group input/output nodes and regular links, and then the `ntree->inputs`/`ntree-outputs` have been replaced by `tree_interface.ui_items`. Even after all this the deprecated `own_index` of the sockets in the tree that should be exposed to mimick 2.5 behavior still exist, which would make it possible to find internal sockets and update the external socket identifiers. However, the versioning code also expects there to be special links with node nullptrs now - so the 2.56.2 `after_linking` versioning code already includes changes from 2.66.3 in order to function correctly. The special links have been replaced by versioning code, so this part is now ineffective and never runs. The consequence is that socket identifiers for node groups are not fully updated and some links get broken. This is because the identifiers of existing sockets are first generated by 2.66.2 versioning code ([code link](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenloader/intern/versioning_260.cc#L950-L990)) but they are only based on names and unique suffixes. Sockets without a matching group tree socket are replaced later and any links discarded. ![groupnode25.jpg](/attachments/3007ee7e-2e24-4c28-9a84-e1480c083997) We need to update `do_versions_after_linking_250` to take this into account and search for matching internal socket like so: - For any socket inside the node tree whose `own_index` matches the group socket's `to_index`: - Check if it has a link to a group input/output node - Copy the matching group interface socket identifier There is a lingering question of what might happen if some intermediate versioning depends on correct group socket identifiers? But i think this is as far as we can reasonably support such ancient files.
Member

Thanks for looking into this. I think it's ok when we are not able to load these ancient files perfectly. Would be nice, but it is what it is. If people really rely on those files, they may have to load and save the files in a more recent version. It's good to know why things are not working anyway. Also, it's great that you are adding some tests for this!
Do you happen to know when the node systems that we use nowadays have been introduced?

Thanks for looking into this. I think it's ok when we are not able to load these ancient files perfectly. Would be nice, but it is what it is. If people really rely on those files, they may have to load and save the files in a more recent version. It's good to know why things are not working anyway. Also, it's great that you are adding some tests for this! Do you happen to know when the node systems that we use nowadays have been introduced?
Author
Member

Do you happen to know when the node systems that we use nowadays have been introduced?

Depends on what you mean by "systems that we use nowadays". Oldest commit to the nodes folder is from 2007. From then on the nodes system has been a Ship of Theseus: 2.5 saw the change to explicit node group sockets, then group input/output nodes and python nodes in 2.6, and other parts of the system have been modified and swapped out over the years.

> Do you happen to know when the node systems that we use nowadays have been introduced? Depends on what you mean by "systems that we use nowadays". Oldest commit to the `nodes` folder is from 2007. From then on the nodes system has been a Ship of Theseus: 2.5 saw the change to explicit node group sockets, then group input/output nodes and python nodes in 2.6, and other parts of the system have been modified and swapped out over the years.
Author
Member

I can't figure this out (in reasonable time and with acceptable sanity loss). I "fixed" the 2.5 after_linking code, but it still does not work because socket identifiers in the bNodeTree are changing so much that links cannot be restored from the 2.5 file.

  1. Socket identifiers first initialized by do_versions_nodetree_customnodes in versioning_260 to a FooBar.001 format
  2. Identifiers change somewhere between 2.7 and 4.0 into FooBar_001 format
  3. Then they change back to FooBar.001 format in the after_linking code
  4. Socket links break because the tree interface has identifiers in FooBar_001 format (not sure where from)
I can't figure this out (in reasonable time and with acceptable sanity loss). I "fixed" the 2.5 after_linking code, but it still does not work because socket identifiers in the bNodeTree are changing so much that links cannot be restored from the 2.5 file. 1. Socket identifiers first initialized by `do_versions_nodetree_customnodes` in versioning_260 to a `FooBar.001` format 2. Identifiers change _somewhere between 2.7 and 4.0_ into `FooBar_001` format 3. Then they change back to `FooBar.001` format in the after_linking code 4. Socket links break because the tree interface has identifiers in `FooBar_001` format (not sure where from)
@ -2173,0 +2369,4 @@
FOREACH_NODETREE_END;
}
if (!MAIN_VERSION_FILE_ATLEAST(bmain, 280, 60)) {
Member

This versioning looks like it's at the wrong place (see version number)

This versioning looks like it's at the wrong place (see version number)
Author
Member

Correct. I considered moving it to the 280 file, and the first thing i saw at the bottom of the that file was a bit of versioning code for ... 3.0 :P

https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenloader/intern/versioning_280.cc#L6300-L6324

I find it extremely difficult to reason about why something works or doesn't in nodes versioning, so i just left it next to the other bit for now ...

Correct. I considered moving it to the 280 file, and the first thing i saw at the bottom of the _that_ file was a bit of versioning code for ... 3.0 :P https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenloader/intern/versioning_280.cc#L6300-L6324 I find it extremely difficult to reason about why something works or doesn't in nodes versioning, so i just left it next to the other bit for now ...
Member

I see. It's not super horrible if it works. The code between here and where this versioning code would usually live is unlikely to change in major ways.

I see. It's not super horrible if it works. The code between here and where this versioning code would usually live is unlikely to change in major ways.
Jacques Lucke approved these changes 2023-09-01 19:16:52 +02:00
Lukas Tönne added 2 commits 2023-09-04 09:45:40 +02:00
Lukas Tönne added 5 commits 2023-09-04 16:29:50 +02:00
bf3c7aedc0 Removed old 2.5 after_linking code that is already broken.
2.5 versioning would require an after_linking step to construct correct
socket identifiers of node groups, which would preserve links to and
from the group. This has been broken for a long time and cannot be
easily restored. The code is not running anyway because it depends on
outdated bNodeLink format, so its better to just remove entirely.
57b17483e8 Removed invalid nodeUniqueName calls.
This function is supposed to only be called once, or the tree topology
will fail on assert. node names are made unique later on already.
4f84995bca Use warning instead of error for modifier issues to avoid test crash.
CLOG_ERROR aborts in debug mode, which makes all tests fail (in this
case because a nodes modifier does not have a geometry output, which
should just disable it).
bad7e18689 Initialize socket typeinfo before node typeinfo.
Creating nodes relies on socket typeinfo to be valid before node
typeinfo itself.
buildbot/vexp-code-patch-coordinator Build done. Details
58ea865f09
Added missing 'compact' flag for sockets.
Also fixed use of an incorrect flag for interface sockets.
Author
Member

@blender-bot build

@blender-bot build
Lukas Tönne added 1 commit 2023-09-04 17:23:52 +02:00
Lukas Tönne added 1 commit 2023-09-05 11:48:07 +02:00
buildbot/vexp-code-patch-coordinator Build done. Details
f47109838e
Merge branch 'main' into fix-versioning-26-node-groups
Author
Member

@blender-bot build

@blender-bot build
Lukas Tönne merged commit ed1f9d4fdd into main 2023-09-05 12:37:13 +02:00
Lukas Tönne deleted branch fix-versioning-26-node-groups 2023-09-05 12:37:15 +02:00
Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#111704
No description provided.