Simulation Nodes: Store baked geometries in .blend file #106790

Open
opened 2023-04-11 10:43:31 +02:00 by Jacques Lucke · 5 comments
Member

For simulation nodes in geometry nodes we want to be able to store the baked geometry data directly in .blend files. This is similar to how we are able to store a baked particle simulation using the old particle system.

What makes the problem more complex now than it used to be is that geometry nodes deals with geometry much more generically. Not only can a mesh change its topology over time, but it's also possible that there are multiple geometries (e.g. a mesh and a point cloud) or the geometry type changes between frames. Also, instances have to be supported, but it may be ok to add that support later.

The main difficulty is to store many arbitrary geometries in the .blend file efficiently. I identified a few possibilities to deal with this, which are explained below.

Option A: Store geometries directly in Main

In this option every baked mesh/pointcloud/curves geometry is stored as a separate data block in Main.

Pros:

  • Code infrastructure exists already.
  • In theory, the baked geometries can easily be edited individually without much further effort. This is something we want to support eventually anyway.

Cons:

  • Results in potentially lots of additional IDs in Main which can cause significant slowdowns (for example one mesh per baked frame).
  • Probably needs some new behavior in the depsgraph because we likely don't want a depsgraph node per cached geometry data-block.
  • Harder to implement data sharing across multiple data blocks in .blend files than it is within a single ID (both should be possible in theory though).
  • Might need special cases in ui code because we probably don't want to show all these data blocks in the data block dropdowns.
  • The way we store baked geometries wouldn't be an implementation detail, which makes it harder to change later.
  • Instances can't be stored natively in DNA yet.

Option B: Store geometries as no-main data-blocks.

Instead of storing e.g. a baked Mesh in Main (which comes with user counting, unique names, etc.), the mesh could be owned by something else like a node or modifier. From the perspective of the .blend file, the mesh would just be normal data and not an ID data-block. To make that work, there need to be functions like BKE_mesh_blend_write_nomain. I did a small test for this and so far it worked as expected.

Pros:

  • Can reuse existing code to serialize/deserialize geometries (except for instances and volumes).
  • No additional IDs in Main.
  • Implementing data sharing between multiple geometries is more straightforward than with separate data-blocks.

Cons:

  • Need to make sure that versioning code for geometries also runs on those that are not in Main. This likely means iterating over all objects/nodes to version meshes.
  • No-main IDs are often more difficult to deal with by generic id handling code.
  • All these non-main IDs seem to be some new kind of embedded IDs which might need special handling in various places.

Option C: Store geometries as no-main no-id data.

The idea is to introduce a new struct like BakedMesh that is very similar to Mesh but skips a few data members that are not required. Same thing for other geometry types.

Pros:

  • No new complexity in ID management code.
  • Instances behave more like other geometries (one would have to add a BakedInstances struct).
  • More control over which data is (not) stored for baked data.
  • Independent versioning code for baked data.

Cons:

  • Leads to some more code duplication because a new way to serialize geometries has to be added.
  • Can reuse less existing code.
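
To make Option C a bit more concrete, here is a minimal Python sketch of what a stripped-down baked mesh record with its own, independently versioned serialization could look like. Everything in it is hypothetical: the field names, the `BAKED_MESH_VERSION` constant and the JSON layout are illustrative assumptions, not Blender's `Mesh` DNA or any existing format.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical version tag; baked data would be versioned independently of DNA.
BAKED_MESH_VERSION = 1


@dataclass
class BakedMesh:
    """Illustrative subset of mesh data that a bake might need to keep."""
    positions: list  # flat list of floats: x0, y0, z0, x1, ...
    corner_verts: list  # vertex index for every face corner
    face_offsets: list  # start of each face in corner_verts, plus the end
    attributes: dict = field(default_factory=dict)  # name -> list of values

    def to_bytes(self) -> bytes:
        payload = {"version": BAKED_MESH_VERSION, "mesh": asdict(self)}
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "BakedMesh":
        payload = json.loads(data.decode("utf-8"))
        if payload["version"] > BAKED_MESH_VERSION:
            raise ValueError("baked mesh written by a newer version")
        return cls(**payload["mesh"])


# A single triangle as a round-trip example.
triangle = BakedMesh(
    positions=[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    corner_verts=[0, 1, 2],
    face_offsets=[0, 3],
    attributes={"temperature": [0.5, 0.7, 0.9]},
)
restored = BakedMesh.from_bytes(triangle.to_bytes())
```

The point of the sketch is only that the baked struct decides which members exist and carries its own version check, which is where the extra serialization code mentioned in the cons would come from.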

Option D: Use external file format and embed that in the .blend file.

We also want to be able to store baked geometries outside of the .blend file. For that it's probably best to come up with a new file format that exactly suits our needs (also see #105251). This could potentially be implemented first. Then to store the baked geometries in the .blend file, we could just dump the external files into the .blend file (similar to how image files can be packed).

Pros:

  • Would solve storing baked geometries in the .blend file and outside in one go.
  • Does not add complexity to existing ID management, versioning and .blend IO code.
  • Feature parity between storing baked geometries externally or in the .blend file by default.

Cons:

  • Likely needs more new code than all the other options.

I'm still undecided on which approach would work best in the short and long term. Feedback would be welcome.

Jacques Lucke added the Type: Design label 2023-04-11 10:43:31 +02:00
Jacques Lucke added this to the Nodes & Physics project 2023-04-11 10:43:32 +02:00

I believe option D (external) is a good starting point, since this is required for production files.

I would also go with option D (own file format, that can be embedded).

I think the goals of a cache format are different enough from a 'full-featured' geometry format that I would not consider reusing the whole Mesh/PointCloud/... serialization and management code a strong target anyway. (Though if some code could be factored out at a lower level, e.g. at the attribute level, that would of course be good.) And as Dalai said, since you need an external cache format anyway, you may as well start with that.

Another reason to prefer this approach is that I don't think we want to have all caches permanently loaded in RAM? I do not see how that could work with heavy production scenes. In that case I'd expect to have a 'smart loading/unloading' system at some point. That would be much easier to implement as its own independent thing than on top of ID management code.
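
As a rough illustration of what such a 'smart loading/unloading' system could do, here is a minimal Python sketch of a least-recently-used frame cache with a byte budget. The `FrameCache` class, its `load_frame` callback and the budget value are all hypothetical names chosen for the example; nothing here reflects an existing Blender API.

```python
from collections import OrderedDict


class FrameCache:
    """Keeps recently used baked frames in memory within a byte budget."""

    def __init__(self, load_frame, max_bytes):
        self._load_frame = load_frame  # callable: frame number -> bytes
        self._max_bytes = max_bytes
        self._frames = OrderedDict()  # frame number -> bytes, oldest first
        self._used_bytes = 0

    def get(self, frame):
        if frame in self._frames:
            self._frames.move_to_end(frame)  # mark as most recently used
            return self._frames[frame]
        data = self._load_frame(frame)
        self._frames[frame] = data
        self._used_bytes += len(data)
        # Unload least recently used frames until the budget is respected,
        # but always keep the frame that was just requested.
        while self._used_bytes > self._max_bytes and len(self._frames) > 1:
            _, evicted = self._frames.popitem(last=False)
            self._used_bytes -= len(evicted)
        return data

    def loaded_frames(self):
        return list(self._frames)


# Stand-in for reading from disk or from packed data: 100 bytes per frame.
loads = []


def fake_load(frame):
    loads.append(frame)
    return bytes(100)


cache = FrameCache(fake_load, max_bytes=250)
for frame in (1, 2, 3, 1):
    cache.get(frame)
```

With a 250-byte budget only two 100-byte frames fit, so frame 1 is unloaded when frame 3 arrives and has to be loaded again on the last request. None of this needs to know about IDs, which is the argument for keeping it independent.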

If need be (e.g. for better performance and/or fast random access), this could be a good opportunity to revise how we store these packed files in .blend, btw? (Random ideas: have a dedicated section of the .blend file for these, instead of writing them as part of the owning ID data. Add an index to the .blend file format to speed up finding this kind of data (and others) from a random access PoV.)
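
The 'dedicated section plus index' idea can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the `BAKE` marker, the JSON index and the 16-byte footer are hypothetical and have nothing to do with the actual .blend layout.

```python
import io
import json
import struct

MAGIC = b"BAKE"  # hypothetical section marker


def write_section(stream, frames):
    """Write {frame: bytes} blobs followed by an index and a fixed-size footer."""
    stream.write(MAGIC)
    index = {}
    for frame, blob in frames.items():
        index[str(frame)] = [stream.tell(), len(blob)]
        stream.write(blob)
    index_offset = stream.tell()
    index_bytes = json.dumps(index).encode("utf-8")
    stream.write(index_bytes)
    # Footer: offset and size of the index, as two little-endian uint64.
    stream.write(struct.pack("<QQ", index_offset, len(index_bytes)))


def read_index(stream):
    """Locate the index through the footer without scanning the blobs."""
    stream.seek(-16, io.SEEK_END)
    index_offset, index_size = struct.unpack("<QQ", stream.read(16))
    stream.seek(index_offset)
    return json.loads(stream.read(index_size).decode("utf-8"))


def read_frame(stream, index, frame):
    """Read one blob by seeking directly to it."""
    offset, size = index[str(frame)]
    stream.seek(offset)
    return stream.read(size)


buffer = io.BytesIO()
write_section(buffer, {1: b"frame-one", 2: b"frame-two!", 3: b"f3"})
index = read_index(buffer)
frame_two = read_frame(buffer, index, 2)
```

Reading frame 2 touches only the footer, the index and that one blob, which is the random access property asked for above.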

Adding (now or later) optimizations like cross-frame deduplication would also be much easier with a dedicated format than with existing geometry IDs.
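
One possible shape for cross-frame deduplication is content addressing: each attribute array is stored once under a hash of its bytes, and frames only reference hashes. The Python sketch below assumes float attributes and uses a hypothetical `DedupStore` class; it is an illustration of the technique, not a proposal for the actual format.

```python
import hashlib
import struct


class DedupStore:
    """Stores each distinct array once, keyed by a hash of its bytes."""

    def __init__(self):
        self.blobs = {}  # digest -> raw bytes
        self.frames = {}  # frame -> {attribute name: digest}

    def add_frame(self, frame, attributes):
        refs = {}
        for name, values in attributes.items():
            raw = struct.pack(f"<{len(values)}f", *values)
            digest = hashlib.sha256(raw).hexdigest()
            self.blobs.setdefault(digest, raw)  # only stored the first time
            refs[name] = digest
        self.frames[frame] = refs

    def get_attribute(self, frame, name):
        raw = self.blobs[self.frames[frame][name]]
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))


store = DedupStore()
rest_position = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
# The rest position is constant across frames, the velocity is not.
store.add_frame(1, {"rest_position": rest_position, "velocity": [0.0, 0.5]})
store.add_frame(2, {"rest_position": rest_position, "velocity": [0.0, 1.5]})
```

Here two frames with two attributes each end up as three stored arrays, because the unchanged `rest_position` is shared between the frames.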

Oh, and one last point that is also important: it's probably much better to have caches in their own format for compatibility management. The .blend file format tries very hard to keep compatibility between versions in both directions. We ensure (almost) 100% backward compatibility, and try to only break forward compatibility on major releases. I would expect these kinds of requirements to be much more relaxed when it comes to a cache format? Maybe even things like forward compatibility could be completely ignored in that context?

Am not so sure how option C differs from option D, besides some 'implementation details' (which do matter, of course)? Conceptually the idea is essentially the same? As in, I do not really see how option C would save much work compared to D?


Both ID-based options should be avoided imho, the ID realm is not the place to store caches in general.

I would rule out option A completely. For all the reasons already listed above in the design task.

Regarding option B, I am not sure why you would not just use the same system as existing embedded data IDs? This system would need some updates, e.g. a per-ID generic looper over embedded IDs, which I've been wanting to have for quite some time anyway.

A variant of A and B could be to have these ID caches stored in their own local BMain, owned and managed by the owning 'real' ID (nodetree e.g.). But yeah, even that I would not do.

I also guess @brecht would be interested in this topic?

I agree D makes sense, since we need that anyway and it needs to have the same capabilities as baking into .blend files.

That external file format could still be a .blend file or at least DNA based, though I think last time we discussed this it seemed better not to do this? But it wasn't a very clear conclusion.

It may be good for caching to write to an external file by default as well. And to have it integrate with other packing code, where the same operator can pack/unpack all data? This doesn't preclude keeping some of the cache in memory for faster playback, but the cache does not have to be saved in the .blend for that.

Author
Member

Thanks for the feedback. Going with option D sounds good to me then.

> Random ideas: have a dedicated section of the .blend file for these, instead of writing them as part of the owning ID data. Add an index to the .blend file format to speed up finding this kind of data (and others) from a random access PoV.

Sounds great. I was thinking about something similar for deduplicating attribute arrays in .blend files as an extension of #106228.

> That external file format could still be a .blend file or at least DNA based, though I think last time we discussed this it seemed better not to do this?

Yeah right, at least in my mind that ends up being more work, more complex and more limited. Last time we decided that it's ok to start with a custom format and continue with that unless we run into major issues.

> It may be good for caching to write to an external file by default as well. And to have it integrate with other packing code, where the same operator can pack/unpack all data?

Yes, I think packing everything with a single operator sounds perfectly reasonable.

Reference: blender/blender#106790