Fix #107500: Custom-data Python validation to prevent crashes #113525

Campbell Barton · 2023-10-11T08:01:48+02:00

Campbell Barton commented

2023-10-11 08:01:48 +02:00

Add Python instance data to CustomDataLayers so Python references
can be cleared upon re-allocation and freeing.

This is done for the UV layers and their data, with a mesh for e.g.

mesh = bpy.data.meshes[0]
layer = mesh.uv_layers[0]
layer_data = mesh.uv_layers[0].data
# Remove the mesh.
bpy.data.meshes.remove(mesh)
# Report an error instead of accessing freed memory.
print(layer.name, len(layer_data))

Support for storing instances has been added for collection properties (RNA_struct_instance).

Any layer manipulation such as adding/removing marks the Python objects as invalid.

Notes:

This PR is to check with other developers find the approach reasonable, the change is more intrusive change than I'd like but has some potential benefits too (potentially keep data valid across re-allocation).
All attributes and attribute data now uses instances, preventing crashes on their access if they're freed or re-allocated internally.
This doesn't solve the problem for scripts that hold references to individual elements (UV coordinates for e.g.).

Further work:

Support for mesh vertices/edges/loops/polygons could be added too.

Update the references instead of clearing them.

It's possible to update the properties in-place instead of marking them invalid, this would have a significant benefit that script authors would not have to access the custom-data again).
Unfortunately this would cause their __hash__ to change, causing problems if they are used in a Python set or dict keys.

There are some possible solutions to this - remove __hash__ for some property types since storing collections in a hash is not especially useful compared with data-blocks (object's, meshes ... etc).
Logically, custom-data layers are mutable collections it could be argued that removing hash is reasonable since Python equivalents (lists or dicts) can't be used as keys in a dictionary.

Another option is to use a hash that doesn't depend on the PointerRNA::data & PointerRNA::owner_id, even though it would increase hash collisions, it means the API wouldn't break and array based RNA collections could be updated in-place, arguably a net gain.

Add Python instance data to CustomDataLayers so Python references can be cleared upon re-allocation and freeing. This is done for the UV layers and their data, with a mesh for e.g. ``` mesh = bpy.data.meshes[0] layer = mesh.uv_layers[0] layer_data = mesh.uv_layers[0].data # Remove the mesh. bpy.data.meshes.remove(mesh) # Report an error instead of accessing freed memory. print(layer.name, len(layer_data)) ``` Support for storing instances has been added for collection properties (RNA_struct_instance). Any layer manipulation such as adding/removing marks the Python objects as invalid. ---- ### Notes: - This PR is to check with other developers find the approach reasonable, the change is more intrusive change than I'd like but has some potential benefits too (potentially keep data valid across re-allocation). - All attributes and attribute data now uses instances, preventing crashes on their access if they're freed or re-allocated internally. - This doesn't solve the problem for scripts that hold references to individual elements (UV coordinates for e.g.). ### Further work: Support for mesh vertices/edges/loops/polygons could be added too. **Update the references instead of clearing them.** It's possible to update the properties in-place instead of marking them invalid, this would have a significant benefit that script authors would not have to access the custom-data again). Unfortunately this would cause their `__hash__` to change, causing problems if they are used in a Python `set` or `dict` keys. There are some possible solutions to this - remove `__hash__` for some property types since storing collections in a hash is not especially useful compared with data-blocks (object's, meshes ... etc). Logically, custom-data layers are mutable collections it could be argued that removing hash is reasonable since Python equivalents (lists or dicts) can't be used as keys in a dictionary. Another option is to use a hash that doesn't depend on the `PointerRNA::data` & `PointerRNA::owner_id`, even though it would increase hash collisions, it means the API wouldn't break and array based RNA collections could be updated in-place, arguably a net gain.

Campbell Barton added 1 commit 2023-10-11 08:01:59 +02:00

e9867bc057 Fix #107500 : Custom-data Python validation to prevent crash

Add Python instance data to CustomDataLayers so Python references
can be cleared upon re-allocation and freeing.

This is done for the UV layers and their data, with a mesh for e.g.

```
mesh = bpy.context.object.data
layer = mesh.uv_layers[0]
layer_data = mesh.uv_layers[0].data
bpy.data.meshes.remove(mesh) # Remove the mesh.
print(layer.name, len(layer_data)) # Report an error instead of crashing.
```

Campbell Barton referenced this pull request

2023-10-11 08:24:27 +02:00

Python API: Mesh editing: Editing uvs after normals with a previously assigned variable doesn't work #107500

Campbell Barton changed title from ~~Fix #107500: Custom-data Python validation to prevent crash~~ to Fix #107500: Custom-data Python validation to prevent crashes

2023-10-11 08:27:49 +02:00

Campbell Barton requested review from Brecht Van Lommel 2023-10-11 08:31:15 +02:00

Campbell Barton requested review from Hans Goudey 2023-10-11 08:31:32 +02:00

Campbell Barton requested review from Martijn Versteegh 2023-10-11 08:32:05 +02:00

Brecht Van Lommel commented

2023-10-12 21:41:42 +02:00

I guess there are broadly 3 possible solutions here:

(a) Changing storage from array to e.g. array of pointers, so that at least unrelated changes do not affect the pointer while not solving accessed to removed data. For .blend file compatibility there could be a difference between runtime and .blend file storage. Smaller memory allocations would not be ideal for performance but probably ok.
(b) Data tracks which Python objects are pointing to it, as in this PR.
(c) Data has a unique identifier (integer or string), and PointerRNA stores that instead of a pointer and looks up the actual pointer every time. The RNA type could have callbacks to convert between the identifier and pointer, but this is rather slow.

I don't much like (c). For me the choice would be between (a) and (b), maybe depending we can make the implementation of this PR simpler and have the potential of using it for more data types?

For this implementation, I would have expected perhaps a single pointer to a double linked list on CustomDataLayer; rather than these 4 types of instances. If BPy_PropertyRNA contains a next and prev pointer it would not even require an additional memory allocation. Maybe that would simplify things a little bit, to not have to distinguish between instance types?

For this to be more general, even better could be if we could store that list on the datablock itself to make it more generic. But for CustomDataLayer that would require passing datablock pointers into various functions, which is probably somewhat ugly and error prone.

If this PR is made to work for individual mesh elements, I wonder how it would work? We can't really store a pointer per vertex, so it would need to be higher up anyway. So I guess you need to either store it on the datablock, have an (inefficient) RNA callback to find where to store things, or extend PointerRNA so it can contain a pointer to intermediate data like CustomData or CustomDataLayer.

I guess there are broadly 3 possible solutions here: - (a) Changing storage from array to e.g. array of pointers, so that at least unrelated changes do not affect the pointer while not solving accessed to removed data. For .blend file compatibility there could be a difference between runtime and .blend file storage. Smaller memory allocations would not be ideal for performance but probably ok. - (b) Data tracks which Python objects are pointing to it, as in this PR. - (c) Data has a unique identifier (integer or string), and `PointerRNA` stores that instead of a pointer and looks up the actual pointer every time. The RNA type could have callbacks to convert between the identifier and pointer, but this is rather slow. I don't much like (c). For me the choice would be between (a) and (b), maybe depending we can make the implementation of this PR simpler and have the potential of using it for more data types? --- For this implementation, I would have expected perhaps a single pointer to a double linked list on `CustomDataLayer`; rather than these 4 types of instances. If `BPy_PropertyRNA` contains a `next` and `prev` pointer it would not even require an additional memory allocation. Maybe that would simplify things a little bit, to not have to distinguish between instance types? For this to be more general, even better could be if we could store that list on the datablock itself to make it more generic. But for `CustomDataLayer` that would require passing datablock pointers into various functions, which is probably somewhat ugly and error prone. If this PR is made to work for individual mesh elements, I wonder how it would work? We can't really store a pointer per vertex, so it would need to be higher up anyway. So I guess you need to either store it on the datablock, have an (inefficient) RNA callback to find where to store things, or extend `PointerRNA` so it can contain a pointer to intermediate data like `CustomData` or `CustomDataLayer`.

Martijn Versteegh commented

2023-10-14 11:09:35 +02:00

I quickly looked into this, but I'll need some time to really understand what's going on and to be able to give useful feedback.

So please don't wait on my review unless you have a specific question for me.

I'll try to find the time the coming week.

@brecht wrote:

(a) Changing storage from array to e.g. array of pointers, so that at least unrelated changes do not affect the pointer while not solving accessed to removed data

This was the approach I tried in my experiments. But I ran into lots of problems. Probably mainly caused by me not understanding the RNA system well enough and also not having enough experience with the blendfile format. But at least some of the difficulties I encountered were (I think) more fundamental. For example the CustomData struct being copied verbatim in lots of places, making the tracking of the ownership of the indirect pointers hard. So maybe Campbell's approach is better.

I quickly looked into this, but I'll need some time to really understand what's going on and to be able to give useful feedback. So please don't wait on my review unless you have a specific question for me. I'll try to find the time the coming week. @brecht wrote: > (a) Changing storage from array to e.g. array of pointers, so that at least unrelated changes do not affect the pointer while not solving accessed to removed data This was the approach I tried in my experiments. But I ran into lots of problems. Probably mainly caused by me not understanding the RNA system well enough and also not having enough experience with the blendfile format. But at least some of the difficulties I encountered were (I think) more fundamental. For example the CustomData struct being copied verbatim in lots of places, making the tracking of the ownership of the indirect pointers hard. So maybe Campbell's approach is better.

Pavel Blend commented

2023-12-16 19:03:01 +01:00

First-time contributor

Will this fix be available for blender 3.6 LTS? I'm using 3.6 because it's the latest blender that runs on my computer. And this error critically affects my addon.

Martijn Versteegh commented

2023-12-16 21:31:15 +01:00

To be honest it's a bit questionable if this fix will be included at all I think.

The better fix would probably be to rewrite attribute storage.

You can work around by always reinitialising any python variables pointing to uv layers (or other attribute layers) after any operation that adds or removes layers.

To be honest it's a bit questionable if this fix will be included *at all* I think. The better fix would probably be to rewrite attribute storage. You can work around by always reinitialising any python variables pointing to uv layers (or other attribute layers) after any operation that adds or removes layers.

👀 1

Pavel Blend commented

2023-12-16 22:23:47 +01:00

First-time contributor

@Baardaap Can you write example code that works around this error? I can not. I wrote it like this:

bpy_mesh.vertex_colors.new(name='Test')
test_vert_color = bpy_mesh.vertex_colors['Test']

instead of this code:

test_vert_color = bpy_mesh.vertex_colors.new(name='Test')

But there is still an error.

@Baardaap Can you write example code that works around this error? I can not. I wrote it like this: ```python bpy_mesh.vertex_colors.new(name='Test') test_vert_color = bpy_mesh.vertex_colors['Test'] ``` instead of this code: ```python test_vert_color = bpy_mesh.vertex_colors.new(name='Test') ``` But there is still an error.

Martijn Versteegh commented

2023-12-18 10:25:45 +01:00

It does not matter (for this specific bug) if you use an existing layer or create a new one.

All that matters is that you should do the

test_vert_color = bpy_mesh.vertex_colors['Test']

right before using test_colors, without any code creating or deleting layers inbetween.

But this gets a bit offtopic for this bugreport. You could contact me on blender.chat with the specific error message to see if I can help.

It does not matter (for this specific bug) if you use an existing layer or create a new one. All that matters is that you should do the ``` test_vert_color = bpy_mesh.vertex_colors['Test'] ``` right before using test_colors, without any code creating or deleting layers inbetween. But this gets a bit offtopic for this bugreport. You could contact me on blender.chat with the specific error message to see if I can help.

Pavel Blend commented

2023-12-18 10:36:47 +01:00

First-time contributor

@Baardaap I managed to fix the error:
c66fb81d05

But I can't figure out why uv_layers only throw an error in one place in my addon. I'm using uv_layers everywhere, but the error only occurs on one line.

@Baardaap I managed to fix the error: https://github.com/PavelBlend/blender-xray/commit/c66fb81d052e5be47375bbe2d0f66519530fc3a7 But I can't figure out why uv_layers only throw an error in one place in my addon. I'm using uv_layers everywhere, but the error only occurs on one line.

Martijn Versteegh commented

2023-12-18 15:54:08 +01:00

Maybe that's the only place a CustomData layer has been added or deleted before the bpy_mesh.uv_layers['Texture'] layer?

Maybe that's the only place a CustomData layer has been added or deleted *before* the bpy_mesh.uv_layers['Texture'] layer?

👀 1

Hans Goudey commented

2024-01-08 22:16:35 +01:00

This adds more complexity than I'm comfortable with here TBH. A much simpler solution would be changing to individually allocating CustomDataLayer as Brecht mentioned above as option (a).

Personally I'd try to tie that into a refactor to a different attribute storage replacing CustomData (still using it for BMesh), if we're changing the file format anyway (though we'd still have to write with CustomData until 5.0).

This adds more complexity than I'm comfortable with here TBH. A much simpler solution would be changing to individually allocating `CustomDataLayer` as Brecht mentioned above as option (a). Personally I'd try to tie that into a refactor to a different attribute storage replacing `CustomData` (still using it for BMesh), if we're changing the file format anyway (though we'd still have to write with `CustomData` until 5.0).

This pull request has changes conflicting with the target branch.

source/blender/blenkernel/intern/customdata.cc
source/blender/makesrna/intern/rna_mesh.cc

View command line instructions.

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u pr-custom-data-py-inst:ideasman42-pr-custom-data-py-inst

git checkout ideasman42-pr-custom-data-py-inst

Sign in to join this conversation.

No reviewers

No Label

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

Fix #107500: Custom-data Python validation to prevent crashes #113525

Notes:

Further work:

Checkout