Adding "minimal Python Interpreter" for Drivers #47823

Closed
opened 2016-03-16 14:30:43 +01:00 by Gaia Clary · 20 comments
Member

This is a summary of an IRC chat of today (with Kaito and Aligorith)

When drivers are used, Python Scripting must be enabled. However it looks like most drivers can be created using a very limited set of Python (basic math operations come to mind). Because of this it may be possible and desirable to add a "minimal python interpreter" only for usage with Drivers.

@JoshuaLeung: Would you mind adding some comments about how such a "minimal Python interpreter" could possibly be implemented?

Some remarks collected from the IRC chat (just so that this information doesn't get lost):

from Aligorith:

  • Basically it (the mini interpreter) would strictly white-list the types of stuff you can do with it - anything that gets too close to being able to be used for naughty stuff wouldn't be allowed.
  • really, most of the drivers I've seen seem to use some combination of sin/cos and/or simple addition/mult/etc.
  • ... we could play around with doing stuff like using the python ast's, and pruning out stuff we don't like when in "secure" mode
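A rough sketch of the AST-pruning idea mentioned above, assuming Python 3.8+ and a made-up whitelist (`ALLOWED_FUNCS`, `ALLOWED_NODES` and `is_simple_expression` are hypothetical names, not anything in Blender). It only validates an expression; it does not evaluate it:

```python
import ast

# Hypothetical whitelist; a real driver namespace would define its own.
ALLOWED_FUNCS = {"sin", "cos", "tan", "sqrt", "abs", "min", "max"}
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.Mod,
    ast.USub, ast.UAdd,
)


def is_simple_expression(source, variables):
    """Return True if `source` only uses whitelisted syntax and names."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Anything not explicitly listed (attributes, subscripts,
        # lambdas, comprehensions, ...) is rejected.
        if not isinstance(node, ALLOWED_NODES):
            return False
        # Only numeric literals, no strings or bytes.
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False
        if isinstance(node, ast.Call):
            # Only direct calls to whitelisted names, no keyword arguments.
            if not isinstance(node.func, ast.Name) or node.func.id not in ALLOWED_FUNCS:
                return False
            if node.keywords:
                return False
        if isinstance(node, ast.Name):
            if node.id not in ALLOWED_FUNCS and node.id not in variables:
                return False
    return True
```

With this, `is_simple_expression("sin(var) * 2 + 1", {"var"})` passes while `var.__class__` or `__import__('os')` is rejected. Note that this says nothing about evaluation cost (e.g. huge exponents), which would need separate limits.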

from Kaito:

  • Absolute security doesn't exist, so we'd better just try to minimize risk or damage.
  • The idea of a sandboxed second Python interpreter that only does Blender py RNA and basic math is great.
  • You know, with such a mini-py we can also drop the GPL requirement and make .blends with embedded driver scripts etc. 'free' again.
  • I don't think we need to code our own interpreter.
  • I would check with the python.org team what the smallest compatible interpreter would look like, and I would check with Houdini, Maya, LightWave and others who move to Python what they do.
  • The traditional Python devs work on servers; they see security quite differently...

from me:

  • What about adding a couple of Python classes for the basic operations? Those classes could be marked as "can be used in drivers without scripts enabled" or so?
Author
Member

Changed status to: 'Open'

Joshua Leung was assigned by Gaia Clary 2016-03-16 14:30:43 +01:00
Author
Member

Added subscribers: @GaiaClary, @JoshuaLeung


Added subscriber: @ideasman42


One concern I have with this, is it gives some kind of challenge for people to circumvent it.

A while ago I looked into the BGE's sandboxing options, and each time I managed to lock it down, some clever Python guys would show an example of how it could be worked around (quite trivially).
I realize that with some more advanced tricks (checking bytecode or AST), some restricted set of Python could be enforced. However, if this ends up being easy to work around (say, 10 minutes of searching online), then I'm not sure it's worth the effort to attempt to sandbox in the first place, since we would then be promoting a feature as Secure which would in fact be quite insecure.
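To illustrate the kind of trivial workaround meant here (this is a generic, well-known CPython example, not the actual BGE case): stripping the builtins from an `eval` namespace blocks `open`, but an expression that uses no names at all can still reach every loaded class through attribute access.

```python
# Naive "sandbox": evaluate with an empty builtins table.
restricted_globals = {"__builtins__": {}}

# Direct use of a builtin is blocked, as intended...
try:
    eval("open('somefile')", restricted_globals)
    blocked = False
except NameError:
    blocked = True

# ...but this expression needs no names, only attribute access on a
# literal, so it still walks from a tuple up to `object` and back down
# to every class currently loaded in the interpreter.
escape = "().__class__.__base__.__subclasses__()"
subclasses = eval(escape, restricted_globals)

print(blocked, len(subclasses) > 0)
```

From that list of classes it is usually a short step to something that can do I/O, which is why name-based lock-downs alone tend not to hold up.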


Other points...

  • Don't think this makes any changes to how the GPL works with drivers, a driver can already use a restricted set of functions - and many do.
  • The kinds of drivers you would evaluate with a restricted API are likely small math expressions - not something you would hold copyright on (maybe too big a topic for this task, but think this is the case for very small expressions).

@gaia, would need a more concrete example - how would the classes work?

Member

My preferred option for eliminating any lingering GPL and Python sandbox bypassing would be to simply write our own simple parser and/or interpreter.

Pros:

  • There isn't the risk of anyone escaping the sandbox, as our parser would simply barf on any inputs that try to do anything tricky that it can't handle. Anything that we can't handle is passed back to the standard Python interpreter (which only runs when allowed). UI-wise, this distinction should be indicated (maybe via the presence of a py icon or the old radiosity icon).

  • From GPL perspective, I'm guessing that if we make this "GPL compatible but not GPL" licensed it would rule out any of the standard concerns there.

  • As we are only handling a very limited subset of Python, there may be some perf benefits in some cases? It of course depends how we do it, but just by short-circuiting some of the typechecks and callback lookups we should get some minimal differences in theory.

Cons:

  • We need to write a simple parser + interpreter. That however is not such a big issue and can be done quite easily... it just needs a little time...

  • Potential for other security slips from having our own parser.
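As a rough idea of how small such a parser could be, here is a sketch of a recursive-descent evaluator with the fallback described in the pros. The grammar, the `FUNCS` table and all names (`evaluate_simple`, `evaluate_driver`, `scripts_allowed`) are assumptions for illustration only, and it is written in Python for brevity rather than as it would be done inside Blender:

```python
import math
import re

# Hypothetical function table and grammar: + - * /, parentheses,
# one-argument math functions, and named driver variables.
FUNCS = {"sin": math.sin, "cos": math.cos, "tan": math.tan, "sqrt": math.sqrt}
TOKEN_RE = re.compile(r"\s*(?:(\d+\.?\d*|\.\d+)|([A-Za-z_]\w*)|(.))")


class UnsupportedExpression(Exception):
    """Raised for any input outside the tiny grammar."""


def tokenize(text):
    tokens = []
    for number, name, op in TOKEN_RE.findall(text.strip()):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("name", name))
        elif op in "+-*/()":
            tokens.append(("op", op))
        else:
            raise UnsupportedExpression(f"unexpected character {op!r}")
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, text, variables):
        self.tokens = tokenize(text)
        self.pos = 0
        self.variables = variables

    def peek(self):
        return self.tokens[self.pos]

    def take(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, op):
        if self.take() != ("op", op):
            raise UnsupportedExpression(f"expected {op!r}")

    def expr(self):  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            if self.take()[1] == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):  # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            if self.take()[1] == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):  # number | '-' factor | '(' expr ')' | func '(' expr ')' | variable
        kind, val = self.take()
        if kind == "num":
            return val
        if (kind, val) == ("op", "-"):
            return -self.factor()
        if (kind, val) == ("op", "("):
            value = self.expr()
            self.expect(")")
            return value
        if kind == "name":
            if val in FUNCS and self.peek() == ("op", "("):
                self.take()
                arg = self.expr()
                self.expect(")")
                return FUNCS[val](arg)
            if val in self.variables:
                return float(self.variables[val])
        raise UnsupportedExpression(f"unsupported token {val!r}")


def evaluate_simple(text, variables):
    parser = Parser(text, variables)
    value = parser.expr()
    if parser.peek() != ("end", None):
        raise UnsupportedExpression("trailing input")
    return value


def evaluate_driver(text, variables, scripts_allowed=False):
    """Try the restricted evaluator first; use full Python only if allowed."""
    try:
        return evaluate_simple(text, variables)
    except UnsupportedExpression:
        if not scripts_allowed:
            raise
        return eval(text, dict(FUNCS), dict(variables))
```

Here `evaluate_driver("sin(var) * 2", {"var": 0.5})` never touches the Python evaluator, while something like `max(1, 2)` is only evaluated when `scripts_allowed` is set. The point of the design is that unknown input fails closed instead of being filtered.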

Member

Also just to reiterate, we can only use this for handling "simple" driver expressions - ie the sort that just perform math using the builtin math funcs, +-×÷, and the driver vars that were defined for that driver.


Added subscriber: @sindra1961


Think if this is to be solved - having a simple parser that handled basic math expressions is better than attempting to sand-box CPython.


As for GPL issues - think we should get advice here, and not take on a lot of work because of GPL issues that might exist.
We should understand exactly what the implications currently are.

We could make an official statement (and get FSF to double check it), eg:

"Driver expressions that use only Python APIs and don't call into Blender's APIs aren't subject to the GPL."

... this covers typical math expressions (most drivers).


Correction: the patch linked has an error and isn't working; see the reply below (fixed and linked to Differential).


Similar to @JoshuaLeung's suggestion to manipulate the AST, there have been a few projects that allow byte-code level manipulation.
One that's quite popular and well maintained is numba, which converts Python bytecode to LLVM instructions and, interestingly, has the ability to disable calling back into CPython from the converted functions [1].

This is an experimental patch, P338 (https://archive.blender.org/developer/P338.txt), which uses numba from Blender's PyDrivers when auto-execution is disabled: import and open raise an exception, while math functions (sin/cos/tan, etc.) work as expected.
However, since this isn't written with security as the main purpose, it's possible there is some way to break out of the sandbox (I'll mail their list and see if this is considered secure).

Tested this with a production file from glass-half (01_render.blend); the rigs work without any problems and with the same performance.
(Improved performance may be possible; most likely the performance cost is in setting up the Python context and not in the execution itself.)

[1] http://numba.pydata.org and http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython
Member

@ideasman42: Interesting find!

A few questions we'd need to check on:

  1. How do we set up numba to test this?
  2. What sort of impact would numba have on distribution sizes? From the downloads page, the packages seem to be just under 1mb. (I haven't checked yet whether that includes or doesn't include any LLVM stuff, though I imagine that LLVM tends to be quite a bit larger. Anyway, if LLVM is not included, then we already have it included for some of the cycles stuff, so it wouldn't be too much of a stretch I guess)
  3. You mentioned import and file IO. What about some of the other nasties such as os (and other ways of executing commands)?
  4. What happens with custom functions added to the driver functions namespace - stuff that riggers can define in textblocks and register? Is numba restricted to running with what it can see in the expression (and a few other builtins it has converted), or does that extend to everything in the namespace it encounters?

In #47823#364435, @JoshuaLeung wrote:
@ideasman42: Interesting find!

A few questions we'd need to check on:

  1. How do we set up numba to test this?

http://numba.pydata.org/#installing

Though I built it from source - https://github.com/numba/numba#installing-numba

  2. What sort of impact would numba have on distribution sizes? From the downloads page, the packages seem to be just under 1mb. (I haven't checked yet whether that includes or doesn't include any LLVM stuff, though I imagine that LLVM tends to be quite a bit larger. Anyway, if LLVM is not included, then we already have it included for some of the cycles stuff, so it wouldn't be too much of a stretch I guess)

Both the dependencies (LLVM and NumPy) are already included with Blender.
So we should be able to use it without adding extra deps apart from numba itself.

  3. You mentioned import and file IO. What about some of the other nasties such as os (and other ways of executing commands)?

You can't access os because you can't import, and even if you add the functions into the namespace, they won't execute (from my own tests in the Python 3.5 command prompt). I've mailed their list to ask if this could be used as a sandbox, since it isn't mentioned in their docs.

  4. What happens with custom functions added to the driver functions namespace - stuff that riggers can define in textblocks and register? Is numba restricted to running with what it can see in the expression (and a few other builtins it has converted), or does that extend to everything in the namespace it encounters?

Anything that calls back to the CPython API raises an exception; that includes any functions you pass in the name-space.
They must be handling calls from the math module as a special case, since the math functions in existing rigs are working as expected.


It seems I am talking rubbish and this is not working at all! My tests in the Py console overlooked that the function needs to run at least once before we can get the newly created "code" object back out of the function (so the basic principle can work, but needs some tweaks).

However, it looks like this isn't so hard to support, though we will need function calls instead of evaluating with a name-space, since numba doesn't support reading variables, only arguments to a function.
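To make the namespace-versus-arguments difference concrete, here is a plain-Python sketch (no numba involved; `build_driver_function` and the variable names are made up for illustration). It only shows the change in calling convention and is not a sandbox in itself, since the expression is still compiled by regular Python:

```python
import math


def build_driver_function(expression, var_names):
    """Compile `expression` into a function taking the driver variables as arguments."""
    params = ", ".join(var_names)
    source = f"def driver({params}):\n    return {expression}\n"
    # Hypothetical function namespace; only what the expression may call.
    namespace = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}
    exec(compile(source, "<driver>", "exec"), namespace)
    return namespace["driver"]


# Namespace-style evaluation: variables are looked up by name at eval time.
value_ns = eval("sin(a) + b", {"sin": math.sin}, {"a": 0.0, "b": 2.0})

# Function-style evaluation: the same variables are passed as arguments.
driver = build_driver_function("sin(a) + b", ["a", "b"])
value_fn = driver(0.0, 2.0)

print(value_ns, value_fn)
```

A function built like this is the sort of object that could then be handed to a compiler; that step, and the first call needed to trigger compilation, are left out here.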


Added subscriber: @pink.vertex


You might use nodes for drivers?
They would visually represent the AST of an arithmetic expression.
From the node inputs the dependencies for the dependency graph could be derived.

In a text parser you would have to resolve the variables from the driver, which you have to set up beforehand?

Further you might want to support vector inputs and vector operations?


Update: got numba working correctly and tested it with the glass-half file, D1860 (https://archive.blender.org/developer/D1860).

In summary: it works, but initial driver compilation is very slow.
Member

Added subscriber: @zanqdo

Member

I think the most sensible comment has been "let's check how others do it". Oh boy it almost doesn't seem like Blender! So how do others do it?


Looked into yet another method of locking down Python, D1862 (https://archive.blender.org/developer/D1862).

This method checks the byte-code, restricting what can be done.
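For readers unfamiliar with the approach, a byte-code check can be sketched roughly as below. This is not the code from D1862; the name whitelist and the opcode deny-list are assumptions chosen so the example runs across CPython versions (a stricter version would whitelist opcodes instead, but opcode names change between releases):

```python
import dis

# Hypothetical set of names a driver expression may reference.
ALLOWED_NAMES = {"sin", "cos", "tan", "sqrt", "pi", "var"}
# Opcode name fragments treated as unsafe in this sketch.
FORBIDDEN_FRAGMENTS = ("ATTR", "IMPORT", "STORE", "DELETE", "SUBSCR", "METHOD")


def bytecode_is_safe(expression):
    """Compile `expression` and inspect the resulting code object."""
    try:
        code = compile(expression, "<driver>", "eval")
    except SyntaxError:
        return False
    # co_names holds every global and attribute name the code refers to.
    if any(name not in ALLOWED_NAMES for name in code.co_names):
        return False
    # Nested code objects (lambdas, generator expressions) are rejected.
    if any(hasattr(const, "co_code") for const in code.co_consts):
        return False
    for instruction in dis.get_instructions(code):
        if any(frag in instruction.opname for frag in FORBIDDEN_FRAGMENTS):
            return False
    return True
```

An expression such as `sin(var) * 2 + pi` passes, while `__import__('os')`, `().__class__` or a lambda is refused before anything is executed.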

Changed status from 'Confirmed' to: 'Resolved'


This has been done, see: bf2a54b058

Reference: blender/blender#47823