Adding "minimal Python Interpreter" for Drivers #47823


Gaia Clary · 2016-03-16T14:30:43+01:00

Gaia Clary commented

2016-03-16 14:30:43 +01:00

This is a summary of an IRC chat of today (with Kaito and Aligorith)

When drivers are used, Python scripting must be enabled. However, it looks like most drivers can be created using a very limited set of Python (basic math operations come to mind). Because of this, it may be possible and desirable to add a "minimal Python interpreter" for use with drivers only.

@JoshuaLeung: Would you mind adding some comments about how such a "minimal Python interpreter" could possibly be implemented?

Some remarks collected from the IRC chat (just so that this information doesn't get lost):

from Aligorith:

Basically it (the mini interpreter) would strictly white-list the types of stuff you can do with it - anything that gets too close to being able to be used for naughty stuff wouldn't be allowed.
really, most of the drivers I've seen seem to use some combination of sin/cos and/or simple addition/mult/etc.
... we could play around with doing stuff like using the python ast's, and pruning out stuff we don't like when in "secure" mode
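
A minimal sketch of the AST idea above, assuming a driver is a single expression and that the whitelist contains only a few math functions. The names `SAFE_FUNCTIONS`, `check_driver_ast` and `evaluate_driver` are hypothetical and not part of Blender. Rather than pruning nodes, this version rejects the whole expression as soon as it meets a node type that is not whitelisted:

```python
import ast
import math

# Hypothetical whitelist of callables; the chat only mentions sin/cos-style math.
SAFE_FUNCTIONS = {"sin": math.sin, "cos": math.cos, "tan": math.tan, "sqrt": math.sqrt}

# Node types that may appear in a driver expression; everything else is rejected.
SAFE_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod,
    ast.UAdd, ast.USub,
)


def check_driver_ast(source, variable_names):
    """Parse `source` and raise ValueError if it uses anything not whitelisted."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, SAFE_NODES):
            raise ValueError(f"{type(node).__name__} is not allowed in a driver")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
        if isinstance(node, ast.Name):
            if node.id not in SAFE_FUNCTIONS and node.id not in variable_names:
                raise ValueError(f"unknown name {node.id!r}")
        if isinstance(node, ast.Call):
            # Only direct calls such as sin(x); no keywords, no computed callees.
            if (not isinstance(node.func, ast.Name)
                    or node.func.id not in SAFE_FUNCTIONS
                    or node.keywords):
                raise ValueError("only plain calls to whitelisted functions are allowed")
    return tree


def evaluate_driver(source, variables):
    """Validate the expression first, then evaluate it without builtins."""
    tree = check_driver_ast(source, variables)
    code = compile(tree, "<driver>", "eval")
    return eval(code, {"__builtins__": {}}, {**SAFE_FUNCTIONS, **variables})
```

With this sketch, `evaluate_driver("sin(x) + 2 * y", {"x": 0.0, "y": 3.0})` evaluates normally, while `__import__('os')` or `x.__class__` raise `ValueError` before anything is executed. Whether such a check is strict enough for real use is exactly the question raised later in this thread.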

from Kaito:

Absolute security doesn't exist, so we'd better just try to minimize risk or damage.
The idea of a sandboxed 2nd py interpreter that only does blender py rna and basic math is great.
You know, with such a mini-py we can also drop the gpl requirement and make .blends with embedded driver scripts etc 'free' again
I don't think we need to code our own interpreter
I would check with the python.org team what the smallest compatible interpreter would look like and I would check with houdini, maya, lightwave and others who move to py what they do
The traditional py devs work on servers, they see security quite differently...

from me:

What about adding a couple of Python classes for the basic operations? Those classes could be marked as "can be used in drivers without scripts enabled" or so?


Gaia Clary commented

2016-03-16 14:30:43 +01:00

Changed status to: 'Open'

Joshua Leung was assigned by Gaia Clary

2016-03-16 14:30:43 +01:00

Gaia Clary commented

2016-03-16 14:30:43 +01:00

Added subscribers: @GaiaClary, @JoshuaLeung

Campbell Barton commented

2016-03-16 15:06:14 +01:00

Added subscriber: @ideasman42

Campbell Barton commented

2016-03-16 15:06:14 +01:00

One concern I have with this is that it gives people some kind of challenge to circumvent it.

A while ago I looked into the BGE's sandboxing options, and each time I managed to lock it down, some clever Python guys would show an example of how it could be worked around (quite trivially).
I realize that with some more advanced tricks (checking bytecode or the AST), some restricted set of Python could be enforced. However, if this ends up being easy to work around (say, 10 minutes of searching online), then I'm not sure it's worth the effort to attempt to sandbox in the first place, since we would then be promoting a feature as Secure which would in fact be quite insecure.

Other points...

Don't think this makes any changes to how the GPL works with drivers, a driver can already use a restricted set of functions - and many do.
The kinds of drivers you would evaluate with a restricted API are likely small math expressions - not something you would hold copyright on (maybe too big a topic for this task, but think this is the case for very small expressions).

@gaia, would need a more concrete example - how would the classes work?


Joshua Leung commented

2016-03-17 02:12:43 +01:00

My preferred option for eliminating any lingering GPL and Python sandbox-bypassing concerns would be to simply write our own simple parser and/or interpreter.

Pros:

There isn't the risk of anyone escaping the sandbox, as our parser would simply barf on any inputs that try to do anything tricky that it can't handle. Anything that we can't handle is passed back to the standard Python interpreter (which only runs when allowed). UI-wise, this distinction should be indicated (maybe via the presence of a py icon or the old radiosity icon).
From a GPL perspective, I'm guessing that if we make this "GPL compatible but not GPL" licensed, it would rule out any of the standard concerns there.
As we are only handling a very limited subset of Python, there may be some perf benefits in some cases? It of course depends on how we do it, but just by short-circuiting some of the type checks and callback lookups we should get some minimal differences in theory.

Cons:

We need to write a simple parser + interpreter. That however is not such a big issue and can be done quite easily... it just needs a little time...
Potential for other security slips from having our own parser.


Joshua Leung commented

2016-03-17 02:24:56 +01:00

Also just to reiterate, we can only use this for handling "simple" driver expressions - ie the sort that just perform math using the builtin math funcs, +-×÷, and the driver vars that were defined for that driver.
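
To make the "own simple parser" option concrete, here is a rough sketch of a recursive-descent evaluator for that kind of expression. It is written in Python only for illustration (an in-Blender version would presumably not be), and `FUNCTIONS`, `tokenize`, `Parser` and `evaluate_simple` are all made-up names. It handles numbers, + - * /, parentheses, unary minus, whitelisted math functions and the driver's own variables, and raises `ValueError` on everything else:

```python
import math
import re

# Hypothetical whitelist; only math-style functions are mentioned in this thread.
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "tan": math.tan, "sqrt": math.sqrt}

# One token per match: a number, a name, or any other single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+\.?\d*|\.\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    """Split text into (kind, value) pairs, ending with an 'end' marker."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(text.strip()):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("name", name))
        elif op in "+-*/(),":
            tokens.append(("op", op))
        else:
            raise ValueError(f"unexpected character {op!r}")
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, text, variables):
        self.tokens = tokenize(text)
        self.pos = 0
        self.variables = variables

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, op):
        if self.advance() != ("op", op):
            raise ValueError(f"expected {op!r}")

    def parse(self):
        value = self.expr()
        if self.peek()[0] != "end":
            raise ValueError("unexpected trailing input")
        return value

    def expr(self):  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term := unary (('*' | '/') unary)*
        value = self.unary()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):  # unary := '-' unary | atom
        if self.peek() == ("op", "-"):
            self.advance()
            return -self.unary()
        return self.atom()

    def atom(self):  # atom := number | name | name '(' args ')' | '(' expr ')'
        kind, value = self.advance()
        if kind == "num":
            return value
        if kind == "name":
            if self.peek() == ("op", "("):
                return self.call(value)
            if value in self.variables:
                return self.variables[value]
            raise ValueError(f"unknown variable {value!r}")
        if (kind, value) == ("op", "("):
            inner = self.expr()
            self.expect(")")
            return inner
        raise ValueError(f"unexpected token {value!r}")

    def call(self, name):
        if name not in FUNCTIONS:
            raise ValueError(f"function {name!r} is not whitelisted")
        self.expect("(")
        args = [self.expr()]
        while self.peek() == ("op", ","):
            self.advance()
            args.append(self.expr())
        self.expect(")")
        return FUNCTIONS[name](*args)


def evaluate_simple(text, variables):
    """Evaluate a simple driver expression, or raise ValueError if unsupported."""
    return Parser(text, variables).parse()
```

In the scheme described above, a `ValueError` from `evaluate_simple` would be the signal that the expression is not "simple" and has to go to the standard Python interpreter, which only runs when scripts are allowed.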

perfection cat commented

2016-03-17 02:54:58 +01:00

Added subscriber: @sindra1961

Campbell Barton commented

2016-03-17 05:21:43 +01:00

Think if this is to be solved - having a simple parser that handled basic math expressions is better than attempting to sand-box CPython.

As for GPL issues - think we should get advice here, and not take on a lot of work because of GPL issues that might exist.
We should understand exactly what the implications currently are.

We could make an official statement (and get FSF to double check it), eg:

> Driver expressions that use only Python APIs and don't call into Blender's APIs aren't subject to the GPL.

... this covers typical math expressions (most drivers).


Campbell Barton commented

2016-03-17 06:30:24 +01:00

Correction: the patch linked here has an error and isn't working; see the reply below (fixed and linked to a differential).

Similar to @JoshuaLeung's suggestion to manipulate the AST, there have been a few projects that allow byte-code level manipulation.
One that's quite popular and well maintained is numba, which converts Python bytecode to LLVM instructions and, interestingly, has the ability to disable calling back into CPython from the converted functions [1].

This is an experimental patch, P338 (https://archive.blender.org/developer/P338.txt), which uses numba from Blender's PyDrivers when auto-execution is disabled: import and open raise an exception, while math functions (sin/cos/tan, etc.) work as expected.
However, since numba isn't written with security as its main purpose, it's possible there is some way to break out of the sandbox (I'll mail their list and see if this is considered secure).

Tested this with a production file from glass-half (01_render.blend), and the rigs work without any problems and with the same performance.
(Improved performance may be possible; most likely the performance cost is in setting up the Python context and not the execution itself.)

[1] http://numba.pydata.org, http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython

Joshua Leung commented

2016-03-17 07:23:25 +01:00

@ideasman42: Interesting find!

A few questions we'd need to check on:

1) How do we set up numba to test this?
2) What sort of impact would numba have on distribution sizes? From the downloads page, the packages seem to be just under 1 MB. (I haven't checked yet whether that includes or doesn't include any LLVM stuff, though I imagine that LLVM tends to be quite a bit larger. Anyway, if LLVM is not included, then we already have it included for some of the Cycles stuff, so it wouldn't be too much of a stretch I guess.)
3) You mentioned import and file IO. What about some of the other nasties such as os (and other ways of executing commands)?
4) What happens with custom functions added to the driver functions namespace - stuff that riggers can define in textblocks and register? Is numba restricted to running with what it can see in the expression (and a few other builtins it has converted), or does that extend to everything in the namespace it encounters?


Campbell Barton commented

2016-03-17 07:56:46 +01:00

In #47823#364435, @JoshuaLeung wrote:
> @ideasman42: Interesting find!
>
> A few questions we'd need to check on:
> 1) How do we set up numba to test this?

http://numba.pydata.org/#installing

Though I built it from source - https://github.com/numba/numba#installing-numba

> 2) What sort of impact would numba have on distribution sizes? From the downloads page, the packages seem to be just under 1mb. (I haven't checked yet whether that includes or doesn't include any LLVM stuff, though I imagine that LLVM tends to be quite a bit larger. Anyway, if LLVM is not included, then we already have it included for some of the cycles stuff, so it wouldn't be too much of a stretch I guess)

Both the dependencies (LLVM and Numpy) are already included with Blender.
So we should be able to use it without adding extra deps apart from numba itself.

> 3) You mentioned import and file IO. What about some of the other nasties such as os (and other ways of executing commands)?

You can't access os because you can't import, and even if you add the functions into the namespace, they won't execute (from my own tests in the Python 3.5 command prompt). I've mailed their list to ask if this could be used as a sandbox, since it isn't mentioned in their docs.

> 4) What happens with custom functions added to the driver functions namespace - stuff that riggers can define in textblocks and register? Is numba restricted to running with what it can see in the expression (and a few other builtins it has converted), or does that extend to everything in the namespace it encounters?

Anything that calls back to the CPython API raises an exception; that includes any functions you pass in the namespace.
They must be handling calls from the math module as a special case, since the math functions in existing rigs are working as expected.

Campbell Barton commented

2016-03-17 17:03:04 +01:00

It seems I am talking rubbish and this is not working at all! My tests in the Py console overlooked that the function needs to run at least once before we can get the newly created "code" object back out of the function (so the basic principle can work, but needs some tweaks).

However it looks like this isn't so hard to support, though we will need function calls instead of evaluating with a name-space since numba doesn't support reading variables, only arguments to a function.
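
To illustrate the change described here, the following sketch turns a driver expression into a function whose driver variables are arguments instead of names looked up in a namespace. It uses only the standard library and does not involve numba; `make_driver_function` and `SAFE_GLOBALS` are hypothetical names, and the sketch does not validate the expression, so it is not a sandbox by itself:

```python
import ast
import math

# Hypothetical globals for the generated function; no builtins are exposed.
SAFE_GLOBALS = {"__builtins__": {}, "sin": math.sin, "cos": math.cos}


def make_driver_function(expression, variable_names):
    """Return a function f(*variables) that computes `expression`."""
    for name in variable_names:
        if not name.isidentifier():
            raise ValueError(f"invalid variable name {name!r}")
    # Build "lambda a, b: None" and swap its body for the parsed expression,
    # so the expression text is never pasted into generated source code.
    template = ast.parse("lambda {}: None".format(", ".join(variable_names)), mode="eval")
    template.body.body = ast.parse(expression, mode="eval").body
    code = compile(ast.fix_missing_locations(template), "<driver>", "eval")
    return eval(code, dict(SAFE_GLOBALS))
```

A driver with variables `x` and `y` would then be evaluated as `make_driver_function("sin(x) + y", ["x", "y"])(x_value, y_value)`, and the resulting function object is what a compiler such as numba would be handed.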


Gyro Gearloose commented

2016-03-17 21:54:49 +01:00

Added subscriber: @pink.vertex

Gyro Gearloose commented

2016-03-17 21:54:49 +01:00

You might use nodes for drivers?
They would visually represent the AST of an arithmetic expression.
From the node inputs the dependencies for the dependency graph could be derived.

In a text parser you would have to resolve the variables from the driver which you have to setup beforehand?

Further you might want to support vector inputs and vector operations?


Campbell Barton commented

2016-03-18 07:24:10 +01:00

Update: got numba working correctly and tested it with the glass-half file, see D1860 (https://archive.blender.org/developer/D1860).

In summary - it works, but initial driver compilation is very slow.

Daniel Salazar commented

2016-03-19 05:41:53 +01:00

Added subscriber: @zanqdo

Daniel Salazar commented

2016-03-19 05:41:53 +01:00

I think the most sensible comment has been "let's check how others do it". Oh boy it almost doesn't seem like Blender! So how do others do it?

Campbell Barton commented

2016-03-19 12:18:09 +01:00

Looked into yet another method of locking down Python, D1862 (https://archive.blender.org/developer/D1862).

This method checks the byte-code, restricting what can be done.
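
As a rough illustration of what a byte-code check can look like (this is not the code from D1862; `ALLOWED_OPNAMES` and `is_bytecode_allowed` are hypothetical), the sketch below compiles an expression without running it and accepts it only if every instruction is on a whitelist. Opcode names differ between CPython versions, so the set is an assumption that tries to cover several releases:

```python
import dis

# Assumed whitelist; opcode names vary between CPython versions.
ALLOWED_OPNAMES = {
    "RESUME", "NOP", "CACHE", "PUSH_NULL", "PRECALL", "CALL", "CALL_FUNCTION",
    "LOAD_CONST", "LOAD_SMALL_INT", "LOAD_NAME", "RETURN_VALUE", "RETURN_CONST",
}
# Arithmetic opcodes (BINARY_ADD, BINARY_OP, UNARY_NEGATIVE, ...).
# Note that this prefix match also admits subscripting (BINARY_SUBSCR).
ALLOWED_PREFIXES = ("BINARY_", "UNARY_")


def is_bytecode_allowed(expression, allowed_names):
    """Compile (but do not run) the expression and inspect its instructions."""
    code = compile(expression, "<driver>", "eval")
    for instruction in dis.get_instructions(code):
        name = instruction.opname
        if name not in ALLOWED_OPNAMES and not name.startswith(ALLOWED_PREFIXES):
            return False  # e.g. LOAD_ATTR, IMPORT_NAME, MAKE_FUNCTION
        if name == "LOAD_NAME" and instruction.argval not in allowed_names:
            return False  # only whitelisted functions and driver variables
        if name == "LOAD_CONST" and not isinstance(instruction.argval, (int, float)):
            return False  # no strings or nested code objects
    return True
```

For example, `is_bytecode_allowed("sin(x) * 2 + 1", {"sin", "x"})` passes, while `__import__('os')` fails the name check and `x.__class__` fails because attribute access is not whitelisted.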

Campbell Barton commented

2020-06-19 10:54:15 +02:00

Changed status from 'Confirmed' to: 'Resolved'

Campbell Barton closed this issue

2020-06-19 10:54:15 +02:00

Campbell Barton commented

2020-06-19 10:54:15 +02:00

This has been done, see: bf2a54b058

