This past term I audited a grad course on computer security here at UBC in the EECE dept. The course required a final paper. Having already spent months on the topic of security and Python, I used the course as an excuse to write up my work.
I have put the paper, "Controlling Access to Resources Within The Python Interpreter", online. There are some things you should keep in mind when reading it. One is that this paper was for a course, not a journal or conference. Another is that I had an eight-page limit, so I didn't have space to go into how the security implementation would defend against common attacks, etc. Lastly, the audience did not know Python, so there is some stuff in the paper that is probably rather basic for anyone who reads this blog.
I think the paper turned out fine. Comments on the security design are welcome. Editing issues are less critical, as the paper would most likely get reworked if it is ever submitted to a conference.
Assuming I have the time I am hoping to use this paper as a reference in a PEP to get the changes I want into Py3K. But I am not sure if I am going to have the time to pull this off by April 30th, especially if I want a proof-of-concept ready by then. So if it slips to Python 3.1/2.7 then that is life.
And a special thanks needs to go to my supervisor, Eric Wohlstadter, for funding me while doing this work. Even when it seemed we might not get a publication out of this he allowed me to continue to work on it which I really appreciate.
2007-04-19
17 comments:
Hi,
very interesting paper you have. I have some questions about it, that I hope you could answer and discuss.
Import seems to allow attempted access to the file system. By that I mean directories are opened and files are read. The interpreter even tries to parse files.
Is it possible to pass in function/class imported from outside? How will that function/class be able to be made safe before being passed in? Is this a good idea?
Is it possible to allow importing only from a given path? That way only the directory (and optionally subdirectories) of that module could be searched for code to run. Without such a restriction, import could allow an attacker to gain a little bit of information about a system, or to access the file system, even if only in a limited fashion.
I assume that because sys.path is not able to be changed, it would not allow any directory to be scanned for Python modules? E.g. adding /etc and trying to import some .py/.so files in there.
Importing of python code seems to be a bit dangerous in itself, given that it is not a provably correct thing to parse the python language? What do you think about bugs in the python parser as an attack vector?
Have you looked into how hard it would be to validate python byte code? Is it known to be easily crashable? Or is there some other reason why python byte code is not a good place to validate? I would think it would be easier to create a safe byte code parser/validator than a python code parser?
I don't think python in C can be that safe - just because of bugs. Witness all of the crash bugs over the years in the bug tracker. What do you think of this problem?
I think OS level protection is probably a good idea to be used in conjunction with any language level protections. Maybe python on java, or .NET would be a bit better?
Did you consider how code signing for python would work?
Did you consider cpu, and memory usage protections?
Cheers,
re: Illume
Segfaulting the Python parser is possible in theory but rarely happens in practice, even in development branches. The parser emits a very small subset of all possible byte code combinations and only those combinations are tested. That is why bytecode is much more dangerous than plain source code.
A byte code validator could work but you would have to run it each time byte code was imported. While that might be cheaper in CPU spent than just re-parsing the original source it would be a lot more work to implement it and it might not even be CPU-quicker in the end.
I can't speak for Brett but I do understand why memory and CPU limits aren't in there. Those are OS tasks and implementing them in the runtime would be extremely expensive in terms of performance (if it was even possible). Limiting IO is a much easier task and good enough for many applications.
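As a rough illustration of leaving CPU limits to the OS, here is a minimal sketch that runs code in a child interpreter under a CPU-time limit. It assumes a Unix-like system (the `resource` module is not available on Windows), and the helper name `run_with_cpu_limit` is hypothetical, not something from the paper.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(code, cpu_seconds):
    """Run `code` in a child interpreter with a soft CPU-time limit.

    Returns the child's exit status; a negative value means the child
    was terminated by a signal (here, normally SIGXCPU).
    """
    # Keep the existing hard limit so the call never tries to raise it.
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)

    def set_limit():
        # Runs in the child between fork and exec.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))

    completed = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=set_limit,
    )
    return completed.returncode
```

The limit is enforced by the kernel on the whole child process, so the interpreter itself needs no changes; the cost is one process per untrusted script.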
I've never seen the point of code signing. When users are presented with a yes/no/cancel dialog they just click "yes."
Thanks for explaining that Jack.
I haven't finished reading your paper yet, but I have been giving these sandbox issues some thought lately. Here are a couple of questions for you if you don't mind:
1. What do you think of the security of using this code to make a sandbox?
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496746
2. Even with what you're suggesting, how can you prevent a user submitting a huge number to multiply or exponentiate and locking up the interpreter? e.g., 82173821737213782173821739921**881230980921832173821732132323798321
(I don't believe Python will let such a task be interrupted.)
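To get a feel for why that expression is a problem, the size of the result can be estimated without computing it, since the bit length of base**exponent is roughly exponent * log2(base). A minimal sketch (the function name `estimated_bits` is just for illustration):

```python
import math


def estimated_bits(base, exponent):
    """Approximate bit length of base**exponent without computing it."""
    return exponent * math.log2(base)


base = 82173821737213782173821739921
exponent = 881230980921832173821732132323798321

bits = estimated_bits(base, exponent)
print(f"roughly {bits:.2e} bits, i.e. {bits / 8:.2e} bytes")
```

The estimate comes out on the order of 10^37 bits, far beyond any machine's memory, so the interpreter would be stuck on the computation long before it could produce a result.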
While the old "users will approve/deny code" model of code signing is probably a non-starter, I think that there is another application of code-signing that should be considered; enabling module/application creators to control dependencies. If I create module X that imports Y and Z it makes it easier for me to make security assumptions if I can know that the Y and Z I am relying upon are actually the versions I request. If you call something that you import then a trust-boundary is being crossed, and this seems to be one area where code-signing might provide some benefit.
illume:
Yes, you could pass in any object you want since you could just stick it in the built-in namespace. Whether that is wise or not depends on the object. But the general answer is "no" unless you know exactly what references that object exposes.
For restricting to a specific path, you just edit sys.path. With sys not whitelisted then you control exactly where on the file system the interpreter looks. I don't know why you think you cannot edit it. It's just a list in a dict at the C level. Just modify it in Py_Initialize() before you start executing code.
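The same idea can be sketched at the Python level. This is only an illustration of pinning the module search path to one directory; it is not the C-level change in Py_Initialize() described above, and the helper name `restrict_import_path` is hypothetical. In a stock interpreter, code can still import sys and undo the change, and built-in, frozen, and already-imported modules are unaffected, which is why sys has to stay off the whitelist.

```python
import sys
from pathlib import Path


def restrict_import_path(allowed_dir):
    """Make `allowed_dir` the only entry on sys.path.

    Returns a copy of the previous sys.path so the caller can restore it.
    """
    old_path = list(sys.path)
    # Mutate the list in place so every existing reference sees the change.
    sys.path[:] = [str(Path(allowed_dir).resolve())]
    return old_path
```

After the call, a path-based import only searches that one directory; restoring is `sys.path[:] = old_path`.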
I am not worried about crashing the parser. Never seen that happen. I guess it's possible.
I have not looked at validating Python bytecode. It honestly just does not interest me. =) But the interpreter is not easy to crash. You usually have to use some really odd semantics to cause it. But there are some known ways.
Don't quite understand your question about Python in C. If you mean the overall interpreter, of course it will have bugs. But I don't think we are any more buggy than other applications. We just happen to have an open bug tracker.
Jython might provide a better security solution, but I was not after a Jython solution. I wanted a Python language solution that worked in CPython. There really is nothing special about the CPython solution short of the delegate solution.
Code signing just says you trust where the code comes from. That's fine, but that doesn't secure you from code you don't trust. Code signing has its place, but it doesn't solve all security problems.
I am not considering memory or CPU protection. My work has nothing to do with DoS attacks. I wrote a proof-of-concept memory tracking version of the interpreter last summer. It always leaked a couple of bytes each time you pressed Enter at the interpreter prompt, but it mostly worked. But it was a pain. I view memory tracking and such as a separate project.
gregory:
The link you pointed to was chopped off (you can use anchor tags in Blogger comments so if you want to repost go ahead). I suspect you were going to link to the AST checker recipe. I did not consider it as that solution in the Cookbook does blacklisting which is really bad. All I have to do is introduce a new node that is a security hazard and I just broke the AST solution.
Phil Hassey of Galcon fame decided to turn that recipe into a whitelist solution based on my suggestion. I still don't love the solution, but theoretically it should work. I just worry about code getting past the checker and then doing something legit that gets them access to something somewhere.
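To make the blacklist/whitelist difference concrete, here is a minimal whitelist checker written against the modern `ast` module. It is not the Cookbook recipe or Phil's version; the set of allowed nodes and the name `check_whitelist` are made up for the example. The point is only that any node type not explicitly listed, including ones added to the language later, is rejected by default.

```python
import ast

# Hypothetical whitelist: plain arithmetic expressions and nothing else.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)


def check_whitelist(source):
    """Parse `source` as an expression and reject any unlisted node type."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed node: " + type(node).__name__)
    return tree


check_whitelist("1 + 2 * -3")           # passes
try:
    check_whitelist("__import__('os')")  # Call and Name are not listed
except ValueError as exc:
    print(exc)
```

A blacklist version would instead enumerate the dangerous nodes and let everything else through, which is exactly what breaks when a new node appears.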
As for protecting against large multiplies and other attacks on CPU and memory, I am not worrying about DoS attacks. Someone else can deal with that. =)
Brett:
Here's the link. I think you knew which one I meant though.
You said "All I have to do is introduce a new node that is a security hazard and I just broke the AST solution." How does one introduce a new node?
gregory:
Yep, that is the one I was thinking of.
As for adding a node, see PEP 339. It requires editing C code. My point is not about some random program adding a node, but python-dev adding a node and the user of the AST checker not knowing that python-dev added a node.
I finished reading the paper, good stuff.
Two questions:
1. I'm wondering why you remove execfile but not exec, or eval? Are they not dangerous too?
2. Is your creation something I can run now, or is there still a lot of work left to be done?
gregory:
1) execfile is removed because it accesses files, not because it dynamically executes code. eval and compile are fine: if they couldn't be trusted, neither could any Python code that is imported.
2) If you go to the bcannon-objcap branch in Python's svn repository you can get the latest code. I just started a proof-of-concept app in the directory today. The next step is to integrate whitelisting. After that comes removing the dangerous built-in functions. And then finally testing. You can see the BRANCH_NOTES file in the branch to see what is left to be done.
Nice work. If I read it correctly, you're not trying to prevent DoS attacks like this modification to Safe-Tcl does.
http://www.tcl.tk/cgi-bin/tct/tip/143.html
I see that you can do all the usual things by writing delegates, but will there be a basic library of delegates for common tasks? (much like the security policies e.g. safesock in Safe-Tcl Base?).
schlenk:
If someone writes such a library, then yes. =) As of right now I am just trying to get the implementation done, so I am not worrying about any library support at the moment.
What's the final goal of your work, i.e., how do you envision it working for the end user?
Will a user use it similarly to how rexec used to be used, or are you proposing creating a whole new "safe" interpreter?
I expect to have a C function for people who embed Python that takes what modules to whitelist and then gets the interpreter set up for them to use.
I am not proposing any rexec solution. I am just providing the needed tech (as long as my built-in changes get in) that allows basically anyone to tweak some objects in an interpreter. It isn't a "safe" interpreter in terms of a new one; it's just a standard interpreter with some things set in a specific way.
So let's say I have a regular Python app, and I want a secure way for people to provide their own scripting for it. With your method does that mean I'll need to launch a separate instance of the interpreter each time I want to run a user script?
It seems like a rexec approach would be better since it can run in the same interpreter. (I could still be misunderstanding your approach though)
Gregory:
I am not trying to replace rexec (the paper flat-out says that). Nor am I trying to come up with a way to secure Python code in a pure Python app.
And coming up with a rexec solution is much harder than what I did. I am not even going to touch that one. You could possibly build off of my work, but it would be a lot of work.