Well, after many months of work I have finally managed to secure the Python interpreter as I laid out in my security paper! The code can be found in the bcannon-objcap branch in Python's svn repository (branched off the trunk).
Since I have finally reached this point I should give an overview of what I set out to do, what I have done, and how you too can secure the Python interpreter so that tangible resources (supposedly) can't get manipulated. If you want more thorough details about how the security mechanism is supposed to work, read the paper. This post is more about the technical details and to see if anyone can find a security hole.
What I Set Out To Do
The original goal of this work was to come up with a way so that you could run Python code in an embedded Python interpreter and not worry about it opening arbitrary sockets or touching any files unless you explicitly allowed it. The hope was to get this working for applications that embed the Python interpreter so that it could be used as a springboard into allowing people to have a custom interpreter to run Python apps in securely.
This work was never meant to be an rexec replacement. While I always kept rexec in the back of my mind to make sure that something I did would not explicitly prevent some rexec solution from coming forward, it was never the end goal.
Nor was it a goal to protect intangible things such as memory or CPU usage. This was supposed to protect stuff like files and sockets; things with a concrete object representation.
Changes to Python
So, what the heck did I end up doing? In terms of Python itself, it was actually very minor. First I removed the constructor for the file type. I never thought that I could hide the file type properly, so I just crippled it so that you had to go through open() to get an initialised file object. I added a module called objcap that includes an initialisation function for allocated file objects if people really feel the need to not use open().
Second I removed the constructor from code. I added a function to objcap to deal with this. I did this as Python does not verify bytecode and so someone might be able to crash the interpreter or something with some crazy bytecode.
With those two types neutered, I turned my attention to protecting imports. With my Python implementation of import, importlib, I already knew I had the control I needed to prevent dangerous imports, but I had to make sure that the fully powered import was not exposed in the interpreter and that none of its attributes were exposed either. I wrote a simple delegate in C, stored at sys.import_delegate, that simply calls whatever is stored at sys.import_. This way no attributes on the callable object are exposed. Putting it all in the sys module was in no way required, but it was the simplest solution for me. It could easily have been implemented outside of the sys module if I felt like putting in the effort. =)
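As a rough illustration of the delegation idea, here is a minimal sketch in modern Python. It is not the C code from the branch: `registry`, `PowerfulImporter`, and `import_delegate` are hypothetical names, with `registry.import_` standing in for sys.import_. A pure-Python function like this still exposes `__globals__`, so it does not give the same protection as a delegate written in C; it only shows the shape of the indirection.

```python
import types

# Hypothetical stand-in for the sys module: it holds the real importer
# under the name "import_", mirroring sys.import_ in the post.
registry = types.SimpleNamespace(import_=None)


class PowerfulImporter:
    """Illustrative importer carrying state we would rather not expose."""

    def __init__(self):
        self.whitelist = {"math"}  # sensitive attribute on the callable

    def __call__(self, name, *args, **kwargs):
        # A real importer would load the module; this one just reports.
        return "imported %s" % name


def import_delegate(name, *args, **kwargs):
    """Forward the call to whatever is currently stored at registry.import_."""
    return registry.import_(name, *args, **kwargs)


registry.import_ = PowerfulImporter()
result = import_delegate("math")
```

Code that only holds `import_delegate` can call it, but `import_delegate.whitelist` does not exist; the importer's attributes are not attributes of the delegate.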
I also had to edit codecs so that it didn't import sys but just the one attribute it needed. That way if you imported codecs you didn't get access to the sys module for free.
The last change to Python itself was how sys was re-imported. With sys being so special it has its module dict stored with the interpreter instance. Also because sys is special there are several places in the codebase that add stuff to the sys module during interpreter initialisation. The problem is that the built-in import machinery (as it is exposed through the imp module, and thus affecting importlib) caches built-in modules' dicts, and some stuff gets added to sys' dict after the caching. That means if you delete the sys module and re-import it, it can be in bad shape. So I had to special-case re-importing the sys module. This breaks calling reload() on sys, but test_xmlrpc is the only thing I know of that does that and reload() is going away in Python 3.0, so I don't care. =)
And that's it for the changes required within Python itself. It really is not extensive in any way. I also don't see a huge issue with getting the key parts (the file and code changes) into the core as long as what is needed is exposed in a reasonable extension module.
Tweaking the Interpreter
Where most of the work comes in is in tweaking interpreter stuff outside of the core code. If you look at secure_python.c in the bcannon-objcap branch you can see what is required to get a secure version of Python to run in an embedded C application with the above-mentioned changes.
Obviously the first step is to initialize the interpreter. That's easy.
The next step is to set importlib as the import machinery. That takes creating a whitelist of the built-in, frozen, and extension modules you want to allow (I found 6, 0, and 19, respectively, that would be safe), assigning an instance of controlled_importlib.ControlledImport to sys.import_, and setting __import__ to sys.import_delegate. Now all imports go through importlib, which makes sure that imports are controlled. I also clear sys.meta_path and sys.path_hooks to make sure no lingering importers are there that would accidentally subvert the whitelisting.
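The whitelisting idea can be sketched in modern Python as follows. This is an assumption-laden illustration, not controlled_importlib.ControlledImport: `make_controlled_import` is a hypothetical helper, and it only filters the names requested directly by the sandboxed code, whereas the branch controls built-in, frozen, and extension modules inside the import machinery itself.

```python
import builtins


def make_controlled_import(whitelist, real_import=builtins.__import__):
    """Return an __import__ replacement that only allows whitelisted names."""
    allowed = frozenset(whitelist)

    def controlled_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Check the top-level package name against the whitelist.
        if name.split(".")[0] not in allowed:
            raise ImportError("import of %r is not allowed" % name)
        return real_import(name, globals, locals, fromlist, level)

    return controlled_import


# Run code with a built-in namespace whose only entry is the controlled import.
controlled = make_controlled_import({"math"})
sandbox_globals = {"__builtins__": {"__import__": controlled}}

exec("import math\nroot = math.sqrt(16)", sandbox_globals)

try:
    exec("import socket", sandbox_globals)
    blocked = False
except ImportError:
    blocked = True
```

Because import statements call whatever `__import__` is found in the built-in namespace, swapping that one name is enough to route every import statement in the executed code through the filter.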
Next sys.modules needs to be cleaned out. Starting up Python leads to a bunch of modules being imported. Most are not critical once the interpreter is up. But a handful are required for Python to work. Those required modules (__builtin__, __main__, encodings, codecs, _codecs) are left in sys.modules. The rest are swept into a dict stored in sys.modules under the ".hidden" key. That keeps the objects alive without letting them be imported. The warnings module also gets imported and then moved as it needs to get cached at the C level.
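The sweep described above could look roughly like this. The function name `hide_modules` is hypothetical, and the sketch works on a plain dict rather than the real sys.modules so that running it does not disturb the interpreter; secure_python.c does the equivalent from C on the real thing.

```python
def hide_modules(modules, keep, hidden_key=".hidden"):
    """Move every entry not named in `keep` into a dict stored under hidden_key.

    The module objects stay referenced (and therefore alive), but they are
    no longer reachable under their own names in `modules`.
    """
    hidden = {}
    for name in list(modules):  # copy the keys, since we mutate the dict
        if name not in keep:
            hidden[name] = modules.pop(name)
    modules[hidden_key] = hidden
    return hidden


# Stand-in for sys.modules with placeholder objects instead of real modules.
fake_modules = {
    "__main__": object(),
    "codecs": object(),
    "os": object(),
    "posix": object(),
}
hide_modules(fake_modules, keep={"__main__", "codecs"})
remaining = sorted(fake_modules)
```

After the call, `fake_modules` holds only the kept names plus the ".hidden" entry, and the swept objects live inside that entry's dict.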
With that done sys.path_importer_cache gets cleared. I leave sys.path alone so that I can import stuff from the stdlib without issue, but it can obviously be tweaked.
Finally, open(), execfile(), and SystemExit are removed from the built-in namespace. The first two are because they open files indiscriminately. SystemExit goes away because the Python interpreter automatically tears down the interpreter if it propagates all the way up. And by not whitelisting the exceptions module you shouldn't be able to get SystemExit.
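For illustration, here is a small sketch of stripping names from the built-in namespace in modern Python. `restricted_builtins` is a hypothetical helper; it copies the namespace instead of editing the interpreter's own, and on its own it is not a sandbox, since the post relies on the other changes (import control, crippled constructors) as well.

```python
import builtins


def restricted_builtins(removed=("open", "execfile", "SystemExit")):
    """Return a copy of the built-in namespace without the given names."""
    namespace = dict(vars(builtins))
    for name in removed:
        # pop() with a default tolerates names that do not exist,
        # such as execfile on Python 3.
        namespace.pop(name, None)
    return namespace


sandbox_globals = {"__builtins__": restricted_builtins()}

try:
    exec("open('somefile')", sandbox_globals)
    open_reachable = True
except NameError:
    # The name lookup fails before any file is touched.
    open_reachable = False
```

Code executed with these globals still sees the rest of the built-ins, such as len(), but a bare reference to open fails with a NameError.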
And that's that. As you can see a lot of it is externally done to Python thanks to how import statements actually call __import__ and how Python exposes so much as dictionaries.
Building and Testing
As for confirming all of this works, I have some tests in the branch. To build all of this you can run build_secure_py.sh, but it has only been used on OS X. To run the tests, execute run_security_tests.py with a *regular* Python interpreter. Tests in tests/succeed are expected to work, while tests in tests/fail require the code being tested to be in a try statement.
Wrap-Up
If anyone checks out the code, runs it, and manages to find a way to open a file, socket, create a code object from scratch, or arbitrarily import any extension, frozen, or built-in module, please let me know! Hopefully nobody finds a way and this all holds up. If it does I will begin trying to get what I need into the core so that this work doesn't require a special checkout of Python.
30 comments:
Awesome - I was just nosing around for something like this the other day, and gave up on a fetal idea. Now, back to the drawing board...
That's great! Now all you need to do is to add a copying GC and it could replace Lua in memory constrained game interface customizations. I'm only kidding, you've done your part, leave the rest to someone else.
That's not to say that you couldn't, but that it's better to finish the degree and let someone who *really wants* that functionality to do it.
Hi Brett, great work! Unfortunately...
a = ['spamspamspam']
while 1:
    a = a + a
will likely cause problems for the secured interpreter and in building application specific DSLs.
A copying GC (or some other mechanism) would be nice :-)
@Charles:
As Josiah said, that is someone else's job to do. =) GC is not my area of expertise and, as I said in the post, intangibles like memory are not what this work was meant for.
Nice work Brett - this is awesome!
Hi, Brett. I tried to build this (svn checkout; configure; make; ./build_secure_py.sh) and it did produce python.exe and secure_python.exe, but I don't think they work properly. secure_python.exe appears to do nothing and return an exit code of 1 no matter what I do. run_security_tests.py reports failures on all 14 tests. Any tips?
I should have specified: I'm on Mac OS 10.4.9.
@Charles:
I neglected to mention that the bcannon-sandboxing branch is a proof-of-concept memory tracking version of the interpreter. It does not work perfectly, but it does show that the general concept of tracking memory use is possible.
@ping
Try using ``--with-pydebug --disable-toolbox-glue``. Otherwise you did the same steps I did: configure; make; build_secure_py.sh.
@ping
Another thing to look at, Ping, is whether you have any special Python scripts that automatically get executed. For instance, I have something in my PYTHONSTARTUP that always fails since importing sys is blocked.
A return value of 1 means that an exception is being raised and propagating up (according to the docs for Py_Main).
Brett,
This is fantastic news! Thanks again for all your hard work! When do you find time to sleep?
@Douglas:
I got to work on this as part of my graduate research work. Plus Google let me contemplate and discuss this stuff during my internship in 2006. So luckily it didn't cut into my sleep time too much. =)
I don't know what I'm doing wrong. I tried reconfiguring with --with-pydebug --disable-toolbox-glue and got a python.exe that prints out refcounts after each statement. ./python.exe produces a working interactive interpreter, but ./secure_python.exe always quits with exit code 1 without appearing to do anything. PYTHONSTARTUP and PYTHONPATH are empty, and if I print out sys.path in the ./python.exe interpreter I only see paths within the bcannon-objcap directory (except for /usr/local/lib/python26.zip, which doesn't exist).
Hi Brett. I've just finished reading your paper ("Controlling Access to Resources Within the Python Interpreter") and have a few comments.
I think you need to be careful with the language you use here. What you've achieved is significant, but you should avoid giving the impression that the result is an object-capability system. The modified Python still has mutable shared state -- you can still reach over and stuff attributes into other modules, right? And the modified Python doesn't provide private namespaces.
I saw that you made an argument (section 5.3) that you can describe your model in terms of capabilities if you consider the interpreter to be a single subject and assume there are no other subjects. But you can't really call a language an "object-oriented language" if you're only allowed one object. So I'd say what you have is a restricted interpreter, not an object-capability language.
Does that make sense?
Also, one other thing about terminology: the paper mentions "immutable shared state" a few times. But the requirement is that there must not be any mutable shared state. That's not the same as the existence of immutable shared state. The two are very different. The paper talks about "creating immutable shared state", but what is needed is to remove all mutable shared state.
Hope this helps!
Cool stuff. I look forward to trying it out.
@ping:
I honestly have no clue what is going on. All I ever do is build the normal interpreter and then run build_secure_py.sh. At this point gdb might be the only way to find out what is going on.
One guess, though, is that your version of svn is too old to handle symlinks. Double-check that Lib/importlib.py and Lib/controlled_importlib.py exist and are symlinked to the external checkout of importlib.
And I didn't mean to imply that Python is now an object-capabilities language. More that I took an approach of object-capabilities in restricting the interpreter.
And your point about immutable shared state compared to no mutable shared state is valid.
@ping:
I just did ``make distclean; ./configure --with-pydebug --disable-toolbox-glue; make -s; ./build_secure_py.sh`` and everything worked for me.
About not getting SystemExit. Won't this still work?
>>> SystemExit = [x for x in [x for x in object.__subclasses__() if x.__name__ == 'BaseException'][0].__subclasses__() if x.__name__ == 'SystemExit'][0]
Regards,
Armin
@Armin:
Nope. If you read the paper you will notice I removed object.__subclasses__.
Have you considered setting up a "hack me" machine somewhere with this code running on it, to see if anyone can find security holes?
It was originally mentioned here.
..or am I getting ahead of myself ?
@Greg:
Possibly. Problem is that I don't have the expertise to set it up so that memory and CPU resources are properly protected as well.
Sorry to ask more dumb questions :-( but here a few. Thanks for your patience.
1. How are you protecting against os.listdir and shutil.whatever? Do they just make use of the file functions internally and are thus rendered neutered?
2. Why is object.__subclasses__ a problem if you've removed all file/socket access?
Thanks again.
@Gregory:
1. Yes. Since they need to eventually import stuff like the 'posix' module you just block the import of that module.
2. The removal of object.__subclasses__ is not for explicit protection of 'file', but for the general principles being followed from object-capabilities. Leaving that method in would not allow for any form of protection from a class getting out and being used by anyone.
@Brett, per your answer to my shutil question, I guess the same goes for os.system and its friends?
So as far as protecting files and sockets, is it safe to import os? Or do I still need to check any module I whitelist to make sure it doesn't expose os or sys, as in glob.os for example?
@Gregory:
Yes, same goes for them.
As for importing os, you can't because it imports the 'sys' module. The module would need to be tweaked so as to be able to work without 'sys' when possible.
Ok, so importing glob would fail because it imports os which imports sys? Makes sense.
Can I put this into my project right now, or do I need to wait until it gets put into the trunk? You mentioned this one is "a proof-of-concept memory tracking version of the interpreter"; I'd probably want the memory tracking taken out before live use, right?
@Gregory:
You can put it in right now as long as you patch your copy of the Python source with the changes I have made.
And the bcannon-objcap branch is not the memory-tracking proof-of-concept. That is in the bcannon-sandboxing branch.
I checked out the code and did sudo ./configure, sudo make, and then sudo build_secure_py.sh.
But build_secure_py.sh seems to hit an error. The last lines are:
libpython2.6.a(complexobject.o): In function `complex_abs':Objects/complexobject.c:577: undefined reference to `hypot'
collect2: ld returned 1 exit status
I can post all of its output, and/or the output of configure, and make, if that would help.
BTW, I'm on Ubuntu Linux 2.6.15-28-686 if that matters.
@Gregory:
Linker is not finding your glibc (hypot is from math.h). You can easily just edit build_secure_py.sh; it does nothing special except as a quick-and-dirty shell script to build the .o file for Python and then compile secure_python.c linked against the .o file.
You can email me directly (brett at python.org) if you still need help.