2007-05-30

I have finished securing the Python interpreter!

Well, after many months of work I have finally managed to secure the Python interpreter as I laid out in my security paper! The code can be found in the bcannon-objcap branch in Python's svn repository (branched off the trunk).

Since I have finally reached this point I should give an overview of what I set out to do, what I have done, and how you too can secure the Python interpreter so that tangible resources (supposedly) can't get manipulated. If you want more thorough details about how the security mechanism is supposed to work, read the paper. This post is more about the technical details, and a chance to see if anyone can find a security hole.

What I Set Out To Do

The original goal of this work was to come up with a way to run Python code in an embedded Python interpreter without worrying about it opening arbitrary sockets or touching any files unless you explicitly allowed it. The hope was to get this working for applications that embed the Python interpreter, and then use that as a springboard toward letting people run Python apps securely in a custom interpreter.

This work was never meant to be an rexec replacement. While I always kept rexec in the back of my mind to make sure that something I did would not explicitly prevent some rexec solution from coming forward, it was never the end goal.

Nor was it a goal to protect intangible things such as memory or CPU usage. This was supposed to protect stuff like files and sockets: things with a concrete object representation.

Changes to Python

So, what the heck did I end up doing? In terms of Python itself, the changes were actually very minor. First I removed the constructor for the file type. I never thought that I could hide the file type properly, so I just crippled it so that you have to go through open() to get an initialised file object. I added a module called objcap that includes an initialisation function for allocated file objects if people really feel the need to not use open().

Second I removed the constructor from the code type and added a function to objcap to replace it. I did this because Python does not verify bytecode, so someone might be able to crash the interpreter, or do something else nasty, with maliciously crafted bytecode.

With those two types neutered, I turned my attention to protecting imports. With importlib, my Python implementation of import, I already knew I had the control I needed to prevent dangerous imports, but I had to make sure that neither the fully powered import nor any of its attributes were exposed in the interpreter. I wrote a simple delegate in C, stored at sys.import_delegate, that simply calls whatever is stored at sys.import_. This way no attributes on the callable object are exposed. Putting it all in the sys module was in no way required, but it was the simplest solution for me. It could all have easily been implemented outside of the sys module if I felt like putting in the effort. =)
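To illustrate why a delegate helps, here is a rough sketch in plain Python. Everything in it is hypothetical: registry stands in for the sys module, WhitelistImport stands in for the real importer, and import_delegate is a Python function rather than the C delegate described above. A Python function still exposes attributes such as __globals__, so this only shows the shape of the idea, not a safe implementation.

```python
import types

# Hypothetical stand-in for the attributes added to the sys module.
registry = types.SimpleNamespace()


class WhitelistImport:
    """Hypothetical importer that carries its policy as a plain attribute."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)

    def __call__(self, name):
        if name not in self.whitelist:
            raise ImportError("%r is not allowed" % name)
        return __import__(name)


registry.import_ = WhitelistImport({"math"})


def import_delegate(name):
    # Look up the real importer at call time and forward the call.
    # Callers only ever hold this function, not the importer object.
    return registry.import_(name)


math_module = import_delegate("math")

try:
    import_delegate("socket")
except ImportError:
    socket_blocked = True
else:
    socket_blocked = False
```

Code holding registry.import_ directly could reach (and change) its whitelist attribute, while code holding only import_delegate sees no such attribute.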

I also had to edit codecs so that it didn't import sys but just the one attribute it needed. That way if you imported codecs you didn't get access to the sys module for free.

The last change to Python itself was how sys was re-imported. With sys being so special it has its module dict stored with the interpreter instance. Also because sys is special there are several places in the codebase that add stuff to the sys module during interpreter initialisation. The problem is that the built-in import machinery (as it is exposed through the imp module and thus affecting importlib) caches built-in modules' dicts and some stuff gets added to sys' dict after the caching. That means if you delete the sys module and re-import it, it can be in bad shape. So I had to special-case re-importing the sys module. This breaks calling reload() on sys, but test_xmlrpc is the only thing I know of that does that and reload() is going away in Python 3.0, so I don't care. =)

And that's it for the changes required within Python itself. It really is not extensive in any way. I also don't see a huge issue with getting the key parts (the file and code changes) into the core as long as what is needed is exposed in a reasonable extension module.

Tweaking the Interpreter

Most of the work is in tweaking interpreter stuff outside of the core code. If you look at secure_python.c in the bcannon-objcap branch you can see what is required to get a secure version of Python to run in an embedded C application with the above-mentioned changes.

Obviously the first step is to initialize the interpreter. That's easy.

The next step is to set importlib as the import machinery. That takes creating a whitelist of the built-in, frozen, and extension modules you want to allow (I found 6, 0, and 19 of them, respectively, that I consider safe), assigning an instance of controlled_importlib.ControlledImport to sys.import_, and setting __import__ to sys.import_delegate. Now all imports go through importlib, which makes sure that imports are controlled. I also clear sys.meta_path and sys.path_hooks to make sure no lingering importers are there that would accidentally subvert the whitelisting.
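As a rough illustration of whitelisted importing, here is a minimal sketch for a current Python. It is not the API of controlled_importlib.ControlledImport; make_controlled_import is a hypothetical helper that simply wraps builtins.__import__, and it only handles absolute imports.

```python
import builtins


def make_controlled_import(whitelist, real_import=builtins.__import__):
    """Return an __import__-compatible callable that only allows whitelisted names."""
    allowed = frozenset(whitelist)

    def controlled_import(name, globals=None, locals=None, fromlist=(), level=0):
        # This sketch refuses relative imports outright to stay simple.
        if level != 0:
            raise ImportError("relative imports are not supported here")
        if name not in allowed:
            raise ImportError("import of %r is not whitelisted" % name)
        return real_import(name, globals, locals, fromlist, level)

    return controlled_import


controlled = make_controlled_import({"math", "string"})
math_module = controlled("math")

try:
    controlled("socket")
except ImportError as exc:
    blocked_message = str(exc)
else:
    blocked_message = None
```

The sketch never installs controlled as the interpreter's __import__; in the setup described above that role is played by sys.import_delegate.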

Next sys.modules needs to be cleaned out. Starting up Python leads to a bunch of modules being imported. Most are not critical once the interpreter is up. But a handful are required for Python to work. Those required modules (__builtin__, __main__, encodings, codecs, _codecs) are left in sys.modules. The rest are swept into a dict stored in sys.modules under the ".hidden" key. That keeps the objects alive without letting them be imported. The warnings module also gets imported and then moved as it needs to get cached at the C level.
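The sweep can be sketched in a few lines of Python. This is only an illustration run against a stand-in dict, not the real sys.modules and not the C code in secure_python.c; hide_modules and fake_sys_modules are hypothetical names.

```python
def hide_modules(modules, keep):
    """Move every entry not named in `keep` into a dict stored under ".hidden"."""
    hidden = {}
    for name in list(modules):
        if name not in keep:
            hidden[name] = modules.pop(name)
    # The hidden dict keeps references alive so the modules are not torn down.
    modules[".hidden"] = hidden
    return modules


REQUIRED = {"__builtin__", "__main__", "encodings", "codecs", "_codecs"}

# Stand-in for sys.modules; plain objects play the part of module objects.
fake_sys_modules = {
    "__builtin__": object(),
    "__main__": object(),
    "encodings": object(),
    "codecs": object(),
    "_codecs": object(),
    "os": object(),
    "site": object(),
}

hide_modules(fake_sys_modules, REQUIRED)
```

After the call only the required names and the ".hidden" key remain at the top level of the dict.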

With that done sys.path_importer_cache gets cleared. I leave sys.path alone so that I can import stuff from the stdlib without issue, but it can obviously be tweaked.

Finally, open(), execfile(), and SystemExit are removed from the built-in namespace. The first two go because they open files indiscriminately. SystemExit goes away because Python automatically tears down the interpreter if the exception propagates all the way up. And by not whitelisting the exceptions module you shouldn't be able to get SystemExit.
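The effect of dropping names from the built-in namespace can be approximated in a current Python as follows. This is a sketch under assumptions: it works on a copy of the builtins rather than the interpreter's real namespace, run_restricted is a hypothetical helper, and on its own it does not provide the protection that the rest of the changes are meant to add.

```python
import builtins

REMOVED = ("open", "execfile", "SystemExit")

# Copy the built-in namespace and drop the names listed above.
# pop() with a default tolerates names (such as execfile) that may not exist.
restricted_builtins = dict(vars(builtins))
for name in REMOVED:
    restricted_builtins.pop(name, None)


def run_restricted(source):
    """Execute source with the trimmed built-in namespace and return its globals."""
    namespace = {"__builtins__": restricted_builtins}
    exec(source, namespace)
    return namespace


try:
    # The name lookup fails before any file is touched.
    run_restricted("open('some_file.txt')")
except NameError as exc:
    open_error = str(exc)
else:
    open_error = None

# Built-ins that were left alone still work.
total = run_restricted("total = sum(range(4))")["total"]
```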

And that's that. As you can see, a lot of it is done from outside of Python itself, thanks to how import statements actually call __import__ and how Python exposes so much as dictionaries.

Building and Testing

As for confirming all of this works, I have some tests in the branch. To build all of this you can run build_secure_py.sh, but it has only been used on OS X. To run the tests, execute run_security_tests.py with a *regular* Python interpreter. Stuff in tests/succeed is expected to work, while stuff in tests/fail requires the code being tested to be in a try statement.

Wrap-Up

If anyone checks out the code, runs it, and manages to find a way to open a file or socket, create a code object from scratch, or arbitrarily import any extension, frozen, or built-in module, please let me know! Hopefully nobody finds a way and this all holds up. If it does I will begin trying to get what I need into the core so that this work doesn't require a special checkout of Python.