2009-01-17

importlib is now in Python 3.1

[update: fixed a typo and a broken link]

Back in the summer of 2006 I interned at Google under Neal Norwitz. Part of what I did that summer was try to figure out how to potentially secure the Python interpreter for embedding into Firefox. I did finally figure out how to secure the interpreter for embedding to protect resources, culminating in a paper for a security course I took at UBC.

Part of the solution I developed required controlling import such that you couldn't import arbitrary built-in, frozen, and extension modules. As import is currently implemented, that is simply not possible to do in a secure fashion. That meant reworking or rewriting import. Since the import code was known to be a little difficult to work with I decided to rewrite it in pure Python.

Work began on October 4, 2006. At the time I was planning on making my Python security work my thesis topic with the long-term goal of making my rewrite of import the official implementation of import. Little did I know how massive a project this would turn out to be.

Two years, three months, and 13 days later, importlib came into being for Python 3.1 in revision 68698. Between the beginning and now, my security work stopped being my thesis topic (as did anything directly relating to Python), I dropped support for Python 2.x, and I learned that part of the reason the C implementation is so difficult to work with is that import's semantics are rather nuanced and require juggling a lot in your head at once. Oh, and allowing different source encodings is evil.

This is easily the longest amount of time I have ever spent on a single piece of code. One of the surprising things is that the thing is not even 2,000 LOC, including tests! It just took forever to get the semantics fully backwards-compatible (short of some assumptions in the tests, you can currently run the entire test suite for Python with importlib as __import__ and have things work).
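For the curious, here is a minimal sketch of what "importlib as __import__" can look like. It assumes the ``importlib.__import__`` function as it is exposed in released versions of importlib; the choice of ``colorsys`` as the module to import is arbitrary, and this is only an illustration, not how the test suite run was actually wired up.

```python
import builtins
import importlib
import sys

# Keep a reference to the built-in implementation so it can be restored.
original_import = builtins.__import__

# Route every import statement through importlib's pure Python __import__.
builtins.__import__ = importlib.__import__
try:
    # Drop any cached copy so the import below does real work
    # (colorsys is an arbitrary stdlib module picked for illustration).
    sys.modules.pop("colorsys", None)
    import colorsys
    loaded_name = colorsys.__name__
finally:
    # Always put the original implementation back.
    builtins.__import__ = original_import
```

Restoring the original in a ``finally`` block keeps the swap from leaking into the rest of the process if the import fails.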

The other thing that held up checking importlib in was being too much of a perfectionist. I think I implemented importlib twice, and I still have plans for how to clean things up. Importlib has become the perfect example of how your initial implementation might work, but it most definitely will not be the best implementation you can do. Heck, I still have some things to change to make the code easier to work with and more useful to users.

The perfectionist part also came out through worrying about the public API. I know people want to have access to all of the code I have written for their own importers. So I have constantly worried about how to expose it in a sane way. But API design is hard, especially when it is in Python's standard library. Get something wrong and you have to live with it for at least one extra release when you add a deprecation. This is why I am going to expose the API slowly over time and probably blog about it so that I can get feedback from people.

Now that the code is in, what are the long-term plans? Well, I have notes with the code that cover what I plan to do. They start with documenting importlib.import_module. That is to be the function that everyone has asked for: a usable interface over __import__. As it stands now the interface is ``import_module(name, package)`` where 'name' is what to import, including relative imports, and 'package' is the package of the calling module, which is what relative names are resolved against. Calling the function returns the specified module, not the top-level package as __import__ does; no more fake values in fromlist! I might change the argument names, but I can't think of any other way to make the API simpler and more straightforward.
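To make the difference concrete, here is a small sketch comparing the two. It assumes the signature as it exists in released versions of importlib, where 'package' is optional and only needed for relative names; the ``json`` package is used purely as a convenient stdlib example, and the argument names could still differ from what finally ships.

```python
import importlib

# __import__ with a dotted name hands back the top-level package.
top = __import__("json.decoder")

# The old workaround: a non-empty fromlist makes __import__ return the
# submodule, even if the listed name is made up.
via_fromlist = __import__("json.decoder", fromlist=["placeholder"])

# import_module returns the module that was actually asked for.
direct = importlib.import_module("json.decoder")

# A relative name is resolved against 'package'.
relative = importlib.import_module(".decoder", package="json")
```

Here ``top`` is the ``json`` package, while the other three names all refer to the same ``json.decoder`` module object.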

Past that is cleaning up some things through better refactorings and then gradually exposing more of the code. But the end goal is still to get this all bootstrapped into Python 3.1 so that importlib becomes the actual implementation of __import__.