2011-03-23

importlib: doing it, and doing it, and doing it well

Earlier today I pushed a change to CPython which allows importlib to pass Python's test suite as the implementation of __import__ (sans failures because nothing expects __loader__ to be universally set on modules). This means that there are no known compatibility issues standing between me and making importlib CPython's implementation of __import__.

So what was the final hurdle? It boils down to code objects and their immutability. When you import from bytecode you are essentially loading a code object which represents the module (basically using marshal.loads()). The issue is that code objects embed what file they were created from (the co_filename attribute). But what happens if you relocate a .pyc file? Should co_filename point to where the file was originally created or to its current location? Turns out Python thinks it should be the latter.
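To make the problem concrete, here is a minimal sketch of how the original path travels with the marshalled data. The source string and the path /original/location/mod.py are made up for illustration, and a real .pyc file also carries a header in front of the marshal data, which is skipped here.

```python
import marshal

# Compile a tiny module as if it lived at a hypothetical original path.
source = "def f():\n    return 42\n"
code = compile(source, "/original/location/mod.py", "exec")

# The body of a .pyc file is essentially marshal data for this code object.
data = marshal.dumps(code)

# Loading the bytes back gives a code object that still remembers the
# original path, no matter where the bytes have been moved in the meantime.
loaded = marshal.loads(data)
print(loaded.co_filename)
```

Nothing in the marshal data knows where the file lives now, which is why import has to patch co_filename after loading.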

The issue with updating the co_filename attribute on a code object is that it can only be done by C code; attributes on a code object are immutable from Python. My original plan was to modify marshal.loads() to take a filename argument which represented where the marshal data came from. That way marshal could fix co_filename while keeping the attribute immutable. But this didn't work.
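A quick way to see the immutability from Python is to try the assignment directly. This is only a sketch; the paths are hypothetical, and the exact exception type is assumed to be AttributeError on Python 3, with TypeError caught as well in case an interpreter reports it differently.

```python
# Build a code object with a hypothetical original path.
code = compile("x = 1", "/original/location/mod.py", "exec")

error = None
try:
    # Plain attribute assignment on a code object is rejected.
    code.co_filename = "/new/location/mod.py"
except (AttributeError, TypeError) as exc:
    error = exc

print(type(error).__name__)
print(code.co_filename)  # still the original path
```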

It turns out that the approach used by __import__ for fixing co_filename is not as thorough as the recursive solution I came up with for marshal.loads(). There is a loop that looks for objects to fix, but as soon as one of them is accurate the search is stopped by __import__. Rather than replicate this somewhat odd solution for marshal, I simply exposed a private API in imp for me to use in order to mutate a code object's co_filename attribute. Not an elegant solution, but since I am taking a PyPy view of import (i.e., it's my job to make importlib as compatible as possible and not tweak __import__ to fit my needs) I didn't have much of a choice.
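For illustration, here is roughly what a fully recursive fix looks like when written in Python. This is a sketch and not the marshal.loads() patch or the private imp API mentioned above: it assumes a much newer Python (3.8 or later) where code objects have a replace() method, it builds new code objects instead of mutating in place, and the names fix_co_filename and all_filenames are made up for this example.

```python
import types

def fix_co_filename(code, new_path):
    """Return a copy of code with co_filename rewritten at every nesting level."""
    # Nested functions and classes live as code objects inside co_consts,
    # so each of them has to be rewritten as well.
    new_consts = tuple(
        fix_co_filename(const, new_path) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_path, co_consts=new_consts)

def all_filenames(code):
    """Yield co_filename for a code object and everything nested in it."""
    yield code.co_filename
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from all_filenames(const)

source = "def outer():\n    def inner():\n        pass\n    return inner\n"
old = compile(source, "/original/location/mod.py", "exec")
new = fix_co_filename(old, "/new/location/mod.py")

print(set(all_filenames(new)))
```

Unlike the loop in __import__, this version never stops early: every nested code object is visited regardless of what its co_filename already says.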

Regardless, it feels very satisfying to have an implementation of import that passes the test suite now. This means I can work towards bootstrapping importlib as the implementation of import. I am planning to go with some bytecode-freezing solution for CPython where there is a build rule which recreates the bytecode any time importlib._bootstrap is modified. That will remove the worry of having to rely upon some external file for import to work (otherwise I could come up with some import shim that did enough to import importlib directly). And with the performance of importlib already being acceptable, I feel fairly confident that this bootstrapping will happen in Python 3.3 (I have started work in the bootstrap_importlib branch of my personal repo).