2008-03-24

How I want to change how import works

Those of you who attended PyCon 2008 may have heard my lightning talk where I ranted about how I don't like various approaches to import. I have now had time to think things through and I have a more solid plan for how I want to change things in a (mostly) backwards-compatible fashion.

First off, I want to provide a cleaner API for people to use directly. __import__() has its call signature as it is for simplicity of bytecode. But what is good for bytecode is not necessarily good for a human being. So I want to introduce ``import_module(name:str, level:int, caller_name:str, caller_is_pkg:bool, top:bool)``. The arguments represent:
  • name: Same as for __import__().
  • level: Same as for __import__().
  • caller_name: the __name__ value of the module performing the import.
  • caller_is_pkg: true if the caller is a package (i.e., defines __path__).
  • top: true if the top-level module is desired instead of the tail module.
Thus __import__() can call import_module() as ``import_module(name, level, globals['__name__'], '__path__' in globals, not fromlist)``. The last argument is ``not fromlist`` because __import__() returns the top-level module only when the fromlist is empty. This will allow __import__() to do what it needs to do to handle the fromlist for the bytecode but still have a nicer call signature for those of you who want to dynamically import modules.
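To make the proposed signature concrete, here is a minimal sketch of what import_module() could look like. It is not the real implementation: it leans on today's importlib.import_module() to do the actual importing, the helper name resolve_name() and the default argument values are my own assumptions, and it only handles level >= 0.

```python
import importlib
import sys


def resolve_name(name, level, caller_name, caller_is_pkg):
    """Turn a possibly relative name into an absolute dotted name (hypothetical helper)."""
    if level == 0:
        return name
    # A package is its own base; a plain module's base is its parent package.
    package = caller_name if caller_is_pkg else caller_name.rpartition(".")[0]
    bits = package.rsplit(".", level - 1)
    if not package or len(bits) < level:
        raise ImportError("attempted relative import beyond top-level package")
    base = bits[0]
    return f"{base}.{name}" if name else base


def import_module(name, level=0, caller_name="__main__", caller_is_pkg=False, top=False):
    """Sketch of the proposed API; the defaults are an assumption, not part of the proposal."""
    absolute = resolve_name(name, level, caller_name, caller_is_pkg)
    module = importlib.import_module(absolute)
    if top:
        # Importing "a.b.c" guarantees that "a" is already in sys.modules.
        return sys.modules[absolute.partition(".")[0]]
    return module
```

With this sketch, ``import_module("base", 1, "email.mime.text", False)`` behaves like ``from . import base`` written inside email/mime/text.py, while ``import_module("os.path", top=True)`` hands back the os module itself.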

Next, and the most backwards-incompatible change I want, is to use sys.meta_path exclusively for holding importers (which incidentally probably should have been named "finders" since importers don't import anything). No more implicit built-in/frozen/extension/source importer, just what is on sys.meta_path. The problem with that is people may have been clearing sys.meta_path for some reason in the past or blindly resetting it. With this change that would cause problems since it would wipe out all the default importers.

This would also do away with the need for sys.path_importer_cache and sys.path_hooks. Both attributes are more implementation details and thus should be pushed on to importers and not import. By considering sys.meta_path the only place to look for importers you can then just treat sys.path as a fallback list of locations to look in when find_module() is called with 'path' set to None. This would break backwards-compatibility unless an importer on sys.meta_path is added that uses sys.path_hooks and friends as needed (which shouldn't be hard and would make having the proper DeprecationWarning easy).

Lastly, in terms of sys.meta_path, I would want separate importers for extension modules and source code. This makes it much more explicit which is considered more important as well as simplifying implementations and allowing for more control over imports. The problem is that this is backwards-incompatible. Currently the search for a module on sys.path places its location on sys.path as the highest priority, followed by whether it is an extension module or source code; this change would flip those priorities. In practice this shouldn't be an issue since having two modules on sys.path with the same name is just plain bad to begin with and thus makes a module's location on sys.path not critical in terms of this.
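The priority flip is easier to see with a toy model. The sketch below does not touch the real import machinery: the path is a list of (directory, set of file names) pairs, and the function names and the example directories are invented for the illustration.

```python
def find_location_first(name, path, suffixes):
    """Current behaviour: the first directory holding any match wins."""
    for entry, files in path:
        for suffix in suffixes:
            if name + suffix in files:
                return entry, name + suffix
    return None


def find_type_first(name, path, suffixes):
    """Proposed behaviour: one importer per file type, tried in order."""
    for suffix in suffixes:
        for entry, files in path:
            if name + suffix in files:
                return entry, name + suffix
    return None


# Two same-named modules of different types in different directories.
fake_path = [("/a", {"spam.py"}), ("/b", {"spam.so"})]
suffixes = [".so", ".py"]  # extension modules are considered before source

print(find_location_first("spam", fake_path, suffixes))  # ('/a', 'spam.py')
print(find_type_first("spam", fake_path, suffixes))      # ('/b', 'spam.so')
```

The two functions only disagree when the same name shows up in more than one place, which is exactly the situation that is bad to begin with.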

With import itself now simplified, I want to also make it easier to customize the loading of source code and bytecode. To do that I want to add two more importer protocols. The first is to introduce ``source_mtime(module:str) -> int`` and ``read_bytecode(module:str) -> bytes``. These two methods along with get_source() cover all the details needed to figure out whether bytecode or source code should be used for a specific module and to get at what is needed to create the proper code object. This allows for a generic function to handle all of the critical steps for creating a code object from source, letting loaders only have to worry about deciding how to get the source, bytecode and last modification time of the source. It also allows for easier swapping out of the default handler of source and bytecode when someone wants to run a transformation of the source or bytecode.
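A sketch of the generic function I have in mind could look like the following. Several details are assumptions made purely for the example: the function name code_from_loader(), read_bytecode() returning None when there is no bytecode, and a simplified bytecode layout of a 4-byte little-endian mtime followed by marshal data (real .pyc files also carry a magic number). DictLoader is a throwaway in-memory loader used only to exercise the function.

```python
import marshal
import struct


def code_from_loader(loader, module):
    """Return a code object, preferring bytecode that matches the source mtime."""
    source_mtime = loader.source_mtime(module)
    data = loader.read_bytecode(module)  # assumed to be None when nothing is stored
    if data is not None and len(data) >= 4:
        (stored_mtime,) = struct.unpack("<I", data[:4])
        if stored_mtime == source_mtime & 0xFFFFFFFF:
            return marshal.loads(data[4:])
    # Missing or stale bytecode: fall back to compiling the source.
    return compile(loader.get_source(module), "<%s>" % module, "exec")


class DictLoader:
    """Hypothetical in-memory loader implementing the three methods."""

    def __init__(self, source, mtime, bytecode=None):
        self._source, self._mtime, self._bytecode = source, mtime, bytecode

    def get_source(self, module):
        return self._source

    def source_mtime(self, module):
        return self._mtime

    def read_bytecode(self, module):
        return self._bytecode


def make_bytecode(source, mtime):
    """Build bytes in the simplified layout assumed above."""
    code = compile(source, "<cached>", "exec")
    return struct.pack("<I", mtime & 0xFFFFFFFF) + marshal.dumps(code)
```

A loader that wants to transform source only has to change what get_source() returns; the freshness check and the compile step stay in one shared place.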

The other protocol I want is ``write_bytecode(module:str, bytecode:bytes) -> bool``. This allows for a loader to store back any newly generated bytecode. This is perfect for people who want to suppress bytecode generation or have it written to a specific location. It also makes it easier for loaders to just have the support to store bytecode if they only start off with source. The zipfile importer, for instance, could use source but then write back bytecode when possible to make things simpler.
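As an illustration of the two use cases, here are two tiny loader mixins. The class names and the cache_dir argument are invented for the example, and I am assuming the boolean return value means "the bytecode was stored".

```python
import os


class NoBytecode:
    """Hypothetical mixin: never store bytecode."""

    def write_bytecode(self, module, bytecode):
        return False  # nothing was written


class RedirectedBytecode:
    """Hypothetical mixin: store bytecode under one chosen directory."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def write_bytecode(self, module, bytecode):
        target = os.path.join(self.cache_dir, module + ".pyc")
        try:
            with open(target, "wb") as file:
                file.write(bytecode)
        except OSError:
            # A read-only or missing directory is not an error for the import itself.
            return False
        return True
```

Either way the code that drives the import just calls write_bytecode() after compiling and does not need to know where, or whether, anything ended up on disk.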

After that the stuff I want is all implementation specific (e.g., a decorator for load_module() that handles getting the proper module to use since that is just boilerplate that every loader has to implement). I don't think there is much here that is hugely controversial. And I think it is all for the better as it makes the import algorithm simpler along with making it all more explicit and easier to customize.
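The decorator could be as small as the sketch below. The name with_module is hypothetical, and the exact contract (the wrapped method receives the module object instead of the name) is my assumption about how the boilerplate would be factored out.

```python
import functools
import sys
import types


def with_module(method):
    """Hypothetical decorator: hand load_module() the module object to populate."""

    @functools.wraps(method)
    def wrapper(self, fullname):
        module = sys.modules.get(fullname)
        is_reload = module is not None
        if not is_reload:
            # A fresh import: create the module and register it before executing.
            module = types.ModuleType(fullname)
            sys.modules[fullname] = module
        try:
            method(self, module)
        except BaseException:
            # Do not leave a half-initialized module behind on a failed first import.
            if not is_reload:
                sys.modules.pop(fullname, None)
            raise
        return module

    return wrapper


class AnswerLoader:
    """Toy loader showing how little is left for load_module() to do."""

    @with_module
    def load_module(self, module):
        module.__loader__ = self
        module.answer = 42
```

Calling ``AnswerLoader().load_module("toy_answer")`` returns the new module and leaves it in sys.modules, while a load_module() that raises leaves sys.modules untouched.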

But I wouldn't expect any of this to get into 3.0. This will very likely be a 3.1 thing. But for any of you reading this who currently play with sys.meta_path and always assume it is empty by default, I would stop doing that if I were you. =)