Lazy imports in 25 lines of code

For some reason unknown to me, I found myself thinking about lazy importing on my way home from school. Now I have never actually dived into the topic before, so I had no preconceived notions of how to solve the problem. I do know that people like imports to be lazy so that the startup time of their applications is not dominated by imports of modules that they do not need immediately, if at all. Many projects have rolled their own solution to this problem, such as bzr and hg. This has even gone as far as Greg Ewing proposing actual syntax support on python-ideas and Christian Heimes using lazy imports as a motivating factor for post import hooks.

While the solutions vary, the generally agreed-upon approach is to call some custom function that returns a module proxy. That proxy triggers the actual import once the module is used in some meaningful way, such as when an attempt is made to access an attribute. I also assumed that people either used a custom importer (meta path or not) or a custom __import__ function, but beyond that I had no clue how people solved the issue. The importer solution can be seen in a cookbook recipe and the custom function can be seen in peak.util.import (source).
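A minimal sketch of that proxy-function style might look like the following. This is not peak.util.import itself; lazy_import and _LazyProxy are names I made up for illustration.

```python
import importlib
import types


class _LazyProxy(types.ModuleType):
    """Module stand-in that imports the real module on first attribute access."""

    def __getattr__(self, attr):
        # __getattr__ only fires for names not already on the proxy, so the
        # first "meaningful" access lands here and triggers the real import.
        module = importlib.import_module(self.__name__)
        # Copy the real module's namespace so later lookups are direct.
        self.__dict__.update(module.__dict__)
        return getattr(module, attr)


def lazy_import(name):
    """Return a proxy that defers importing *name* until it is used."""
    return _LazyProxy(name)


lazy_math = lazy_import("math")   # no real import has happened yet
value = lazy_math.sqrt(16.0)      # first attribute access imports math -> 4.0
```

Note that this style requires calling a custom function instead of using the import statement, which is exactly the limitation the loader-based approach described below avoids.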

Going in blind on this, I started to think about what it would take to pull this off using importlib in Python 3.1 for Python source. I was not going to worry about built-in modules and the like, as importing those types of modules is much cheaper compared to importing Python source or bytecode. And I wanted the solution to be as simple as possible since I was doing this for fun, not as an actual robust solution to a problem I had.

With those goals in mind, I realized that I needed to come up with a lazy loader that could be subclassed for ease of use. I also knew I needed to come up with a module proxy object that would trigger an import the instant an attempt was made to access an attribute. Seems simple enough, but of course it wasn't.

The first problem I had to solve was how the module proxy would get at the real loader when it was eventually needed. That was quickly solved by assigning the lazy loader to __loader__ and treating the class as a mix-in. Assuming the lazy loader class name was LazyMixin and I set up the inheritance correctly, I could get at the real loader from within the module proxy with super(LazyMixin, self.__loader__). By re-assigning the resulting object to __loader__ I would continue to have the actual loader doing the loading on the module, as well as have a consistent way to get at the loader I was after. Assuming a module proxy class named LazyModule, the code looks like this:

class LazyMixin:

    def load_module(self, name):
        module = LazyModule(name)
        module.__loader__ = self
        sys.modules[name] = module
        return module
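The super() trick at the heart of this can be demonstrated in isolation. RealLoader and Loader are throwaway names for this sketch, standing in for an actual PEP 302 loader and the combined class:

```python
class RealLoader:
    # Stands in for an actual loader that does the expensive import work.
    def load_module(self, name):
        return "really loaded " + name


class LazyMixin:
    # Overrides load_module to defer the work (here just a marker string).
    def load_module(self, name):
        return "lazily deferred " + name


# Mix-in order matters: LazyMixin must come first in the MRO so that its
# load_module is the one the import machinery calls.
class Loader(LazyMixin, RealLoader):
    pass


loader = Loader()
loader.load_module("spam")                    # LazyMixin's version runs
super(LazyMixin, loader).load_module("spam")  # skips LazyMixin, reaches RealLoader
```

The bound super object starts its method lookup just past LazyMixin in the instance's MRO, which is exactly how the module proxy can reach the real loader later.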

With that problem solved I turned my attention to the module proxy. After some mishaps with trying to dynamically set __getattr__ I had an epiphany: I could simply reset __class__ to a class that lacked the __getattribute__ override. By having both the lazy module class (with __getattribute__ defined) and the module class that __class__ gets reset to inherit from types.ModuleType, I could make sure that the module instance was always consistent in terms of its type. This all led to the following module objects:

class Module(types.ModuleType):
    pass


class LazyModule(types.ModuleType):

    def __getattribute__(self, attr):
        self.__class__ = Module
        self.__loader__ = super(LazyMixin, self.__loader__)
        # Now that __class__ no longer overrides __getattribute__, trigger
        # the real import, which re-initializes this very module instance.
        self.__loader__.load_module(self.__name__)
        return getattr(self, attr)
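The __class__ reassignment trick can be seen on its own. Chatty and Plain are made-up names for this sketch; note that type() must be used for the before/after check, since accessing __class__ as an attribute would itself trigger the hook:

```python
import types


class Plain(types.ModuleType):
    pass


class Chatty(types.ModuleType):
    def __getattribute__(self, attr):
        # Flip to the plain class, so this hook only ever fires once;
        # the assignment goes through __setattr__ and does not recurse.
        self.__class__ = Plain
        return getattr(self, attr)


mod = Chatty("example")
type(mod) is Chatty   # True: no attribute access has happened yet
mod.__name__          # triggers __getattribute__, which swaps the class
type(mod) is Plain    # True: lookups now use the default machinery
```

Because __getattribute__ is looked up on the type, not the instance, the swap instantly and permanently removes the hook with no per-access cost afterwards.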

The final problem to work out was how to make sure that the module instance was what got initialized by the import, so that all pre-existing references to the module instance worked after the import. The obvious way is to make the module proxy act as an actual proxy: store the initialized module in the proxy and forward all attribute requests to it using __getattribute__. But that imposes a performance penalty on every attribute access that seemed unnecessary, and I had a gut feeling there had to be a more elegant solution.
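For contrast, the forwarding design rejected above might look like this (ForwardingProxy is a name I made up). Every attribute access, forever, pays the cost of running this __getattribute__:

```python
import importlib
import types


class ForwardingProxy(types.ModuleType):
    # The rejected design: keep the real module inside the proxy and
    # forward every attribute request to it, paying the cost each time.
    def __getattribute__(self, attr):
        # object.__getattribute__ avoids recursing into this very hook.
        own_dict = object.__getattribute__(self, "__dict__")
        real = own_dict.get("_real")
        if real is None:
            name = object.__getattribute__(self, "__name__")
            real = importlib.import_module(name)
            own_dict["_real"] = real
        return getattr(real, attr)


proxy = ForwardingProxy("math")
proxy.sqrt(9.0)   # imports math on first use, then forwards the lookup
```

Unlike the __class__-swap approach, the hook here can never be removed, which is exactly the overhead the more elegant solution avoids.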

That more elegant solution is module reloading. If you look at PEP 302 you will notice that, to support module reloading, a loader is expected to reuse the module found in sys.modules when performing a load. This makes sure that existing references to a module continue to work, as the module instance doesn't change, just its __dict__. So by sticking the module proxy instance into sys.modules, the real loader would use that module object for the initialization, letting all pre-existing references remain valid.
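A minimal sketch of that reload contract (load_into_existing is a made-up helper, not an importlib API):

```python
import sys
import types


def load_into_existing(name, source):
    # Per PEP 302, a loader must reuse a module already in sys.modules so
    # that existing references stay valid; only its __dict__ is repopulated.
    module = sys.modules.get(name)
    if module is None:
        module = types.ModuleType(name)
        sys.modules[name] = module
    exec(source, module.__dict__)   # initialize in place; identity preserved
    return module


placeholder = types.ModuleType("demo")
sys.modules["demo"] = placeholder            # a pre-existing reference
loaded = load_into_existing("demo", "answer = 42")
assert loaded is placeholder                 # same instance, now initialized
assert placeholder.answer == 42              # old references see the contents
```

This identity-preserving behavior is what lets the proxy instance in sys.modules quietly become the real module.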

It turns out that all of this was enough for a proof-of-concept to work! By creating a class that inherited both the lazy loader and importlib's private Python source file loader, I was able to defer actual importing of a source file until an attribute on the module was accessed. And the best part was that the only requirement of the real loader was that it follow the rules set out in PEP 302, which importlib provides with importlib.util.module_for_loader. An even cooler thing is that since the lazy loader is a mix-in that only overrides load_module, all other methods from the real loader are accessible, meaning the lazy loader allows introspection APIs such as importlib.abc.InspectLoader to continue to work! And the greatest perk of all is that since it works with loaders it is completely transparent to the import statement, removing any need to use a custom function to make this work!
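The pieces can be put together end-to-end with a toy real loader. SourceStringLoader below is a stand-in I invented that "imports" from an in-memory string rather than importlib's private source file loader, but it obeys the same PEP 302 reuse rule:

```python
import sys
import types


class Module(types.ModuleType):
    pass


class LazyModule(types.ModuleType):
    def __getattribute__(self, attr):
        # First access: become a plain module, unwrap the real loader, load.
        self.__class__ = Module
        self.__loader__ = super(LazyMixin, self.__loader__)
        self.__loader__.load_module(self.__name__)
        return getattr(self, attr)


class LazyMixin:
    def load_module(self, name):
        module = LazyModule(name)
        module.__loader__ = self
        sys.modules[name] = module
        return module


class SourceStringLoader:
    # Toy stand-in for a real PEP 302 loader.
    def __init__(self, source):
        self.source = source
        self.loaded = False

    def load_module(self, name):
        # Reuse the instance already in sys.modules, per PEP 302.
        module = sys.modules.setdefault(name, types.ModuleType(name))
        exec(self.source, module.__dict__)
        self.loaded = True
        return module


class LazyLoader(LazyMixin, SourceStringLoader):
    pass


loader = LazyLoader("answer = 42")
module = loader.load_module("demo")
loader.loaded     # False: nothing has actually been imported yet
module.answer     # attribute access triggers the real load -> 42
loader.loaded     # True: the deferred import has now happened
```

Swap SourceStringLoader for any loader that follows the PEP 302 reload rules and the same mix-in works unchanged.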

To see the proof-of-concept, which is NOT PRODUCTION QUALITY -- remember it is using private APIs from importlib which could easily disappear at any time -- see this paste bin. I am sure the code should be more robust somehow, but I am rather pleased that the solution is only 20 source lines and appears to work, at least in a really dead-simple example. If people actually like this approach and think they would find it useful then leave a comment, and I will see if I can be persuaded to package it up and write the proper tests so that I would be willing to put this up on the Cheeseshop.