2011-01-31

PSF core grant, day 19: more fun with imp.reload()

[From now on I will try to include a tl;dr line in the title of posts so people get a better idea of what the post is about, but I am not dropping the day count since it helps me make sure I don't fall behind in work]

Side-effects upon import are such a pain to deal with. Now this is not me speaking with my "import expert" cap on (although from that perspective I still think they suck, especially when people are crazy enough to spawn threads as a side-effect of importation, but that's another blog post), but as someone who has to deal with stdlib code.

There are various modules which like to cache stuff at the global level of the module; copyreg and warnings are good examples. Others, like abc, like to execute code on import. And then you have tests which do direct class comparisons to verify the proper class is returned by some API call. All of this breaks if you do module reloading, as assumptions about the state of the module quickly go out the door. This sucks because there is no clear delineation between which modules will collapse under reloading and which will be fine, short of manually looking at the code to verify whether any global code (beyond the obvious function and class definitions) gets executed as part of an import.
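
To see the breakage concretely, here is a small sketch using imp.reload() against configparser (chosen only because it is a pure-Python module that defines classes at its top level; any such module would do):

import imp
import configparser

original_class = configparser.ConfigParser  # keep a reference to the original class object
parser = configparser.ConfigParser()        # an instance built from it

imp.reload(configparser)  # re-executes the module's code over its existing __dict__

# The class statement ran again, binding a brand-new class object to the name...
print(configparser.ConfigParser is original_class)  # False
# ...so any test doing a direct class comparison against old instances now fails.
print(type(parser) is configparser.ConfigParser)    # False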

Now this could be solved in a couple of ways. Probably the most obvious is to have some specific module that exists specifically to hold global data. Modules with global data that needs to survive reloads would store it there. A simple hasattr() check could then determine whether the module had already been loaded and thus whether its global data still needed initializing. This also has the nice side-effect of centralizing global data for Python.
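
A rough sketch of that idea, assuming a hypothetical dedicated module named _globalstate (it does not exist in the stdlib), as seen from a module that wants its data to survive reloads:

import _globalstate  # hypothetical module whose only job is to hold other modules' state

# On first import the attribute is missing, so initialize it; on a reload the
# hasattr() check finds the existing data and leaves it alone.
if not hasattr(_globalstate, 'mymodule_cache'):
    _globalstate.mymodule_cache = {}

cache = _globalstate.mymodule_cache  # the same dict every time, reloads included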

But an even easier solution is to simply test if the global holding the data is set:

try:
    state
except NameError:
    state = 0   # first import: the name does not exist yet
else:
    state += 1  # reload: the previous value is still in the module's __dict__

That will actually increment the state variable on every reload. This is because reloading a module simply re-executes the module's code over the module's original __dict__, so names bound by a previous import are still there when the code runs again. You can therefore protect all of that global initialization inside the 'except' clause so that it isn't redone on reloads.
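
Applied to something closer to what the stdlib actually does (a copyreg-style registry; the names here are made up for illustration), the guarded version would look like:

# Only initialize the module-level cache on first import so that
# registrations survive an imp.reload() of this module.
try:
    dispatch_table
except NameError:
    dispatch_table = {}

def register(cls, func):
    """Record a reduction function for cls; entries persist across reloads."""
    dispatch_table[cls] = func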

But doing any of this will break backwards-compatibility, because you know someone, somewhere is relying on reloading some module to reset it. Joys of working on the stdlib. But maybe people would value being able to reload modules safely even more?

The other option is for coverage.py to whitelist which modules can be reloaded when run with --pylib. All other already-imported modules could then be flagged as unknown/unmeasured (which would itself require adding a new classification to coverage.py), especially if only global-level code could be flagged as unmeasured while all local statements stayed flagged as not covered (and thus not tested).
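
In code the whitelist idea amounts to something like this sketch (RELOAD_SAFE is a hand-maintained set with purely illustrative entries, and none of this is an existing coverage.py option or API):

import imp
import sys

# Modules whose top-level code is believed to be harmless to re-execute.
RELOAD_SAFE = {'heapq', 'bisect', 'keyword'}

for name in sorted(RELOAD_SAFE & set(sys.modules)):
    # Re-run the module's top-level code so a coverage tool can observe it.
    imp.reload(sys.modules[name])

# Everything else that is already imported would be left alone and reported
# as unknown/unmeasured rather than silently uncovered.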

At this point I consider my approach of blacklisting which modules not to reload a failure. There are just too many modules that simply cannot handle having their global state completely reset upon reload (which is perhaps a good thing, since if you are reloading to pick up new code you would not want a cache of objects built from an old class definition lying around).