2010-05-16

Maintaining backwards-compatibility while transitioning users

PEP 3147 (PYC Repository Directories) is causing me grief. When I created the ABCs for importlib I did not expect import to change significantly. But here I am, in a situation where the handling of bytecode files has shifted enough that the abstractions I created for importlib no longer work quite as well as they did when Python 3.1 came out. That means something needs to change, but that's not easy when it comes to Python's standard library.

It is said that the standard library is where code goes to die, developmentally speaking. Once your code lands in the stdlib, it's hard to change. You end up not knowing who is using your code or how. But being in the stdlib guarantees that at least one person somewhere in the world is using it. And if you change something without providing an upgrade path, you are going to have some pissed users. Now, you could argue that as long as you keep working for three releases (e.g. 2.5 through 2.7), you're safe as that is roughly 4.5 years of compatibility (ask the CentOS/RHEL people and they would say four versions thanks to their 2.4 dependency, but that is a bit nuts as we are not even doing security releases at this point for 2.4). But that still requires that you provide some way to be compatible through three versions of Python. How the heck should I make it so that code using the ABCs from Python 3.1 still works unmodified -- albeit with warnings being triggered -- in Python 3.4?

First, let's look at what caused the problem. When I created the ABCs I assumed that there were primarily two situations: you only cared about source or you cared about source and bytecode. The thinking was that if you wanted to do some wonky source transformation or you were working with a VM that didn't use CPython bytecode files (i.e. all other VMs), then you would want to use the PyLoader that only used source. Otherwise you used the PyPycLoader for source & bytecode usage.

PEP 3147 makes the split more along the lines of source or source-less importing. The PEP leads to Python emphasizing more than it did that bytecode files are really an optimization that happens to have been used by some as a distribution mechanism (there was actually a bit of back-and-forth about removing bytecode-only/source-less import support, but unfortunately enough people use it that it will continue to be supported). That means that my division in the ABCs is no longer in alignment with how things will be viewed by importer authors going forward.

The improper division is made even more acute by the fact that the PyPycLoader exposes bytecode details that do not extrapolate into a new PEP 3147 world. For instance, PyPycLoader.bytecode_path() in Python 3.1 returns the path to the bytecode file next to the source file. But with PEP 3147 you have the cached file path when there is source code, and the path to the bytecode file next to where the source would be in a source-less import. The bytecode_path() method now must branch its logic based on whether there is any source code. That's really unfortunate for anyone who implemented their own bytecode_path(), as it will no longer behave the way everyone else expects in the face of PEP 3147. So in this case I exposed too much to the user of the ABC.
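To make the branching concrete, here is a minimal sketch of what a PEP 3147-aware bytecode_path() would have to decide. It is written as a standalone function rather than a method, the source_exists flag and the default tag value are illustrative assumptions, and it is not the importlib implementation.

```python
import os


def bytecode_path(source_path, source_exists, tag="cpython-32"):
    """Return where bytecode for source_path is expected to live.

    Hypothetical helper: `tag` stands in for the interpreter's cache tag
    and `source_exists` for whatever check the loader performs.
    """
    directory, filename = os.path.split(source_path)
    name = os.path.splitext(filename)[0]
    if source_exists:
        # PEP 3147 layout: cached bytecode goes in a __pycache__ directory.
        return os.path.join(directory, "__pycache__",
                            "{}.{}.pyc".format(name, tag))
    # Source-less import: the .pyc sits where the source file would be.
    return os.path.join(directory, name + ".pyc")
```

A bytecode_path() written for Python 3.1 only ever returns the second form, which is why it stops matching what the rest of the import machinery expects once source is present.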

I also messed up by introducing new methods that took module names instead of "file" paths. Everyone assumes that you are working on a file system or something that resembles one, so taking module names just adds overhead: every method has to resolve a module name to a path name before it can perform its operation. For example, PyPycLoader.source_mtime() 99% of the time will need to stat a file path, not some abstracted thing where you can index by module name alone. That's annoying as it leads to a lot of boilerplate in all of the methods I introduced beyond PEP 302.

Not using file paths is also unfortunate when it comes to using a loader as a file system abstraction. Theoretically, if you have data files with your package code, you should be able to use __loader__.get_data() to read them and not have to worry about any other details, such as whether you are reading from a zip file or an actual file system. Had I used file paths for things such as source_mtime() then people would have gained modification time info on files they stored with their package code.

So how do I fix my initial design mistakes while also fixing the issues that PEP 3147 has brought up, all while letting importers written for Python 3.1 continue to function? First, I am going to deprecate PyPycLoader and introduce PycLoader. Doing this will let me keep PyLoader to handle source imports, but make using cached bytecode files an optimization detail that I do not have to directly expose to the loader author. And adding PycLoader allows me to still support source-less loaders without cluttering the code for PyLoader (I might actually not even add PycLoader as I really don't like source-less imports and do not want to promote them by making them easier to use than is necessary). I could simply deprecate both PyLoader and PyPycLoader and introduce new ABCs named SourceLoader and BytecodeLoader, but that leaves people who have already written importers in the bad position of having to maintain two class definitions, which would be really unfortunate.

For PyLoader, get_filename() will fully supplant source_path(). This was already the plan as it makes a loader more compatible with runpy, but this just makes it that much easier to have the support inherent in the design of the ABC. That's very minor as get_filename() can be written to not be abstract and to call source_path() as it does now. But what I can do is make that usage trigger a deprecation warning (all warnings I mention here will start off as a PendingDeprecationWarning, then shift to a DeprecationWarning, and finally the deprecated code will be removed). For people who want to easily support both Python 3.1 usage, where source_path() was the thing to use, and Python 3.2 and newer, all they need to do is define get_filename() and alias source_path() to it in the class definition. That will override the warning-raising version of get_filename() in Python 3.2 and newer but still provide source_path() for Python 3.1.
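The aliasing trick can be sketched as follows. PyLoaderSketch is a hypothetical stand-in for the ABC, not the real importlib.abc class, and the paths and warning text are made up for illustration.

```python
import warnings


class PyLoaderSketch:
    """Hypothetical stand-in for the ABC described above."""

    def source_path(self, fullname):
        raise NotImplementedError

    def get_filename(self, fullname):
        # Default implementation: keeps 3.1-style loaders working, with a warning.
        warnings.warn(
            "source_path() is deprecated; define get_filename() instead",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        path = self.source_path(fullname)
        if path is None:
            raise ImportError("no source path for {!r}".format(fullname))
        return path


class LegacyLoader(PyLoaderSketch):
    """Written against the 3.1 API: only source_path() is defined."""

    def source_path(self, fullname):
        return "/pkg/" + fullname.replace(".", "/") + ".py"


class DualLoader(PyLoaderSketch):
    """Supports both APIs: defines get_filename() and aliases source_path()."""

    def get_filename(self, fullname):
        return "/pkg/" + fullname.replace(".", "/") + ".py"

    # Same function object under the old name, so 3.1 callers still work.
    source_path = get_filename
```

Calling LegacyLoader().get_filename() goes through the warning-raising default, while DualLoader answers both method names without triggering any warning.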

To support cached bytecode files as introduced in PEP 3147, an optional method will be introduced: path_mtime(). This will take a path and return the modification time of the path, raising IOError if the path does not exist (just like ResourceLoader.get_data()). In order to support Python 3.1 loaders, if path_mtime() is not defined then source_mtime() will be used and a warning will be raised. To support both 3.1 and 3.2 loaders, you can simply write your own source_mtime() that calls get_filename() to get the path to the source code and then calls path_mtime(), returning None instead of letting the IOError propagate (returning None for source_path() and bytecode_path() was a bad move on my part; I prefer EAFP over LBYL).
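A minimal sketch of a loader that supports both method names, assuming a plain directory on disk as the backing store. The class name and the root attribute are hypothetical; only the method names come from the text above.

```python
import os


class FileLoaderSketch:
    """Hypothetical loader rooted at a directory on the file system."""

    def __init__(self, root):
        self.root = root

    def get_filename(self, fullname):
        return os.path.join(self.root, fullname.replace(".", os.sep) + ".py")

    def path_mtime(self, path):
        # New-style method: takes a path and lets the error propagate
        # if the path does not exist (EAFP).
        return int(os.stat(path).st_mtime)

    def source_mtime(self, fullname):
        # 3.1-style method: takes a module name and returns None on failure.
        try:
            return self.path_mtime(self.get_filename(fullname))
        except IOError:
            return None
```

Because path_mtime() accepts any path, the same method also answers modification-time questions about data files stored alongside the package code.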

A similar approach to path_mtime()/source_mtime() would be used for adding set_data() to replace write_bytecode(). The reason for naming the new method set_data() is for naming consistency with get_data(), even though I would rather name it write_data(). And once again, moving to file paths over module names adds to the file system abstraction.
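As a rough sketch of that pairing, the old module-name-based write_bytecode() can be expressed on top of a path-based set_data(). The class, the _bytecode_file() helper, and the boolean return value of write_bytecode() are assumptions made for illustration, not a statement of the actual ABC contract.

```python
import os


class DataLoaderSketch:
    """Hypothetical loader exposing path-based read and write methods."""

    def __init__(self, root):
        self.root = root

    def get_data(self, path):
        with open(path, "rb") as file:
            return file.read()

    def set_data(self, path, data):
        # Path-based counterpart to get_data(); errors propagate.
        with open(path, "wb") as file:
            file.write(data)

    def _bytecode_file(self, fullname):
        # Hypothetical name-to-path mapping, only needed by the old API.
        return os.path.join(self.root, fullname.replace(".", os.sep) + ".pyc")

    def write_bytecode(self, fullname, bytecode):
        # Old-style method kept for compatibility: report failure as False.
        try:
            self.set_data(self._bytecode_file(fullname), bytecode)
        except IOError:
            return False
        return True
```

Note that set_data() works for any path, so the loader can write arbitrary data files and not just bytecode.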

In all of this, bytecode_path() goes away. Why? Because it's not needed thanks to imp.cache_from_source(). As I said earlier, bytecode files are viewed more as optimizations than ever before, meaning that Python should control that optimization and not the user for the sake of consistency and any future changes to that optimization. It also allows other VMs to signal that bytecode is not supported by simply having imp.cache_from_source() return None, making PyLoader work on any Python VM properly out-of-the-box.
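For readers on a current Python 3, the source-to-cache mapping is exposed as importlib.util.cache_from_source(); using that function here, rather than the imp function named in the text, is a substitution on my part. It raises NotImplementedError when the interpreter has no cache tag, so the hypothetical wrapper below converts that into the None signal described above.

```python
import importlib.util


def cached_path_or_none(source_path):
    """Return the cached bytecode path for source_path, or None.

    Hypothetical wrapper: None means the running VM does not support
    cached bytecode files.
    """
    try:
        return importlib.util.cache_from_source(source_path)
    except NotImplementedError:
        # Raised when sys.implementation.cache_tag is None.
        return None
```

A loader built on such a helper never needs to know the bytecode file layout; it only checks for None before trying to read or write the cache.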

This does mean that replacing PyPycLoader with PyLoader plus some optional methods leaves Python 3.1 PyLoader implementations without bytecode support. It's unfortunate, but should be a minor loss. And if you really need that bytecode support, you can conditionally choose which ABC to use as a base class and make sure you implement all of the methods needed to work in Python 3.1.

Overall I think it's a reasonable transition plan. Backwards-compatibility is kept from Python 3.1 until I choose to rip out support for the old way without having to do major contorting. Plus it helps future-proof the ABCs such that they won't have to go through this transition period again.