2009-09-29

Adding structural consistency to exceptions

When I wrote PEP 352: Required Superclass for Exceptions one of the hopes I had was it would lead to some sanity when it came to attaching information to exceptions. As it stands now everything is stored under BaseException.args which is not exactly structured, and so I wanted to clean that situation up. Unfortunately this got derailed because of backwards-compatibility issues (and for those of you being bitten by the BaseException.message deprecation, Python 2.6.1 has a fix to make it saner).

But today a tweet from Jacob Kaplan-Moss reminded me why exceptions need to be cleaned up. If you look at the constructor for BaseException it is essentially:
def __init__(self, *args):
    self.args = args

Not exactly fancy, but this loose generality has a price: most people simply toss all potentially useful information into the exception constructor without providing a reasonable way to get any of it back out. For instance, if arguments 2 and 3 contain useful information, why do I have to know which index provides which piece of info instead of using a descriptive attribute name? If I can't use dir() on an exception to figure out what useful metadata it carries, then there is a problem. There's a reason named tuples came into existence; indexes are not self-documenting.

There is also a nastier side-effect for exceptions given multiple arguments, in regards to how BaseException.__str__() acts. If args has exactly a single value then the string for the exception is str(args[0]). But if args has multiple values then the string of the exception is str(args). That can lead to people not tacking on any information at all, just to make sure their exceptions have a nice, clean string representation. That's what leads to people like JKM having to parse data out of an exception's string. It's just ludicrous for anyone to have to parse an exception to get information that was obviously available when the message was created!
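The difference is easy to see with the built-in Exception class. This is just a small illustration of the behaviour described above; the message text and the value 42 are made up for the example.

```python
# One positional argument: str() gives back that argument's own string.
single = Exception("index out of range")

# Several positional arguments: str() falls back to the str() of the tuple.
multiple = Exception("index out of range", 42)

print(str(single))    # index out of range
print(str(multiple))  # ('index out of range', 42)

# The extra data is only reachable by position, not by name.
print(multiple.args[1])  # 42
```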

I see two solutions to this predicament, and both involve changing the constructor of BaseException. Let's look at each in turn, using IndexError as an example of how things might improve if the exception recorded which index was out of which range. One option is to simply change BaseException to accept only a single argument, bind it to message, and use that as the string representation:
class BaseException:

    def __init__(self, message=''):
        self.message = message

    def __str__(self):
        return str(self.message)

That would motivate me to change IndexError to:
class IndexError(Exception):

    def __init__(self, message="index out of range", *, index=None, range=None):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message.format(index=index, range=range))
        self.index = index
        self.range = range

By having BaseException accept only a single argument, people who typically just toss in metadata about why the exception is being raised are forced to actually construct a message. The hope is that if someone is being forced to construct a message for an exception on their own, they will simply tack the information on to the instance as well.
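To show what this buys the person catching the exception, here is a rough sketch that can run on today's Python. Since the real BaseException and IndexError cannot be swapped out, SingleMessageError and SketchIndexError are hypothetical stand-ins for the two classes above, and get_item() is an invented helper that exists only to raise the exception.

```python
class SingleMessageError(Exception):
    """Hypothetical stand-in for a BaseException that takes only a message."""

    def __init__(self, message=''):
        super().__init__(message)
        self.message = message

    def __str__(self):
        return str(self.message)


class SketchIndexError(SingleMessageError):
    """Hypothetical stand-in for the IndexError proposed above."""

    def __init__(self, message="index out of range", *, index=None, range=None):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message.format(index=index, range=range))
        self.index = index
        self.range = range


def get_item(seq, index):
    """Invented helper: look up an item, raising the structured exception."""
    if not 0 <= index < len(seq):
        raise SketchIndexError(index=index, range=(0, len(seq)))
    return seq[index]


try:
    get_item("abc", 5)
except SketchIndexError as exc:
    print(exc)                   # index 5 out of range (0, 3), exclusive
    print(exc.index, exc.range)  # 5 (0, 3)
```

The handler reads exc.index and exc.range directly; nothing has to be parsed back out of the message.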

But what about those exception authors who are fine creating a message but still don't bother to tack the metadata on to the exception? That brings up the other possible approach to BaseException where it does string interpolation for you based on keyword arguments you pass in:
class BaseException:

    def __init__(self, message, **kwargs):
        self.message = message.format(**kwargs)
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)

With IndexError we can now have a couple of options. One is to simply subclass and hope that most people will simply do IndexError("index {index} out of range", index=42) in all instances. The other option is to take the approach shown above but cut out some code that is no longer needed:
class IndexError(Exception):

    def __init__(self, message="index is out of range", *, index=None, range=None, **kwargs):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message, index=index, range=range, **kwargs)

This approach has the nice effect of encouraging people to simply let BaseException construct the message string for them, with the nice side-effect of also storing the data on the exception under descriptive attribute names. It also makes arbitrary metadata through kwargs easier than in the previous approach, where you would have to explicitly accept kwargs and then update the instance.
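Here is a small sketch of how the keyword-interpolating constructor would feel in use. FormattingError is a hypothetical stand-in for the BaseException proposed above (the real one cannot be replaced from Python code), and the container keyword is invented to show arbitrary metadata.

```python
class FormattingError(Exception):
    """Hypothetical stand-in for a BaseException that interpolates kwargs."""

    def __init__(self, message, **kwargs):
        # Build the final message from the template and the keyword arguments.
        self.message = message.format(**kwargs)
        super().__init__(self.message)
        # Expose every keyword argument as an attribute on the instance.
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)


try:
    raise FormattingError("index {index} out of range", index=42)
except FormattingError as exc:
    print(exc)        # index 42 out of range
    print(exc.index)  # 42

# Metadata that the message does not mention is still kept on the instance.
extra = FormattingError("lookup failed", container="cache")
print(extra.container)  # cache
```

One thing this sketch makes visible: because the message is always run through str.format(), a message containing literal braces would have to double them, and a keyword named message would collide with the first parameter.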

Regardless of the approach, the real trick is how would you transition over to it? First thing would be to introduce a pending deprecation for BaseException taking more than a single argument. Next would be to activate the new semantics with keyword arguments where if more than a single positional argument is given an exception is thrown. After a certain amount of time

But the real trick for a transition is message. Do you only set it for exceptions that have transitioned? If you do set it for exceptions that are still passing in multiple positional arguments do you set the attribute to the first argument or the string representation for all of them? Or do you simply skip having message and just let people call str() on exceptions to get what the message would be, like so?
class BaseException:

    def __init__(self, message, **kwargs):
        self.__message = message
        self.__dict__.update(kwargs)

    def __str__(self):
        return self.__message.format(**self.__dict__)

I honestly don't know what the best solution would be. Some people would probably complain about not being able to introspect on the message attribute, but exposing a string format seems somewhat icky to me. Either way some solution would be available.
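As for the first transition step mentioned earlier, the pending deprecation for more than a single positional argument, a rough sketch might look like the following. TransitionalError is a hypothetical name, and the wording of the warning is my own invention; only warnings.warn() and PendingDeprecationWarning are real.

```python
import warnings


class TransitionalError(Exception):
    """Hypothetical exception base used to sketch the transition period."""

    def __init__(self, *args, **kwargs):
        if len(args) > 1:
            # Old-style call: still accepted, but flagged for future removal.
            warnings.warn(
                "passing more than one positional argument is pending "
                "deprecation; use keyword arguments instead",
                PendingDeprecationWarning,
                stacklevel=2,
            )
        super().__init__(*args)
        # New-style call: keyword arguments become attributes.
        self.__dict__.update(kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = TransitionalError("index out of range", 42)
    new_style = TransitionalError("index out of range", index=42)

print(len(caught))      # 1, only the old-style call warned
print(new_style.index)  # 42
```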

Who knows, maybe some day I will try to push this into Python and finish what I had originally intended to do with PEP 352.

2009-09-16

PyCon 2010 talk proposals due in two weeks

Just a reminder to folks out there that the due date for submitting a talk to PyCon 2010 is two weeks away!

2009-09-13

Evolving the standard library

As Titus blogged, an interesting discussion was started over on the stdlib-sig about whether argparse should be added to the standard library, and if so how to handle/whether to deprecate getopt and optparse. Since the discussion showed rather well how people think the standard library should evolve I figured I would blog about those views and my own (assuming I don't mess up and misrepresent people =).

2009-09-02

Intersection of built-in modules between CPython, Jython and IronPython

[EDIT: updated for IronPython 2.6b2; made it clearer which VMs are missing what modules that importlib relies upon]

It has been a big goal of mine to make importlib the default implementation of import for CPython. But an even bigger goal has been to make it the default implementation for ALL full featured implementations of Python once they implement Python 3. Not only would it make sure that all VMs have consistent semantics when it comes to imports, but it would also prevent every VM from having to re-implement import themselves.

But using importlib as import imposes a bootstrapping problem. How do you import, well, import? First off, you need to find the source code, compile it into a code object, and create a module object using that code object. That part is actually easy as you can simply look for the file on sys.path since you know what you are looking for, you can compile the source using the built-in compile() function, and then you finally create a module and initialize it with exec(). This is essentially what importlib does at a rudimentary level.
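Those rudimentary steps can be sketched in a few lines. This is not importlib's actual code, just an illustration of the find/compile/exec sequence: rudimentary_import() and the bootstrap_demo module are invented names, and the sketch leans on os, tempfile, and types for brevity, which is exactly the kind of dependency a real bootstrap could not assume. It also skips sys.modules, packages, and bytecode entirely.

```python
import os
import sys
import tempfile
import types


def rudimentary_import(name, path=None):
    """Find name.py on the search path, compile it, and exec it in a module."""
    search_path = sys.path if path is None else path
    for entry in search_path:
        # An empty entry on sys.path means the current directory.
        filename = os.path.join(entry or os.curdir, name + ".py")
        if os.path.isfile(filename):
            break
    else:
        raise ImportError("no module named {!r}".format(name))

    with open(filename, "rb") as source_file:
        source = source_file.read()
    code = compile(source, filename, "exec")

    # Create an empty module and initialize it by running the code in it.
    module = types.ModuleType(name)
    module.__file__ = filename
    exec(code, module.__dict__)
    return module


# Usage: write a throwaway module to a temporary directory and load it.
with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "bootstrap_demo.py"), "w") as demo_file:
        demo_file.write("ANSWER = 6 * 7\n")
    demo = rudimentary_import("bootstrap_demo", path=[directory])

print(demo.ANSWER)  # 42
```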

But import obviously goes beyond the rudimentary. There is bytecode to read and write, packages to deal with, warnings to raise, etc. And all of that requires code from some module in the standard library. But if you are trying to bootstrap in import w/o having a full-featured import, what do you do? You rely on built-in modules is what you do.

By using built-in modules you could have the VM inject any built-in module into the created importlib module and have it begin using it. Because of this I was curious as to what built-in modules CPython 3.1, Jython 2.5, and IronPython 2.6b2 had in common. The results are:
  • _codecs
  • _functools
  • _sre
  • _weakref
  • errno
  • gc
  • imp
  • sys
Not a whole lot. Importlib itself relies upon:
  • errno: Everyone has this.
  • io: IronPython's _bytesio probably has what I need (importlib only uses io.FileIO). Jython does not yet cover 2.6, so there is hope.
  • imp: Everyone has this.
  • marshal: This is actually optional (or at least I will make sure it is) as VMs do not need to implement pyc support.
  • posix/nt/os2: IronPython has this. Jython plans to have this in 2.6.
  • sys: Everyone has this.
  • warnings: Jython does not have a native implementation, but importlib only needs warnings.warn().

There is a partial overlap, but not a complete overlap. Luckily this is for Python 3 and thus there is hope that some of the things I need can be made common between the VMs in terms of what the built-in modules provide. It's possible that IronPython has everything already and Jython could add only what importlib needs (probably) w/o much issue.
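If you want to repeat the check on a given VM, sys.builtin_module_names lists what that interpreter has compiled in. The sketch below simply reports which of the modules named above are built in on whatever interpreter runs it; NEEDED and builtin_report() are names I made up, and the results will differ between VMs and versions (some interpreters expose the built-in part under an underscore-prefixed name such as _io or _warnings).

```python
import sys

# Modules importlib is described above as relying upon (posix, nt, and os2
# are alternatives; a given platform provides at most one of them).
NEEDED = ("errno", "io", "imp", "marshal", "posix", "nt", "os2", "sys",
          "warnings")


def builtin_report(names=NEEDED):
    """Map each module name to whether this interpreter has it built in."""
    available = set(sys.builtin_module_names)
    return {name: name in available for name in names}


report = builtin_report()
for name in NEEDED:
    status = "built-in" if report[name] else "not built-in"
    print("{:10} {}".format(name, status))
```

Running the same script under each VM and intersecting the names that come back as built-in gives the kind of comparison shown in the list above.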

Otherwise I am causing myself more pain than I need to and I should just not worry about the bootstrap and simply import code directly. Copying code from the 'os' module does get a little annoying after a while. =)

Less than a month to submit a PyCon 2010 talk

PyCon talk proposals are due October 1, which is less than a month (four weeks) away. I have already submitted a talk on custom importers and using importlib to write your own.