2009-09-29

Adding structural consistency to exceptions

When I wrote PEP 352: Required Superclass for Exceptions, one of my hopes was that it would bring some sanity to how information is attached to exceptions. As it stands now, everything is stored under BaseException.args, which is not exactly structured, and so I wanted to clean that situation up. Unfortunately this got derailed by backwards-compatibility issues (and for those of you being bitten by the BaseException.message deprecation, Python 2.6.1 has a fix to make it saner).

But today a tweet from Jacob Kaplan-Moss reminded me why exceptions need to be cleaned up. If you look at the constructor for BaseException it is essentially:
def __init__(self, *args):
    self.args = args

Not exactly fancy, but this loose generality has a price: most people simply toss all potentially useful information into the exception constructor without providing a reasonable way to get any of it back out. For instance, if arguments 2 and 3 contain useful information, why should I have to know which index provides which piece of information instead of using a descriptive attribute name? If I can't use dir() on an exception to figure out what useful metadata it carries, then there is a problem. There's a reason named tuples came into existence; indexes are not self-documenting.
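To make the difference concrete, here is a minimal sketch. Both exception classes (ArgsOnlyError and NamedFieldsError) and the index/length values are hypothetical names I made up for illustration; nothing here is from the standard library beyond Exception itself.

```python
class ArgsOnlyError(Exception):
    """Hypothetical: relies on the default constructor, so data lives only in args."""


class NamedFieldsError(Exception):
    """Hypothetical: stores the same data under descriptive attribute names."""

    def __init__(self, message, *, index, length):
        super().__init__(message)
        self.index = index
        self.length = length


positional = ArgsOnlyError("index out of range", 42, 10)
named = NamedFieldsError("index out of range", index=42, length=10)

# The caller has to know that position 1 is the index and position 2 the length.
print(positional.args[1], positional.args[2])

# The attribute names are discoverable through dir() and document themselves.
print(named.index, named.length)
print([name for name in dir(named) if name in ("index", "length")])
```

With the first class, the only way to learn what args[1] means is to read the code that raised the exception.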

There is also a nastier side-effect for exceptions given multiple arguments, and it has to do with how BaseException.__str__() acts. If args has exactly a single value then the string for the exception is str(args[0]). But if args has multiple values then the string of the exception is str(args). That can lead to people not tacking on any extra information just to make sure their exceptions have a nice, clean string representation. That's what leads to people like JKM having to parse data out of an exception's string. It's just ludicrous for anyone to have to parse an exception to get information that was obviously available when the message was created!
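The string behavior is easy to see for yourself with a plain Exception. The message and the extra values below are made up for the demonstration.

```python
single = Exception("index out of range")
multiple = Exception("index out of range", 42, 10)

# One argument: the string is just str(args[0]).
print(str(single))    # index out of range

# Several arguments: the string is the repr of the whole tuple.
print(str(multiple))  # ('index out of range', 42, 10)
```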

I see two solutions to this predicament, and both involve changing the constructor of BaseException. Let's look at each in turn, using IndexError as an example of how things might improve if we attach to the exception which index was out of which range. One option is to simply change BaseException to accept only a single argument, bind it to message, and use that as the string representation:
class BaseException:

    def __init__(self, message=''):
        self.message = message

    def __str__(self):
        return str(self.message)

That would motivate me to change IndexError to:
class IndexError(Exception):

    def __init__(self, message="index out of range", *, index=None, range=None):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message.format(index=index, range=range))
        self.index = index
        self.range = range

By having BaseException accept only a single argument, people who typically just toss in metadata about why the exception is being raised are forced to actually construct a message. The hope is that if someone is being forced to construct a message for an exception on their own, they will also simply tack the information on to the instance.
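Since you can't actually swap out the built-in BaseException to try this, here is a rough sketch of how it might feel in use. SingleMessageError and StructuredIndexError are hypothetical stand-ins for the proposed BaseException and IndexError, and they subclass the real Exception only so they can be raised.

```python
class SingleMessageError(Exception):
    """Hypothetical stand-in for a BaseException that takes one argument."""

    def __init__(self, message=''):
        super().__init__(message)
        self.message = message

    def __str__(self):
        return str(self.message)


class StructuredIndexError(SingleMessageError):
    """Hypothetical stand-in for the reworked IndexError."""

    def __init__(self, message="index out of range", *, index=None, range=None):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message.format(index=index, range=range))
        # The metadata stays available under descriptive names.
        self.index = index
        self.range = range


items = [1, 2, 3]
try:
    raise StructuredIndexError(index=5, range=(0, len(items)))
except StructuredIndexError as exc:
    caught = exc

print(caught)                      # index 5 out of range (0, 3), exclusive
print(caught.index, caught.range)  # 5 (0, 3)
```

Nobody catching this exception has to parse the string to find out which index was the problem.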

But what about those exception authors who are fine creating a message but still don't bother to tack the metadata on to the exception? That brings up the other possible approach to BaseException where it does string interpolation for you based on keyword arguments you pass in:
class BaseException:

    def __init__(self, message, **kwargs):
        self.message = message.format(**kwargs)
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)

With IndexError we can now have a couple of options. One is to simply subclass and hope that most people will simply do IndexError("index {index} out of range", index=42) in all instances. The other option is to take the approach shown above but cut out some code that is no longer needed:
class IndexError(Exception):

    def __init__(self, message="index is out of range", *, index=None, range=None, **kwargs):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message, index=index, range=range, **kwargs)

This approach has the nice effect of prompting people to simply let BaseException construct the message string for the user, while providing the nice side-effect of also storing the data on the exception under descriptive attribute names. It also allows for arbitrary metadata through kwargs more easily than the previous approach, where you would have to explicitly take kwargs and then update the instance yourself.
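As a rough sketch of what this buys you in practice, here is the keyword-interpolating constructor on a hypothetical class named FormattingError (a stand-in for the proposed BaseException; it subclasses the real Exception only so it behaves like one). The container keyword is likewise just an invented piece of metadata.

```python
class FormattingError(Exception):
    """Hypothetical stand-in for a keyword-interpolating BaseException."""

    def __init__(self, message, **kwargs):
        super().__init__(message)
        # Build the final message and keep the raw values as attributes.
        self.message = message.format(**kwargs)
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)


exc = FormattingError("index {index} out of range", index=42)
print(exc)        # index 42 out of range
print(exc.index)  # 42

# Keywords that the message never mentions still end up on the instance.
extra = FormattingError("lookup failed", container="users")
print(extra)            # lookup failed
print(extra.container)  # users
```

One wrinkle of this sketch: a keyword named message would collide with the positional parameter and raise a TypeError, so a real design would need to reserve that name.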

Regardless of the approach, the real trick is how would you transition over to it? First thing would be to introduce a pending deprecation for BaseException taking more than a single argument. Next would be to activate the new semantics with keyword arguments where if more than a single positional argument is given an exception is thrown. After a certain amount of time
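The first step, the pending deprecation, could look something like the sketch below. TransitionalError is a hypothetical stand-in for BaseException during the transition, and the warning text is made up; only the warnings module and PendingDeprecationWarning are real.

```python
import warnings


class TransitionalError(Exception):
    """Hypothetical stand-in for BaseException during the first transition step."""

    def __init__(self, *args):
        if len(args) > 1:
            # Old-style call: still works, but flags the upcoming change.
            warnings.warn(
                "passing more than one positional argument is pending deprecation",
                PendingDeprecationWarning,
                stacklevel=2,
            )
        super().__init__(*args)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    TransitionalError("index out of range")          # no warning
    TransitionalError("index out of range", 42, 10)  # warns

print([w.category.__name__ for w in caught])  # ['PendingDeprecationWarning']
```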

But the real trick for a transition is message. Do you only set it for exceptions that have transitioned? If you do set it for exceptions that are still passing in multiple positional arguments do you set the attribute to the first argument or the string representation for all of them? Or do you simply skip having message and just let people call str() on exceptions to get what the message would be, like so?
class BaseException:

    def __init__(self, message, **kwargs):
        self.__message = message
        self.__dict__.update(kwargs)

    def __str__(self):
        return self.__message.format(**self.__dict__)

I honestly don't know what the best solution would be. Some people would probably complain about not being able to introspect on the message attribute, but exposing a string format seems somewhat icky to me. Either way some solution would be available.
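To show what skipping the message attribute looks like from the outside, here is a small sketch using a hypothetical class named LazyFormatError in place of BaseException. Because of name mangling, the double-underscore attribute is stored as _LazyFormatError__message, so no plain message attribute is exposed.

```python
class LazyFormatError(Exception):
    """Hypothetical stand-in: formats the message only when str() is called."""

    def __init__(self, message, **kwargs):
        super().__init__(message)
        self.__message = message  # mangled to _LazyFormatError__message
        self.__dict__.update(kwargs)

    def __str__(self):
        return self.__message.format(**self.__dict__)


exc = LazyFormatError("index {index} out of range", index=42)
print(str(exc))                 # index 42 out of range
print(hasattr(exc, "message"))  # False

# The string is built on demand, so it follows later attribute changes.
exc.index = 7
print(str(exc))                 # index 7 out of range
```

Whether the string tracking later attribute changes is a feature or a surprise is exactly the kind of thing that would need to be decided.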

Who knows, maybe some day I will try to push this into Python and finish what I had originally intended to do with PEP 352.