2009-09-29

Adding structural consistency to exceptions

When I wrote PEP 352: Required Superclass for Exceptions, one of my hopes was that it would lead to some sanity in how information gets attached to exceptions. As it stands now, everything is stored under BaseException.args, which is not exactly structured, so I wanted to clean that situation up. Unfortunately this got derailed by backwards-compatibility issues (and for those of you being bitten by the BaseException.message deprecation, Python 2.6.1 has a fix to make it saner).

But today a tweet from Jacob Kaplan-Moss reminded me why exceptions need to be cleaned up. If you look at the constructor for BaseException it is essentially:
def __init__(self, *args):
    self.args = args

Not exactly fancy, but this loose generality has a price: most people simply toss all potentially useful information into the exception constructor without providing a reasonable way to get any of it back out. For instance, if arguments 2 and 3 contain useful information, why do I have to know which index provides which piece of info instead of using a descriptive attribute name? If I can't use dir() on an exception to figure out what useful metadata it carries, then there is a problem. There's a reason named tuples came into existence; indexes are not self-documenting.
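To make the index problem concrete, here is a small sketch. The ValueError and the meaning of its extra arguments are made up for illustration; OSError is shown only as a built-in that happens to expose its data by name as well as by position.

```python
# A generic exception only exposes positional data through .args.
# (The "bounds" meaning of arguments 1 and 2 is invented for this example.)
err = ValueError("value out of bounds", 2, 3)
low = err.args[1]    # the reader has to know that index 1 is the lower bound
high = err.args[2]   # ... and that index 2 is the upper bound

# OSError stores the same kind of data under descriptive names too.
os_err = OSError(2, "No such file or directory")
code_by_index = os_err.args[0]
code_by_name = os_err.errno       # same value, but self-documenting
text_by_name = os_err.strerror
```

Nothing about err tells a caller what args[1] means, while os_err.errno is discoverable through dir().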

There is also a nastier side effect for exceptions given multiple arguments, in how BaseException.__str__() acts. If args has exactly one value then the string for the exception is str(args[0]). But if args has multiple values then the string of the exception is str(args). That can lead people to avoid tacking on any extra information just so their exceptions keep a nice, clean string representation. That's what leads to people like JKM having to parse data out of an exception's string. It's just ludicrous for anyone to have to parse an exception to get information that was obviously available when the message was created!
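The difference in string output is easy to see with a plain Exception; this is a minimal sketch and the message text is arbitrary.

```python
single = Exception("index out of range")
multiple = Exception("index out of range", 42)

single_text = str(single)      # the lone argument as a string
multiple_text = str(multiple)  # the string form of the whole args tuple

print(single_text)    # index out of range
print(multiple_text)  # ('index out of range', 42)
```

Attaching the 42 turns a readable message into a tuple dump, which is exactly the pressure to leave the metadata off.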

I see two solutions to this predicament, and both involve changing the constructor of BaseException. Let's look at each in turn and use IndexError as an example of how things might improve if we attach to the exception which index was out of which range. One option is to simply change BaseException to accept only a single argument, bind it to message, and use it as the string representation:
class BaseException:

    def __init__(self, message=''):
        self.message = message

    def __str__(self):
        return str(self.message)

That would motivate me to change IndexError to:
class IndexError(Exception):

    def __init__(self, message="index out of range", *, index=None, range=None):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message.format(index=index, range=range))
        self.index = index
        self.range = range

By having BaseException accept only a single argument, people who typically just toss in metadata about why the exception is being raised are forced to actually construct a message. The hope is that if someone is forced to construct a message for an exception on their own, they will also tack the information on to the instance.
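Since the built-ins can't be redefined in place, here is a runnable sketch of the same idea using stand-in classes. StructuredError, StructuredIndexError and get_item are hypothetical names used only for this illustration; they are not part of Python.

```python
class StructuredError(Exception):
    """Hypothetical stand-in for a single-argument BaseException."""

    def __init__(self, message=""):
        super().__init__(message)
        self.message = message

    def __str__(self):
        return str(self.message)


class StructuredIndexError(StructuredError):
    """Hypothetical stand-in for the reworked IndexError."""

    def __init__(self, message="index out of range", *, index=None, range=None):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message.format(index=index, range=range))
        # The metadata stays available under descriptive names.
        self.index = index
        self.range = range


def get_item(items, index):
    """Hypothetical helper that raises the structured exception."""
    if not 0 <= index < len(items):
        raise StructuredIndexError(index=index, range=(0, len(items)))
    return items[index]


try:
    get_item(["a", "b", "c"], 7)
except StructuredIndexError as exc:
    caught = exc

print(caught)                      # index 7 out of range (0, 3), exclusive
print(caught.index, caught.range)  # 7 (0, 3)
```

The handler reads caught.index directly instead of parsing the message or guessing at positions in args.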

But what about those exception authors who are fine creating a message but still don't bother to tack the metadata on to the exception? That brings up the other possible approach to BaseException where it does string interpolation for you based on keyword arguments you pass in:
class BaseException:

    def __init__(self, message, **kwargs):
        self.message = message.format(**kwargs)
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)

With IndexError we can now have a couple of options. One is to simply subclass and hope that most people will simply do IndexError("index {index} out of range", index=42) in all instances. The other option is to take the approach shown above but cut out some code that is no longer needed:
class IndexError(Exception):

    def __init__(self, message="index is out of range", *, index=None, range=None, **kwargs):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message, index=index, range=range, **kwargs)

This approach has the nice effect of prompting people to simply let BaseException construct the message string for them, with the nice side effect of also storing the data on the exception under descriptive attribute names. It also makes arbitrary metadata through kwargs easier than in the previous approach, where you would have to explicitly take kwargs and then update the instance.
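A runnable sketch of the keyword-interpolation idea, again with a stand-in class. FormattedError is a hypothetical name, and the key and attempts keywords are invented for the example.

```python
class FormattedError(Exception):
    """Hypothetical stand-in for a keyword-interpolating BaseException."""

    def __init__(self, message, **kwargs):
        # Build the message eagerly from the keyword arguments ...
        self.message = message.format(**kwargs)
        super().__init__(self.message)
        # ... and keep every keyword as a named attribute.
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)


err = FormattedError("index {index} out of range", index=42)
print(err)        # index 42 out of range
print(err.index)  # 42

# Keywords that the message never mentions are still stored.
extra = FormattedError("lookup failed", key="spam", attempts=3)
print(extra.key, extra.attempts)  # spam 3
```

One consequence of this design is that any literal braces in a message have to be doubled, since every message is treated as a format string.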

Regardless of the approach, the real trick is how would you transition over to it? First thing would be to introduce a pending deprecation for BaseException taking more than a single argument. Next would be to activate the new semantics with keyword arguments where if more than a single positional argument is given an exception is thrown. After a certain amount of time

But the real trick for a transition is message. Do you only set it for exceptions that have transitioned? If you do set it for exceptions that are still passing in multiple positional arguments do you set the attribute to the first argument or the string representation for all of them? Or do you simply skip having message and just let people call str() on exceptions to get what the message would be, like so?
class BaseException:

    def __init__(self, message, **kwargs):
        self.__message = message
        self.__dict__.update(kwargs)

    def __str__(self):
        return self.__message.format(**self.__dict__)

I honestly don't know what the best solution would be. Some people would probably complain about not being able to introspect on the message attribute, but exposing a string format seems somewhat icky to me. Either way some solution would be available.
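For what it's worth, here is one way the lazy variant could be made runnable outside the interpreter core. LazyError is a hypothetical name, and the read-only message property is only one possible compromise for people who want to introspect the message, not a settled design.

```python
class LazyError(Exception):
    """Hypothetical stand-in that defers formatting until str() is called."""

    def __init__(self, message, **kwargs):
        super().__init__(message)
        self._template = message      # the unformatted message
        self.__dict__.update(kwargs)

    def __str__(self):
        # Formatting happens here, so nothing is paid for an exception
        # that is caught and discarded without ever being printed.
        return self._template.format(**self.__dict__)

    @property
    def message(self):
        # Assumption: expose the formatted text rather than the template.
        return str(self)


err = LazyError("index {index} out of range", index=42)
first = str(err)

# Because formatting is lazy, later attribute changes show up in the text.
err.index = 7
second = str(err)
```

Whether a message that changes along with its attributes is a feature or a surprise is part of the same open question.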

Who knows, maybe some day I will try to push this into Python and finish what I had originally intended to do with PEP 352.

5 comments:

chphilli said...

+1 from me! Trying to realistically handle exceptions is one of the few problems I still have when using third-party code in Python.

Why not provide the message attribute as the formatted string? Or maybe as a property that returns the string representation of the instance?

Vinay Sajip said...

In the example where BaseException takes kwargs and formats them into self.message, then why does __str__ have to return str(self.message) rather than just self.message?

Walter said...

I don't think that formatting the message in __init__() is the right approach. Formatting the message might be time consuming and the message might never be used, because the exception is caught and discarded. IMHO it's better to do the formatting in __str__().

srittau said...

Thanks for bringing this to attention. I think forcing people to think about their Exception classes instead of just using the ad-hoc stuffing would increase the quality of stdlib and other external libraries.

Brett said...

@chphilli A property would solve the problem. But if you are doing that then you might as well simply tell people to call str() on the exceptions.

@vinay Duck typing says that nothing requires what you pass in for 'message' to be a string, just an object that has a format() method that accepts **kwargs. But __str__() is ALWAYS supposed to return a string, so I make sure that happens. Plus it's more conceptually true; the str for the exception is the str of 'message', not just 'message' itself.

@walter If this actually got implemented the lazy approach would probably be taken; that's why I listed that approach at the end.
