2008-12-06

Why explicit type checking is (typically) considered bad in Python

After my last blog post where I pointed out that "if you really feel the need to type-check arguments to a function" you could take the example code from my pep362 package, someone emailed me asking why the Python community in general seems to think type checking is bad. Since the email quickly became long and somewhat detailed, I figured I would do a blog post instead so others can benefit/suffer as well from my ranting.

To put your thinking in the proper perspective you need to realize that in Python you typically only care about structural/duck typing, not nominative/inheritance-based typing. Since Python does not use static typing, the object you have been given does not need to inherit from the class you were expecting in order for the two to work interchangeably; it just needs to provide the right attributes and methods. In other words, don't think like a Java/C++ programmer when programming in Python; think "duck typing" when you hear the general term "typing".
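To make that concrete, here is a toy sketch (the class and function names are invented for illustration): neither class inherits from the other, yet both work with the same function because each provides the speak() method it calls::

    class Duck(object):
        def speak(self):
            return "quack"

    class Person(object):
        def speak(self):
            return "hello"

    def greet(speaker):
        # No isinstance() check: anything with a speak() method is acceptable.
        return speaker.speak()

    greet(Duck())    # "quack"
    greet(Person())  # "hello"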

With that being said, there are two ways to deal with typing in Python when you care: LBYL (Look Before You Leap) and EAFP (Easier to Ask Forgiveness than Permission).

LBYL is when you check upfront whether an object meets your typing needs. The perk of LBYL is that you know early on whether something will be incorrect to use. Failing early has the usual benefit of preventing side-effects that you would otherwise have to undo when an AttributeError (or something similar) finally surfaces deep in your code. You also know where the requirement for a specific interface comes from, instead of noticing that some call way down in your code failed and not realizing why the object didn't have the interface that was expected.

But LBYL has one general drawback and another drawback for each of the typical ways you perform an interface check. The general drawback is that any type checking you do has a performance cost. Since Python (and by extension Python code) assumes you are not a moron, you shouldn't assume by default that you are going to be handed a badly typed object.

As for the two different type checks you can do, each has a weakness. When you use isinstance() pre-2.6 (more on why 2.6 is special later) you are being more restrictive than you need to be, as isinstance() performs nominative type checking. For instance, if you want an object that acts like a file (i.e. a "file-like" object) and you instead get an instance of StringIO.StringIO (I am purposefully using pre-2.6 stuff here as this is not such an issue with the new io module and io.StringIO), your isinstance() check is going to fail. Now you could expand your check to be ``isinstance(ob, (file, StringIO.StringIO))``, but that is brittle: what happens when someone else implements their own file-like object? Your check will reject that new object needlessly.
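Here is a small sketch of that brittleness (the UpperFile and load names are made up for illustration)::

    import StringIO

    class UpperFile(object):
        """A file-like object that inherits from neither file nor StringIO."""
        def __init__(self, text):
            self._text = text
        def read(self):
            return self._text.upper()

    def load(source):
        # The brittle LBYL check: any new file-like type gets rejected.
        if not isinstance(source, (file, StringIO.StringIO)):
            raise TypeError("expected a file-like object")
        return source.read()

    load(StringIO.StringIO("fine"))  # passes the check and works
    try:
        load(UpperFile("rejected"))  # fails the check despite having read()
    except TypeError:
        pass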

As for using hasattr() to check for explicit attributes, that's fine as long as you check for exactly the attributes you actually use and you keep the check up-to-date. If you suddenly start using some new attribute, you have to remember to update the check accordingly.
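For instance, a check along these lines (copy_data is a hypothetical helper, just to show the maintenance burden)::

    def copy_data(src, dst):
        # Check for exactly the attributes used below -- nothing more.
        if not hasattr(src, "read") or not hasattr(dst, "write"):
            raise TypeError("src needs read() and dst needs write()")
        dst.write(src.read())
        # If this ever starts calling src.close() as well, the hasattr()
        # check above has to be updated to match.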

So LBYL is a pain to maintain. Instead you can use EAFP. With this approach you first use an object as if it implemented the right interface, and if it triggers an exception you do what you can to deal with it. For instance, if you accept either strings or file-like objects, you can try to use the object as a file first. If that triggers an AttributeError because of the lack of a read() method, you can then catch the exception, pass the object to io.StringIO, and then try again.
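A sketch of that pattern (count_words is a made-up function; the try/except is the point)::

    import io

    def count_words(source):
        """Accept either a file-like object or a plain (unicode) string."""
        # EAFP: use the object as a file first; only fall back if that fails.
        try:
            text = source.read()
        except AttributeError:
            # No read() method, so assume we were handed a string instead.
            text = io.StringIO(source).read()
        return len(text.split())

    count_words(io.StringIO(u"a few words"))  # file-like object: works
    count_words(u"a few words")               # plain string: also works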

But what if you don't expect multiple types for the same argument? EAFP then relies on the fact that you documented what was expected and that a traceback will make it clear the failure was because the wrong type was passed in. People coming from a background where they feel a compiler is their equivalent of a warm blanket have a REALLY hard time buying this argument. But speaking from experience (as does the entire Python community), it actually works out fine, and that warm blanket of a compiler turns out to be something that makes you lazy and keeps you in bed rather than being productive.

Having said all of that, 2.6 and ABCs do change things. Because isinstance() and issubclass() can now be overloaded, you can make isinstance() perform structural type checking instead of nominative type checking. By defining the interface you care about as an ABC and registering the pre-existing types that provide it, you alleviate the brittleness of LBYL when using isinstance(). Do realize, though, that this still requires registering every type that should satisfy the ABC, and the performance hit is still there. The LBYL crowd is definitely better off now thanks to ABCs.
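Here is roughly what that looks like in 2.6 (Readable is a made-up ABC name; the register() call is the key part)::

    import abc
    import StringIO

    class Readable(object):
        """An interface for 'has a read() method', spelled as an ABC."""
        __metaclass__ = abc.ABCMeta

        @abc.abstractmethod
        def read(self):
            raise NotImplementedError

    # StringIO.StringIO predates this ABC, but registering it makes
    # isinstance() agree that it provides the interface.
    Readable.register(StringIO.StringIO)

    isinstance(StringIO.StringIO("data"), Readable)  # now True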

So with ABCs now in the language, what is the best approach? Personally, I think EAFP is still the best, with your documentation referring to specific ABCs or spelling out exactly which attributes are needed. That way you skip the performance penalty but can still switch to LBYL later if you truly feel the need. Python, IMO, treats developers as intelligent individuals, and so should your code: assume they will do the right thing in terms of what they pass into it.