2006-08-19

Thoughts on possible new testing frameworks

I am a fan of testing. While I don't necessarily write my tests before I start coding, I do always write a full test suite for the initial version of my code as soon as hands-on testing leads me to believe that it basically runs. My problem, though, is that I don't love unittest.

My first issue is that it doesn't feel Pythonic to me. You can tell its design came from another language (originally Smalltalk, but popularized by Java). For instance, the requirement that your tests be methods in a class that inherits from a specific class just doesn't feel natural to me. I just want some test functions sometimes.

Python supports functions and a procedural style of programming where you do not need to be as heavy-handed as with full-blown object-oriented programming. That heaviness is another complaint I have with unittest.

But for all of the heaviness, some basic support for common uses is totally lacking. There is no simple way to check the return value of a specific function call with specific arguments short of writing a full test method that makes the call itself. There is no support for running a bunch of different inputs against their expected outputs. Or how about mocking things out? Stuff that everyone has to do constantly is just not there.

So, what to do? Well, one option is to just live with unittest as-is and just add some support functions. Granted, py.test has some of the stuff I am after, but that doesn't let me reinvent the wheel. =)

I have two ideas on how to approach writing unit tests in Python. One is a more declarative fashion, and the other uses decorators. But what exactly does a test need?

A test needs a name for identifying which test fails. A description of what the test is supposed to be doing is helpful. You need to set up the environment (which includes things like creating files and databases and mocking things out). You need to gather any possible arguments. You obviously need to make the call to whatever is being tested (if testing a function). You check the result to decide whether the test passed or failed. You should clean up the environment you created. And finally you report your results, which includes the needed debugging information. That is pretty much the life of a unit test.

Now, how do you do this declaratively? You can have a test constructor where you specify the parts of the test with keyword arguments. Here is a rough example that makes sure int() returns the proper number when a string representing a number is passed in:

test(name="str to int()",
description="Make sure a str representing a number works for the constructor.",
call=int,
args=('10',),
kwargs={'base':10},
expect=10,
message="expected ${expect}, not ${returned}")

I don't think it would be hard to make this style support an expected exception being raised, or accept an iterator that passes in triples of (args, kwargs, expected_result) for testing a large number of inputs. The biggest trick I see is how to handle setting up an environment (ala setUp()/tearDown() in unittest). One way would be to accept a generator that yields the (args, kwargs, expected_result) triple: the generator would set up the environment, yield the needed arguments, and then, when the generator was closed, a 'finally' clause would run to clean up.
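As a rough illustration of that idea (purely hypothetical, and relying on generators being able to run a finally clause when closed, which Python 2.5 allows), such a setup generator could look like this:

import os
import tempfile

def environment():
    """Set up a scratch file, yield the test arguments, then clean up."""
    handle, path = tempfile.mkstemp()
    os.write(handle, b'10')
    os.close(handle)
    try:
        # args, kwargs, expected_result for whatever call is being tested
        yield (path,), {}, '10'
    finally:
        os.unlink(path)     # runs when the test driver closes the generator

# A test driver would do something like:
#   gen = environment()
#   args, kwargs, expected = next(gen)
#   ... make the call and check the result ...
#   gen.close()    # triggers the finally clause above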

Now some might say that the declarative style above is itself too heavy. Personally, while it might take more typing, it seems very readable to me. It is also very extensible, since you can easily have it take an 'exception' argument to specify an expected exception, a function to handle the comparison of the result, whether the test is OS- or implementation-specific, etc.

But another approach is to use decorators. Why not shift some things into decorators to make certain parts of a test stand out more?:

@expect(10, "expected ${expect}, not ${returned}")
def test_str_to_int():
    """Make sure a str representing a number works for the constructor."""
    return int('10', base=10)

While this style might not be quite as declarative, nor put quite as much semantic information into the test itself (e.g., what is being called and what the arguments are), it might be more in a style that people like.
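Here is a minimal sketch of how an expect() decorator along those lines could work; this is hypothetical, and the real work of catching exceptions, collecting results, and so on is elided:

from string import Template

def expect(value, message="expected ${expect}, not ${returned}"):
    """Decorator factory: wrap a test function and check its return value."""
    def decorator(func):
        def run_test():
            returned = func()           # call the test body
            if returned != value:       # compare against the expected value
                details = Template(message).substitute(expect=value,
                                                       returned=returned)
                raise AssertionError("%s: %s" % (func.__name__, details))
        run_test.__name__ = func.__name__
        run_test.__doc__ = func.__doc__
        return run_test
    return decorator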

This style can be easily extended as well. Specifying what operating systems the test works on, or whether it is implementation-specific, can just be other decorators. You can also reuse the idea of a generator providing environment values: whatever it yields can be passed to the test function as a single argument and then pulled apart within the test as needed.
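For instance, an OS-restriction decorator might look something like the following; this is just a sketch and the decorator name is made up:

import sys

def only_on(*platforms):
    """Decorator factory: only run the test on the named sys.platform values."""
    def decorator(func):
        def run_test():
            if not sys.platform.startswith(platforms):
                print("SKIP: %s (requires one of %s)" % (func.__name__,
                                                         ', '.join(platforms)))
                return
            return func()
        return run_test
    return decorator

@only_on('linux', 'darwin')
def test_posix_only():
    """Runs only where the platform check passes."""
    return int('10', base=10)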

Regardless of whether anyone even likes either of these ideas, decorators will be added to Python's testing framework (probably to test_support) for classifying tests. There is currently no way to specify which tests should only run on certain operating systems, which are implementation-specific (and thus can be ignored by PyPy, IronPython, or Jython), which are crashers, or which are known to be failing, and that makes running the whole test suite a little less than ideal.

There will also be the addition of a function to handle importing the module being tested. This allows a test to be properly skipped if the needed module is not available; if any other import fails, the whole testing framework is considered broken and the proper error is raised (this need came out of real-world experience in the core).
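A rough idea of the shape such a helper could take (the name, the exception, and the error handling here are all hypothetical, not the actual test_support API):

class TestSkipped(Exception):
    """Hypothetical stand-in for whatever 'skip this test' signal gets used."""

def import_for_testing(name):
    """Import the module under test, skipping the test if it is unavailable."""
    try:
        return __import__(name)
    except ImportError:
        # A missing module under test means "skip"; a failed import anywhere
        # else would mean the testing framework itself is broken.
        raise TestSkipped("%s is not available" % name)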

Lastly, testing the C API with ctypes is sorely needed. The testing of the C API is minimal at best and really needs a proper testing framework in place to make sure that all levels of Python are properly tested.
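As a small taste of what testing the C API through ctypes could look like (just an illustration, not any planned framework; Py_GetVersion() is simply a convenient C-level function to call):

import ctypes
import sys

# Py_GetVersion() returns a C string; tell ctypes about the return type.
ctypes.pythonapi.Py_GetVersion.restype = ctypes.c_char_p

def test_Py_GetVersion():
    """The C-level version string should match the running interpreter."""
    version = ctypes.pythonapi.Py_GetVersion().decode('ascii', 'replace')
    assert version.startswith('%d.%d' % sys.version_info[:2])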