
2007-12-16

What is the best way to test both a Python and C version of a module?

If I had my way, everything in the stdlib that didn't need to be an extension module would be implemented in Python. That would make maintenance so much easier. Plus it would make things better for the various alternative implementations of Python.

But this kind of restriction ain't about to happen (especially with Raymond Hettinger around as he is a speed freak =). In some cases we actually have two implementations of a module; one in Python, one in C (see heapq and bisect for examples). This usually occurs because the module was originally written in Python but someone really wanted a version in C for performance reasons. This actually makes sense to me: write in Python, and if you find in real life that you need it faster, re-implement in C while keeping the Python version around. Chances are if there is a bug it will be in the C version (see the compatibility issues with StringIO/cStringIO and pickle/cPickle), so maintenance might suck with two versions, but usually the person who goes far enough to write a C version keeps it minimal, so it isn't too bad.

But in these cases where a module in the stdlib has a Python version that imports a C version underneath the covers when available, both versions of the code need to be tested. But how can you do that in a simple fashion? I have an idea ...

Let's look at the basic idiom in play here. Typically there is a Python module that contains all the code. Then, somewhere towards the bottom, the C version is imported with a ``from .. import ..`` statement which overwrites the Python functions and such with the C versions. I am going to assume here that no one uses the delegate pattern, i.e., no one imports only the C module and references everything off that module.

OK, let's assume you have run your unit tests once with whatever the module gives you by default on your system. Assuming that is the C version, we now want only the Python code to be tested. First, we need to trigger the module we are testing to be reloaded without the C code. That happens by storing the C version locally and then setting a value in sys.modules for the C module that will fail on a ``from .. import ..`` call. Technically this should not be 'None' since that has special meaning when it comes to packages (leads to a redirect in the import); I am partial to 42. The storing of the C module is important as extension modules are typically not designed to be reloaded, so you should assume they can't be reloaded.

With a guaranteed import failure of the C code, we can now delete the Python module from sys.modules and execute our import again. This time the Python code will not end up importing the C code since that import fails. If you are doing this in a function, you need to make sure the names that you re-import are declared global so they get rebound in the global namespace; otherwise your re-import will just be local and your test module will still hold a reference to the first import that uses the C code (which will still run fine thanks to reference counting).

Finally, once the tests are run again you should delete the module you are testing one more time from sys.modules so you can use the C version again.

Here is some example code::


    from monty.python import spam

    class TestClass: pass

    def test_main():
        run_tests(TestClass)
        from sys import modules
        if 'monty.python._spam' in modules:
            global spam
            c = modules['monty.python._spam']
            modules['monty.python._spam'] = 42
            del modules['monty.python.spam']
            from monty.python import spam
            run_tests(TestClass)
            modules['monty.python._spam'] = c
            del modules['monty.python.spam']


There are some issues with this approach, though. First, assumptions are being made that the module is designed to be reloaded. If it does a bunch of global caching in the module then there will be discrepancies between code that imported the module previously and code that imports the module from this point forward. Second, there is a lot of brittle code there that should not have to be repeated in every test. One could write a context manager and that would give you::



    from monty.python import spam

    class TestClass: pass

    def test_main():
        run_tests(TestClass)
        with hide_c('monty.python.spam', 'monty.python._spam'):
            global spam
            from monty.python import spam
            run_tests(TestClass)

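The post leaves ``hide_c()`` undefined, so here is a minimal sketch of what such a context manager could look like. The name ``hide_c`` and the 42 sentinel come from the discussion above; the implementation details are my own guesses, not code from the stdlib:

```python
import sys
from contextlib import contextmanager

@contextmanager
def hide_c(py_name, c_name):
    """Make the C accelerator module unimportable so that a fresh import
    of py_name falls back to its pure Python code; restore everything on
    exit."""
    saved_c = sys.modules.get(c_name)
    saved_py = sys.modules.pop(py_name, None)
    # A non-module sentinel makes ``from ... import ...`` raise
    # ImportError; avoid None, which has special meaning for packages.
    sys.modules[c_name] = 42
    try:
        yield
    finally:
        # Put the real C module back (or remove the sentinel entirely).
        if saved_c is not None:
            sys.modules[c_name] = saved_c
        else:
            del sys.modules[c_name]
        # Drop the pure-Python-backed import and restore the original.
        sys.modules.pop(py_name, None)
        if saved_py is not None:
            sys.modules[py_name] = saved_py
```

As a real-world stand-in for the hypothetical monty.python modules, ``with hide_c('heapq', '_heapq'): import heapq`` gives you the pure Python heapq on CPython, since heapq.py uses exactly this try/except ImportError idiom.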

Better, but not perfect. It would be nice to ditch the explicit import. You could have run_tests() take the same arguments as hide_c(). With that you could introspect on the test classes to find out which test modules are being run. Then you could inspect their global namespace to find instances of the module being tested and directly substitute in any re-imported module. But at that point you are implementing an extended reload() that does a search for references to the old module and does a direct substitution.
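That reference-chasing reload could be roughed out as below. This is purely a hypothetical sketch: it patches only direct module references sitting in other modules' global namespaces, so instances, closures, and subclasses keep pointing at the old code.

```python
import sys

def substituting_reload(name):
    """Sketch of an "extended reload": re-import module *name* and patch
    any other loaded module whose globals still reference the old module
    object."""
    old = sys.modules.pop(name)
    __import__(name)
    new = sys.modules[name]
    for mod in list(sys.modules.values()):
        ns = getattr(mod, '__dict__', None)
        if ns is None:
            continue  # skip sentinels and other non-module entries
        for attr, value in list(ns.items()):
            if value is old:
                ns[attr] = new  # substitute in the freshly imported module
    return new
```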

Probably the biggest issue with either approach, though, is subclasses. If you wrote a mock object that subclassed the thing that it was stubbing out for instance/subclass checks, you would still be using the C code version of the module. You would have to explicitly clobber the mock objects and the module that produced them to make sure you didn't have any more issues.

Unfortunately there is no foolproof solution. This might come down to a good-enough solution in the end.

2007-12-14

GHOP is working out well (and I need to make dev'ing on Python easier)

The PSF work in the Google Highly Open Participation project seems to be going well! I personally have seen two tests get rewritten from old-style stdout comparison to doctest with another three being actively worked on.

But if there is one thing this experience has taught me, it is that a doc on how to run the tests and write tests for Python is badly needed. There has been a consistent misunderstanding over why the tasks to ditch the Lib/test/output tests for doctest are desired. There have also been questions about how to best go about using doctest and such. And since so many of the entry-level things people can work on are test-related, it would be really helpful to have a doc that people can read to understand how everything is structured.

All of this has reminded me about my mid-term plans for my Python work. I am not sure if I have stated this publicly even though I have been thinking it for a while, but I am planning to really focus my dev time in the future to helping to make Python easier to hack on and maintain. Once my current "cutting edge" Python commitments are complete (importlib and the Py3K stdlib reorg) I am shifting my focus to stuff that is not as time critical and more specifically focused on helping to make developing Python easier.

I think first I will write up some docs. Beyond just the testing doc I want some basic stuff on how to report and review a bug report, how to create and review a patch, and how to get commit privs and do an actual commit. There should probably also be a doc on how to get your environment set up and an intro to the directory structure. I am hoping to keep these docs at a high enough level that I don't get bogged down in detail but have enough info so that if we have people show up at a sprint or a bug day they can just read those docs and know what is needed when one takes on a bug report or patch.

With hopefully more people helping it will probably be close to time to re-evaluate the workflow we have for issues. The current one was inherited from SourceForge and it just doesn't work well for our needs. Now that we control our own issue tracker I want to take the opportunity to make it easier to manage issues and help make it easy to stay on top of things so we don't end up with a backlog of open issues.

Next I want to clean up the testing framework. As it stands it really sucks having all of the tests in a single directory. They should at least be divided into language, built-ins, and stdlib. Might also toss in a prerequisite package so that the tests which have to pass in order for the test suite to even be considered in a sane state are kept in a single location.

I then want to do something that the PyPy folk wanted ages ago. Tests should specify whether they are blackbox or whitebox tests. It sucks that we have tests that are specific to CPython when various implementations of Python use the unit tests as a way to verify their implementation. It should be easy for them to know which tests they don't need to implement. If, when the time comes, the PyPy, Jython, and/or IronPython communities are willing to help me mark up the test suite then I am willing to write the code to deal with this.

To go along with this, it should be easy to specify tests that are platform-dependent within the test itself. This goes for the module level as well as at least the class level, if not the method level.

Each test should also have a way to specify what module they are testing (so that failed imports of support code show up as a test failure instead of a skipped test). This will also help to make sure all modules have a test in the first place.

And finally (in terms of changing code in tests), every test should have a standardized function to call. Not every test module has a test_main() function, but it should. Might also be good to have it take an optional verbosity argument or something instead of reading an attribute off of test.test_support.

I want regrtest.py to be greatly simplified from what it is now. It really should only facilitate executing tests, not help judge whether a test passes (e.g., output comparison tests) or whether a test should be run (e.g., whether a specific test is platform-specific or not). Only stuff that is not unique to any one test should be in regrtest.py. If the above is implemented, that should be doable.

With the tests all organized we will need to know how effective the tests are. Getting code coverage reported daily would be nice. I know Walter Dörwald has his coverage report, but I want something on trunk that is run nightly and is more visible. I want it so that if someone feels like contributing to Python but doesn't have a personal itch to scratch, they can either look for a bug or patch to review or work to improve the code coverage of some module.

With all of this in place it is probably time to look at the build system. I hate autoconf and don't love GNU Make. I would love to replace them with another build system that is a lot easier to maintain.

With the maintenance of what we have taken care of, it should then be time to simplify what we do have to maintain. The thing that specifically comes to mind on this topic is the conversion from CST to AST by the compiler. Right now it is handled by hand-coded C. What I would rather see is a Python-implemented domain-specific library (not language!) handle the mapping of grammar to AST by generating C code. Basically I want to minimize the amount of C code in Python to only stuff that can't be auto-generated by Python code.

Obviously this is a lot of stuff to do and it will take years to complete. But all of this stuff has been on my todo list for ages so I suspect I will actually do a lot of them. I really want to make contributing to Python as easy as possible. Not just so that I have to do less grunt work and thus get to do more nutty things (e.g., trying to develop new bytecode), but also so other people can feel the same satisfaction I do from helping with the development of Python.

2007-12-12

Testing is important, even in statically typed languages

I was planning a blog post about how much I hate the debug, fix, build cycle of statically typed languages, especially when you have a lengthy compile time (AspectJ really makes you pay a high cost for this when you change a deep-touching pointcut).

But then I had stuff stop working for me because of the hacky way my supervisor and I have been adding stuff. And with no tests, I am not sure when certain things just ceased to work. This whole thing came together so quickly and was under such a state of flux that I never took the time to learn how to do proper unit testing for Java/AspectJ (I am assuming JUnit is still the best bet; if you know of something better please leave a comment) since I have not seriously coded in Java in years.

Anyway, Titus had a post about how some GNOME developers said they didn't always like programming in Python because "you have to be very diligent about unit tests and code coverage". But as another post at another blog pointed out, you really should be doing this level of testing regardless of whether your compiler checks stuff. I completely agree and I am paying the price right now for not doing what I knew I should be doing.

I have argued before that the amount of time required to add tests specifically needed because of the use of a dynamic language is less compared to the amount of time you lose to compilation and what little help you get from type safety. I honestly think type safety should be viewed more as an opportunity for optimization and less as protection against errors.

But you still have the time lost to compilation if you do testing in a statically typed language. But the peace of mind and making sure you don't introduce regressions is just worth it.

2007-01-11

Classifying unit tests

A while back the PyPy folk said they would love to have support for classifying what tests in the stdlib were implementation-specific. That would allow them to run the test suite without having to keep their own list of tests that they can never pass. Don't know if they are still interested in this, though.

Having written some new context managers for test.test_support and read about Collin Winter's rework of unittest, I am inspired enough to at least blog on the topic.

What I see is test.test_support.classify being a decorator. Based on keyword arguments one could specify what operating systems the test is expected to work on, along with whether it is implementation-specific:


    @classify(os=["win", "darwin"], implementation=True)
    def test_foo(): pass

That would probably set a __testing__ attribute on the method with a dict containing specific keys and values or a set with very specific values. Accessor methods could be provided, but considering that only test runners should care about this info, they probably aren't really needed.
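A sketch of what such a decorator might look like; the ``__testing__`` keys here are illustrative guesses, not an actual test.test_support API:

```python
def classify(os=None, implementation=False):
    """Attach classification info to a test via a __testing__ dict.

    os=None means the test is expected to run on every platform;
    implementation=True marks it as CPython-specific.
    """
    def decorator(test):
        test.__testing__ = {
            'os': frozenset(os) if os is not None else None,
            'implementation': implementation,
        }
        return test
    return decorator

@classify(os=["win", "darwin"], implementation=True)
def test_foo(): pass
```

A test runner could then skip any test whose ``__testing__['implementation']`` is true when running under an alternative implementation.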

For doing an entire class there are two options. One is to use a metaclass whose constructor takes the same kind of arguments and then applies them to every method. The other is to provide a function that returns what should be set for __testing__ on the class. Either one would work, although I think unittest.TestCase is still a classic class.

The only other thing I have been planning on doing for unit tests is changing how ImportError is handled by regrtest. As of right now, if any form of import failure occurs, the test is skipped with a message saying a certain module was not found. But what should really happen is that the test is skipped only if the module being tested is not found. Any other import failure should be an error.

To handle that one needs a function that takes in a module name to import and returns that module. Obviously some support would be needed not only for ``import ...`` imports but also for ``from ... import ...``. But it is very straightforward to write a function to do that; the trick is the interface.
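Handling just the plain ``import`` case, and using today's importlib and unittest.SkipTest for illustration (neither existed in the stdlib when this was written), the function might be sketched as:

```python
import importlib
import unittest

def import_test_module(name):
    """Import the module under test.

    Skip the test only when *that* module is missing; any other
    ImportError (e.g. broken support code) is a genuine failure.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        if getattr(exc, 'name', None) == name:
            # The module being tested itself is unavailable: skip.
            raise unittest.SkipTest('%s not available' % name)
        raise  # a dependency failed to import: let the error surface
```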