2009-05-30

The staleness of the standard library and adding new things

If you happen to follow certain members of python-dev on Twitter you may have noticed a discussion going on over Zed Shaw's "Curing Python's Neglect" post about inconsistencies in the language and the standard library, focusing mostly on the latter. This caused some people to say Zed should be lindberg'd (link is currently down) for complaining about the state of Python's standard library without offering solutions. This then led to Zed responding about how there is a double-standard for accepting changes between Python committers and everyone else which has led to crap making it into the standard library.

There seems to be a misunderstanding here that I don't think is isolated to Zed (and thus this post is not meant to be picking specifically on Zed, he just happened to spark the discussion) involving Python's history and how things operate over on python-dev that I would like to clear up.

I want to start off by acknowledging the fact that the standard library is far from perfect. There are modules in there that are of subpar quality when compared to other parts of the standard library. And even those that are in there and are considered good can have some inconsistencies in their APIs that can make them infuriating to use on occasion. There is work that could stand to happen to clean up some things.

Today, to get something added to the standard library, we ask that the code be considered best-of-breed by the community, that the developer promise to maintain the code, and that the maintenance happen within Python's development process and not outside of it. There can be a PEP involved depending on how large the code is and how much of a fight there is to get the code included. And of course this is all dependent on whether the core developers believe the module has enough widespread usefulness to warrant adding it to the standard library.

But historically these requirements are a recent turn of events. I would say that these requirements for inclusion into the standard library did not start to be seriously enforced until some time during the Python 2.5 development cycle, which started in November 2004 and ended in September 2006. If you look at the standard library, a lot of it predates 2.5 by several releases. If you look at the modules that Zed said could use some improvements, you will notice that none of them were added to the standard library recently (and some are not in the standard library at all, as Zed mistakenly thought setuptools and easy_install were):
  • os: 1992 (I don't even know what release that is)
  • time: 1990 (less than a year after Python's creation!)
  • datetime: 2.3
  • email: 2.2
My point is that if you look at a module in the standard library and it seems a little stale and that the API could have possibly used some more public vetting then it probably could have but was accepted anyway. People need to realize that Python was started by Guido to scratch an itch and just happened to turn into this massively popular programming language. Python's popularity took off starting shortly after the first PyCon in 2003, IMO. But because the growth was organic python-dev did not realize how much more careful we had to be with standard library inclusions until a few years later. Back before we instituted more stringent acceptance requirements things were added simply if people offered up the code, were willing to maintain it, and python-dev thought it would be useful.

But admittedly, even today some modules get to short-circuit the acceptance process. Importlib is a perfect example of this as I didn't have to put it out there for a year to make sure people thought it was best-of-breed. I didn't have to write up a PEP to vet the API. The only thing I had going for me was a known need for what importlib provides. Otherwise the code went in because I wanted it to go in and I happen to be in a position where I can make that simply happen.

But I skipped the steps only because of other things I did. I skipped the year-long acceptance by the community as I was targeting Python 3.0, which was not released as final yet. I also blogged extensively about importlib and its API so it was at least discussed publicly somewhere. There were several core developers I talked with about the API and who also watched every commit I made with interest and provided feedback. I talked with people at PyCon about this stuff. There is a reason I spent years working on importlib.

But importlib is an exception, not the rule. If my bootstrapping goal did not exist importlib would have existed externally for at least a year before I tried to bring it into the standard library. And if I had gotten any realistic pushback from python-dev I would have waited for inclusion, but thankfully I didn't receive any. And trust me when I say that people on python-dev are quite happy to share their opinion if they think something is a bad idea. While people who have proven themselves do get to skip some steps, don't think that they get to do whatever they want to the standard library.

With all of this in mind, how should modules get fixed? If an API is truly deemed poor and in need of a replacement, it can either grow a new API next to the old one or we introduce a new module to make the old module obsolete. Doing the former is nice for stupid little API mistakes, but does require working properly with the pre-existing module, which might be a hindrance.
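As a rough sketch of the first option, here is what growing a new API next to an old one might look like. Both function names are made up for illustration and do not come from any real standard library module; the only real machinery used is the warnings module and DeprecationWarning.

```python
import datetime
import warnings


def parse_date(text):
    """New API (hypothetical): return a datetime.date for 'YYYY-MM-DD' text."""
    year, month, day = (int(part) for part in text.split("-"))
    return datetime.date(year, month, day)


def parse_date_tuple(text):
    """Old API (hypothetical): still works, but steers callers to parse_date()."""
    warnings.warn(
        "parse_date_tuple() is deprecated; use parse_date()",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this function
    )
    date = parse_date(text)
    return (date.year, date.month, date.day)
```

The old function keeps its return type so existing callers do not break, which is exactly the "working properly with the pre-existing module" constraint: the new API cannot ignore what the old one promised.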

Adding a new module breaks cleanly from the old module, allowing a new API to exist from scratch. But this has the issue of being a much larger burden for support as there will simply be more code. There are possible stability issues, etc. Even code that has existed out in public for a year is not guaranteed to be of top-notch quality.

And of course this all assumes that an API is actually fundamentally broken. Minor issues can be overlooked in the name of prior knowledge. When we add something to the standard library we are essentially asking every Python developer out there to learn about this new code and consider using it, which is quite the mental burden if you have been using the older API for years. And it is also possible that you are in a minority in thinking it is broken.

This is all rather complicated and nuanced. Coming up with a solution that pleases everyone is impossible. But python-dev continues to try to do its best and hopefully that does please most people.

2009-05-08

Month one of my python-dev sabbatical

It now has been roughly a month since I disabled subscriptions for practically every mailing list I belong to. Initially I had to get used to the fact that checking Gmail constantly for new mail was a fruitless exercise. Otherwise it's been nice.

I definitely have more time on my hands. Python-dev has been my hobby for so long I had somewhat forgotten what it was like to just work on a coding project purely for fun and just for myself. Not having to answer to the Python community for the quality of my code or test coverage, not having to document everything, or have to argue for why I did something is rather freeing. I can see why Neal Norwitz is on perpetual sabbatical from python-dev.

But I do miss being plugged in. Having to get my news through Twitter about what is going on still feels odd -- this was made especially true when PEP 383 was accepted and I didn't even know the PEP existed until people tweeted about the acceptance.

And I still think about my long-standing projects. I am thinking about how to handle the Mercurial transition and what workflow we should end up with. I still think about what I might need to implement in C for importlib in order to prevent people from lynching me when I bootstrap it in as the implementation of import. This means that once I am done defending my thesis proposal I will be coming back to python-dev.

But I suspect I won't rejoin every mailing list I disabled delivery for. And I also expect I will be much quicker on telling people to move over to python-ideas along with very liberally using the muting feature in Gmail. I have had a taste of freedom from senseless arguments and whining and I don't plan to go back to it without a fight.

One month down, two more to go.

2009-05-04

Does XHR lead to better testing/abstraction?

I was talking with a friend of mine who is a Ruby programmer who does Rails development and I asked her how best to handle a form submission that was malformed. I am thinking of the situation when I have a URL that accepts a POST from a web page and some argument is in the wrong format or an argument is entirely missing. I wanted some way to signal the POST submission was bad directly from the HTTP status code or something. I mean you should have some clear way to know that something failed along with a message as to why it failed.

But she said it should return a 200 with a response page specifying what went wrong. When she said that, my TDD sensibilities along with my separation of concerns training recoiled in horror. To test the URL response I have to parse the HTML for error output?!? That means I have to inspect the view to know that an error occurred! And this is not simply testing the view to make sure an error was made visible to the user; this is testing that the error was caught, period.

After I recovered from my shock at this suggested practice and realized most sites worked like this, I realized why it offended me so much. Just like everyone else I was taught that MVC was generally the best way to structure GUI applications, and in general I agree with the assessment. In my opinion an application should function regardless of what the GUI is, making the front-end a separate component of the overall application. One should be able to swap out the "V" from MVC and have things still work. That's just good abstraction in my opinion. That means I should be able to test the core components of an application -- the M and C -- without a GUI. And yet with most web apps we do not get this separation thanks to most forms signaling a failure based on what is displayed in the reply.

With this in mind I began to think about how I could rectify this situation for my web app I was developing. And that's when I realized that using XHR to send a form's data to a URL and use the response to handle errors was, from a testing perspective, the best way to go. It might make the client-side code more complicated as a click in a form would no longer simply be the case of the web browser redirecting to a specific page but require processing a response -- probably JSON -- and handling the reply. But from a testing perspective I would be able to test the submission URL in isolation from the web page, thus separating the V from the M and C. Plus it would give me a REST API upfront and thus not require me to create it later -- if I chose to document the API and make it public.
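To make the testing argument concrete, here is a minimal sketch of what testing a submission handler in isolation could look like. Everything in it is an assumption for illustration: handle_post is a made-up controller function standing in for whatever sits behind the URL, the "name" field is invented, and 404 is used only as an example of a non-200 failure status.

```python
import json
import unittest


def handle_post(form):
    """Hypothetical controller: validate the form, return (status, JSON body)."""
    if "name" not in form:
        return 404, json.dumps({"error": "missing argument: name"})
    return 200, json.dumps({"name": form["name"]})


class HandlePostTests(unittest.TestCase):
    def test_missing_argument_is_signalled_by_status(self):
        # No HTML parsing needed: the status code alone says it failed.
        status, body = handle_post({})
        self.assertEqual(status, 404)
        self.assertIn("error", json.loads(body))

    def test_valid_submission(self):
        status, body = handle_post({"name": "Ada"})
        self.assertEqual(status, 200)
        self.assertEqual(json.loads(body), {"name": "Ada"})
```

Saved in a file, the tests can be run with python -m unittest. Neither test touches a template or a rendered page, which is the separation of the V from the M and C that I was after.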

That's when I said to myself, "OK, so how does everyone else handle signifying an error in a REST API?" And that's when I found out everyone does it their own little way. About the only thing I found that was consistent is that if everything went well the URLs returned a 200 and when it went to hell they returned a 404. But from there, there was no consistency. Only looking at JSON responses, some included the HTTP status code in the error message while others did not. Some had the idea of a class of error and a message while others only had a classification. Some had no answer whatsoever (I'm looking at you XML-RPC), and some went over the top (that would be SOAP).

So I began to wonder about what I would do to signify a failure for a REST call. The first thing I agreed upon was that returning 200/404 was reasonable. It does irk me slightly since it is just like returning 0/1 in C to signal an error, but it works in this situation where one does not have proper exceptions nor other status codes to use to signal different types of errors. I could inspect the returned JSON object for an 'error' attribute to realize an error object was returned, but this works just as well and I would rather dispatch on status code than have to introspect the returned value.

Having chosen a way to signify an error, I then thought about what the JSON reply would require. I thought about how Python signifies errors and almost went down the road of something like exceptions complete with inheritance. But I quickly realized that would require loading JavaScript code just for defined exceptions and I didn't like that for an exposed API; for a REST API one should be able to just read the API docs to make a call and not require including any special JavaScript code. I also realized that this was an API issue, not a programming issue. I was more interested in how Python handled errors when calling functions than how bad syntax was flagged.

That means that I wanted something like TypeError when an argument is missing and ValueError when some input is malformed. That's when I figured, why can't I just do something like that? If I had a JSON object have an 'error' attribute that specified the type of error, like "TypeError" or "ValueError", I would have my TypeError/ValueError in JSON. A 'message' attribute would work as BaseException.args[0]. And then I could tack on arbitrary attributes on this error object for error-specific information such as what argument was missing or what kind of format was expected for a field.
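A minimal sketch of that error object, written server-side in Python. The 'error' and 'message' attributes are the ones described above; the helper names, the "email" field, and the extra 'argument' and 'expected' attributes are assumptions made up for the example.

```python
import json


def make_error(kind, message, **details):
    """Build the JSON error object: 'error' names the kind, 'message' explains it."""
    payload = dict(details)       # arbitrary error-specific attributes
    payload["error"] = kind       # e.g. "TypeError" or "ValueError"
    payload["message"] = message  # plays the role of BaseException.args[0]
    return json.dumps(payload, sort_keys=True)


def validate_signup(form):
    """Hypothetical validator for a form with one required 'email' field."""
    if "email" not in form:
        return 404, make_error("TypeError", "missing required argument",
                               argument="email")
    if "@" not in form["email"]:
        return 404, make_error("ValueError", "malformed email address",
                               argument="email", expected="user@host")
    return 200, json.dumps({"ok": True})
```

A missing argument maps to "TypeError" and a malformed one to "ValueError", mirroring what calling a Python function with bad arguments would raise.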

And so all of this is what I plan to do the next time I have any data that needs to be sent to a URL for something that is public-facing and not a hack (although honestly, isn't everything a hack and we just happen to be willing to chance having the public use it?). Yes, it complicates the client-side stuff, but actually the JS code could be made into a library to nicely flag errors, etc. if one standardized on the type of errors that would be returned. Plus it makes unit testing the controller much easier. And it forces you to have a REST API upfront so you don't have to design it later.
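The client-side library idea can be sketched too, here in Python rather than JavaScript just to keep it short. It assumes the 200/404 convention and an error object with 'error' and 'message' attributes as described above; check_response and the fallback to RuntimeError for unknown error names are inventions for this example.

```python
import json

# Assumption: the API standardizes on these two error names.
_ERROR_TYPES = {"TypeError": TypeError, "ValueError": ValueError}


def check_response(status, body):
    """Return the decoded reply on 200; otherwise raise the named exception."""
    data = json.loads(body)
    if status == 200:
        return data
    # Dispatch on the status code first, then read the error object.
    exc_type = _ERROR_TYPES.get(data.get("error"), RuntimeError)
    raise exc_type(data.get("message", "unknown error"))
```

With something like this the calling code never inspects the reply to find out whether it failed; it either gets data back or gets an exception, which only works because the set of error names is agreed upon ahead of time.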

Oh, and if anyone says, "what about XForms?", I will say, "get it in all of the major web browsers and then we can talk".