2009-09-13

Evolving the standard library

As Titus blogged, an interesting discussion was started over on the stdlib-sig about whether argparse should be added to the standard library, and if so, how to handle getopt and optparse and whether to deprecate them. Since the discussion showed rather well how people think the standard library should evolve, I figured I would blog about those views and my own (assuming I don't mess up and misrepresent people =).



Let's begin by examining the "extreme" views of how to evolve the standard library. One view is that it should practically not exist. The thinking is that software evolves, gets replaced, goes stale, etc., so quickly that people should rely more on the Cheeseshop than Python itself for library code. The standard library should come down to only what is required for Python to run, e.g. codecs, or things that are just extremely difficult to do properly on multiple platforms, e.g. os. This prevents people from feeling tied down to what the Python core developers deem worthy of the standard library, allowing for independent evolution of the code. It also relieves the core team of the maintenance headache so they can focus on other things.

But the drawback is that it removes the batteries from Python's "batteries included" slogan (or at least puts them on a severe diet). Obviously people who hold this view think we have overdone the batteries and need to scale them back, especially in light of the Cheeseshop taking off and being the place to get Python packages. And while it is easy to agree that we have modules in the standard library that would not be accepted today, there is still other code there which is very useful to a large number of users.

So people who want a lean standard library would like it stripped down. That would mean a large number of deprecations in the standard library initially and then practically no additions in the future. The removed modules would then be shifted over to the Cheeseshop, where they would bit-rot unless someone took charge to pick them up and maintain them. And I would suspect people who hold this view would be fine with deprecating modules later on in the standard library if it turned out to be a mistake to include them.

The only way I see this view working is if there were built-in support for something like virtualenv. That way any little scripts you code up will work no matter what you happen to have in your site-packages. Currently, if you rely on the standard library you at least know that you will get a pending deprecation warning and then a deprecation warning if you are using something that is going away. But if you rely on stuff in site-packages, keeping it working is your responsibility, and let's be honest, we are not always professional and on top of stuff like that, especially when installing a new Python interpreter.
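To make that two-step warning cycle concrete, here is a minimal sketch of how a module on its way out might announce itself. The names old_parse, stage, and warning_names are made up for illustration and are not part of any real module; only the warnings machinery and the two warning categories come from Python itself.

```python
import warnings


def old_parse(args, stage="pending"):
    """Hypothetical function living in a module scheduled for removal.

    `stage` stands in for "which release are we in": the first release
    warns with PendingDeprecationWarning, a later one with DeprecationWarning.
    """
    if stage == "pending":
        category = PendingDeprecationWarning
    else:
        category = DeprecationWarning
    # stacklevel=2 points the warning at the caller, not at this function.
    warnings.warn("old_parse() is going away; please migrate", category,
                  stacklevel=2)
    return list(args)


def warning_names(stages):
    """Call old_parse() once per stage and report which warnings fired."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even categories that are ignored by default.
        warnings.simplefilter("always")
        for stage in stages:
            old_parse([], stage=stage)
    return [w.category.__name__ for w in caught]
```

Both categories are typically silenced unless you ask for them (for example by running with -W always), which is part of why staying on top of this takes some discipline.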

The other extreme is to never remove anything from the standard library. You can take the approach of C where essentially nothing disappears from the standard library, or like Java where you deprecate things but never actually remove code. Either way, once a chunk of code is committed, it is never removed but simply maintained. Taking this approach makes sure that any code you write against the standard library will continue to work no matter what minor version of Python you are using (sans issues with bugs, etc.).

The problem with this is maintenance. For the core developers it means that we must maintain code for decades, even if the module turns out to have been a poor choice for the standard library. It also leads to a "good enough" mentality: why introduce new modules to the standard library that are only an incremental improvement on what is already there when doing so will double the amount of maintenance the core developers must do? It can also be unfortunate for users, as this "good enough" mentality leads to the standard library no longer representing the best-of-breed that it should, but simply the good-enough modules that happened to get accepted at some point.

Is there a middle ground? In my opinion there is. If I were made ROFL (Rule Of Fantastic Libraries) I would first prune the standard library down to a more core focus: libraries that require cross-platform expertise, libraries that implement widely accepted standards, and stuff needed to simply get a script to run. In terms of cross-platform stuff, that would mean os, multiprocessing, etc. These are modules that a typical developer would have a hell of a time getting working on all the OSs that CPython runs on. The core developers have gathered a huge amount of skill in making sure stuff can work across platforms well. Plus we are able to disseminate code to a much wider audience to make sure that it is widely tested. Standards are important for being able to process data. And no one wants to have to write or constantly download a command-line options library. I would also institute quality control requirements for all modules (most likely through test coverage requirements) to make sure the code stays top quality.

Once the standard library is in a good place I would be open to evolving it. I very much believe the standard library should be best-of-breed, even if that means replacing modules that are good enough with ones that are better. I would have somewhat lengthy deprecation policies such that modules are sure to be around for three releases (roughly five years) before they are finally removed and retired to the Cheeseshop.

Now one interesting wrinkle with this train of thought is Python 3. Here the whole argparse discussion is a great example. If we accept argparse and then deprecate optparse and getopt, how does that work out in Python 3? People will be slowly transitioning to Python 3 as the years move on, which means the version of Python 3 that gets a large amount of use might not be Python 3.2 but could be 3.3 or even 3.4. That means if we deprecate optparse we have to make sure people don't suddenly switch to Python 3.4 and have their code broken because some module in the standard library is gone. So either we make a guess as to when most people will transition and make sure optparse is still around in that release, or we put argparse into 2.7. I suspect most people prefer the latter, but that is more work for Steven, so I think it should be his choice of what he is willing to do.
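For what it's worth, here is a rough sketch of how script authors could hedge during such a transition: prefer argparse when the interpreter has it and fall back to optparse otherwise. The function name parse_verbose and the --verbose flag are invented for this example, and it assumes argparse is importable under that name wherever it is installed.

```python
def parse_verbose(argv):
    """Return True if --verbose is present in argv (a list of arguments
    without the program name), using whichever parser is available."""
    try:
        import argparse
    except ImportError:
        # Fallback for interpreters that only ship optparse.
        import optparse
        parser = optparse.OptionParser()
        parser.add_option("--verbose", action="store_true", default=False)
        options, _leftover = parser.parse_args(argv)
        return options.verbose
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true")
    return parser.parse_args(argv).verbose
```

Something like parse_verbose(["--verbose"]) should behave the same on either branch, but carrying both code paths around is exactly the sort of burden a sensible deprecation schedule is meant to spare people.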

[This blog post was brought to you by Nine Inch Nails' The Slip]