2009-09-13

Evolving the standard library

As Titus blogged, an interesting discussion was started over on the stdlib-sig about whether argparse should be added to the standard library and, if so, whether (and how) to deprecate getopt and optparse. Since the discussion showed rather well how people think the standard library should evolve, I figured I would blog about those views and my own (assuming I don't mess up and misrepresent people =).

Let's begin by examining the "extreme" views of how to evolve the standard library. One view is that it should practically not exist. The thinking is that software evolves, gets replaced, goes stale, etc., so quickly that people should rely more on the Cheeseshop than on Python itself for library code. The standard library should be cut down to only what is required for Python to run, e.g. codecs, or things that are just extremely difficult to do properly on multiple platforms, e.g. os. This prevents people from feeling tied down to what the Python core developers deem worthy of the standard library, allowing for independent evolution of the code. It also relieves the core team of a maintenance headache so they can focus on other things.

But the drawback is that it removes the batteries from Python's "batteries included" slogan (or at least puts them on a severe diet). Obviously people who hold this view think we have overdone the batteries and need to scale back, especially in light of the Cheeseshop taking off and becoming the place to get Python packages. And while it is easy to agree that we have modules in the standard library that would not be accepted today, there is still other code there which is very useful to a large number of users.

So people who want a lean standard library would like it stripped down. That would mean a large number of deprecations in the standard library initially and then practically no additions in the future. The removed modules would then be shifted over to the Cheeseshop, where they would bit rot unless someone took charge and maintained them. And I suspect people who hold this view would be fine with deprecating modules later on if it turned out to have been a mistake to include them in the standard library.

The only way I see this view working is if there were built-in support for something like virtualenv. That way any little scripts you code up would work no matter what you happen to have in your site-packages. Currently, if you rely on the standard library you at least know that you will get a pending deprecation warning and then a deprecation warning if you are using something that is going away. But if you rely on stuff in site-packages, keeping up is your responsibility, and let's be honest, we are not always that professional or on top of stuff like that, especially when you install a new Python interpreter.
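For readers who have not seen the mechanism, those warnings come from the warnings module: a module slated for removal calls warnings.warn when it is imported, first with PendingDeprecationWarning and, a release later, with DeprecationWarning. The sketch below simulates both stages in one file; the helper `warn_on_import` and the module name `oldmod` are hypothetical, not taken from any real stdlib module.

```python
import warnings


def warn_on_import(module_name, stage):
    """Emit the warning a deprecated module would raise when imported.

    Hypothetical helper: stage is "pending" in the first release after the
    decision to deprecate, and "deprecated" in the release after that.
    """
    if stage == "pending":
        category = PendingDeprecationWarning
    elif stage == "deprecated":
        category = DeprecationWarning
    else:
        raise ValueError("unknown stage: %r" % (stage,))
    # stacklevel=2 attributes the warning to the caller, not to this helper
    warnings.warn("the %s module is deprecated" % module_name,
                  category, stacklevel=2)


if __name__ == "__main__":
    # Deprecation warnings are filtered out by default in many contexts,
    # so record every warning for this demo.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warn_on_import("oldmod", "pending")
        warn_on_import("oldmod", "deprecated")
    for w in caught:
        print(w.category.__name__, w.message)
```

Because both categories are quiet by default in many settings, it is easy to miss them unless you run with warnings enabled (for example `python -W default`).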

The other extreme is to never remove anything from the standard library. You can take the approach of C, where essentially nothing disappears from the standard library, or that of Java, where you deprecate things but never actually remove code. Either way, once a chunk of code is committed it is never removed, only maintained. Taking this approach ensures that any code you write against the standard library will continue to work no matter what minor version of Python you are using (sans issues with bugs, etc.).

The problem with this is maintenance. For the core developers it means that we must maintain code for decades, even if the module turns out to have been a poor choice for the standard library. It also leads to a "good enough" mentality: why introduce a new module to the standard library that is only an incremental improvement over what is already there when it will double the amount of maintenance the core developers must do for that functionality? It can also be unfortunate for users, as this "good enough" mentality leads to the standard library no longer representing the best-of-breed, as it should, but simply the good-enough modules that happened to get accepted at some point.

Is there a middle ground? In my opinion there is. If I were made ROFL (Rule Of Fantastic Libraries), I would first prune the standard library down to a more core focus: libraries that require cross-platform expertise, that implement widely accepted standards, or that are needed to simply get a script to run. In terms of cross-platform stuff, that would mean os, multiprocessing, etc. These are modules that a typical developer would have a hell of a time getting to work on all the OSs that CPython runs on. The core developers have gathered a huge amount of skill in making sure stuff works well across platforms. Plus we are able to disseminate code to a much wider audience to make sure that it is widely tested. Standards are important for being able to process data. And no one wants to have to write or constantly download a command-line options library. I would also institute quality control requirements for all modules (most likely through test coverage requirements) to make sure the code stays top quality.

Once the standard library is in a good place I would be open to evolving it. I very much believe the standard library should be best-of-breed, even if that means replacing modules that are good enough with ones that are better. I would have a somewhat lengthy deprecation policy so that modules are sure to be around for three releases (roughly five years) before they are finally removed and retired to the Cheeseshop.
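A quick sketch of the arithmetic behind such a policy, assuming "three releases" means the module still ships in the release that deprecates it plus the two after that, and assuming about 20 months between releases (the spacing implied by three releases taking roughly five years). The `removal_schedule` helper is hypothetical.

```python
def removal_schedule(major, minor, releases_kept=3, months_per_release=20):
    """Return (removal_version, approx_years) for a module deprecated in major.minor.

    Hypothetical helper. Assumes the module ships in the deprecating release
    and the next releases_kept - 1 minor releases, then disappears in the
    release after that.
    """
    removal = (major, minor + releases_kept)
    years = releases_kept * months_per_release / 12.0
    return removal, years


if __name__ == "__main__":
    version, years = removal_schedule(3, 2)
    # Deprecated in 3.2, still present in 3.3 and 3.4, gone in 3.5.
    print("removed in %d.%d after about %.1f years" % (version + (years,)))
```

Under these assumptions a module deprecated in 3.2 would be removed in 3.5, about five years later.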

Now one interesting wrinkle with this train of thought is Python 3, and the whole argparse discussion is a great example. If we accept argparse and then deprecate optparse and getopt, how does that work out in Python 3? People will be slowly transitioning to Python 3 as the years go by, which means the version of Python 3 that first sees heavy use might not be Python 3.2 but could be 3.3 or even 3.4. That means if we deprecate optparse we have to make sure people don't jump straight from Python 2 to Python 3.4 and have their code broken because some module in the standard library is gone. So either we guess when most people will transition and make sure optparse is still around in that release, or we put argparse into 2.7 as well. I suspect most people prefer the latter, but that is more work for Steven Bethard, argparse's author, so I think it should be his choice what he is willing to do.
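One way a script could straddle interpreters with and without argparse is to prefer it and fall back to optparse when the import fails. This is a sketch, not a recommended recipe; the `parse_args` helper and the `tool` program name are made up for illustration.

```python
try:
    import argparse  # prefer argparse when the interpreter ships it
except ImportError:
    argparse = None
    import optparse  # fall back for interpreters without argparse


def parse_args(argv):
    """Parse -v/--verbose plus one positional filename (hypothetical CLI)."""
    if argparse is not None:
        parser = argparse.ArgumentParser(prog="tool")
        parser.add_argument("-v", "--verbose", action="store_true")
        parser.add_argument("filename")
        ns = parser.parse_args(argv)
        return ns.verbose, ns.filename
    # optparse has no positional-argument support, so check leftovers by hand
    parser = optparse.OptionParser(prog="tool", usage="%prog [-v] filename")
    parser.add_option("-v", "--verbose", action="store_true", default=False)
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error("expected exactly one filename")
    return options.verbose, args[0]


if __name__ == "__main__":
    print(parse_args(["-v", "data.txt"]))
```

The cost of this pattern is that every such script carries two code paths, which is part of why the timing of a deprecation matters so much.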

[This blog post was brought to you by Nine Inch Nails' The Slip]

10 comments:

kayschluehr said...

When Python deprecates things the way Java does (Java uses an annotation for this purpose in later releases, so why can't Python use a decorator which makes noise at load time?), one doesn't have to care much about stuff in deprecated libraries getting broken, in particular in Python, where compiling modules doesn't cause many problems to pop up. This, combined with a removal policy (e.g. after two minor releases, which corresponds to three-year increments), makes rejuvenation manageable, I think.

Notice that I'm totally against the idea that the Cheeseshop is enough. +1000 for a Python standard library.
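Kay's decorator idea could look something like the following minimal sketch. The `deprecated` decorator and the `old_sum`/`new_sum` names are hypothetical. One design choice differs from the comment: this version warns when the decorated function is called rather than when the module is loaded, so only code that actually uses the function hears about it. Warning at load time would mean moving the `warnings.warn` call out of `wrapper` and into `decorator`.

```python
import functools
import warnings


def deprecated(reason):
    """Hypothetical decorator that marks a function as deprecated.

    Each call emits a DeprecationWarning attributed to the caller's line
    (stacklevel=2) and then runs the original function unchanged.
    """
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn("%s is deprecated: %s" % (func.__name__, reason),
                          DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("use new_sum() instead")
def old_sum(a, b):
    return a + b


if __name__ == "__main__":
    warnings.simplefilter("always")
    print(old_sum(2, 3))
```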

Jesse said...

Hey, you stole my flame fodder :)

asmodai said...

One major issue I see is that PyPI is sufficiently broken in a lot of ways and really needs a redesign in order to make it work much better.

Personally, I think parts of the standard library ought to be deprecated over time.

Craig Thrall said...

I would much rather have a great standard library than have to pull in external libs. Using external libs brings all kinds of versioning complexities. Plus, I find myself much more productive when I can just import a standard library as opposed to grovelling around the web (even if there's a repo) for the best XML parser this year. Plus, once I learn parts of the built-in lib, I can use them in every project, instead of having to figure out how this new XML parser works.

This is just my experience as somebody who has used Java, .NET and Python.

nnis said...

+1 to the best-of-breed Standard Lib. Add the new stuff to Python 3.2 and forget about 2.7; that should be just for fixes/changes to existing libs, not new ones. Then in 3.5 you can start removing the old stuff.

André said...

To me, the two strong points in favour of Python are: 1) code readability 2) batteries included. So I would be very disappointed if the standard library were to shrink significantly ... in fact, I would argue for an increase over time!

j_king said...
This post has been removed by the author.
j_king said...

One of the most frustrating developments in the Python community of late is the blind techno-lust for the latest-and-greatest-backwards-compatibility-be-damned libraries.

I just found out a stdlib module I wrote an application with a few months ago is going to be deprecated. How annoying. The whole reason I chose it was because it was in the stdlib! I try to depend on stdlib modules as much as possible when I write code. But now it's almost pointless.

Why?

Because anyone who installs Python will have those libraries and they will work (at least in theory). Developers don't mind installing libraries. End users do. It sucks building separate installers for Python programs.

At this point the Cheeseshop is not good enough to take over AFAICT. It needs to look to systems like CPAN before it can consider itself ready for such a task. It'd need a standard test protocol so we can integrate automated smoke testing for all libraries. It'd need author signatures to verify integrity. There's a lot the Cheeseshop still needs to do in order to provide the kind of maintenance currently done by humans on the stdlib.

The stdlib should remain a stable resource of guaranteed libraries. A Python script that depends solely on stdlib modules should be able to expect that those libraries will be available on all platforms that the major version of Python it was written in is available on. If Python 3 wants to be rid of the supposed "cruft" in the Python 2 stdlib, that's fine -- but Python 2 should be committed to maintaining it. Just stop twiddling with the stdlib and let it gain that cruft: it's a hard shell that keeps Python programs running year after year.

Vinay Sajip said...

I think that it's too early to be able to make sweeping decisions about stdlib versus CheeseShop. I agree with the comments here which say that CheeseShop, while it's great, has some way to go before it's a seamless source of useful packages. This is true particularly while Tarek Ziadé (and others) are working on dependency management and other improvements to distutils. Till we have a solid distribution system that we can rely on as much as, say, Debian users can on apt-get, I think the stdlib should stay pretty much as is.

As to the point about waste of maintenance time, I'm sure there is some waste, but do we have actual numbers (in terms of hours wasted)? Which modules are the culprits which demand the most maintenance time while being maintained by whoever out of a sense of duty rather than anything else? And as for those who want to strip the stdlib down to a minimum - are we talking about core devs only here, or the wider community? The wider community can always ignore any modules sitting around that came with their Python distribution, if there are newer and better versions available on CheeseShop. After all, disk space is cheap and getting cheaper.

So in terms of taking things out of the stdlib, before eviction there should be a consensus between core devs that particular modules have outlived their usefulness because of maintenance overhead (i.e. no volunteer to maintain, coupled with a need to maintain because of build breakage, security problems, compatibility with evolving core functionality or similar) - then those modules can be transitioned out via the usual deprecation cycle and moved to CheeseShop, with of course the documentation updated to indicate that the modules have been removed and how to get them from the CheeseShop.

We should remember that "pragmatism beats purity".

Vinay Sajip said...

Sorry, that last bit should have been "practicality beats purity", from the Zen of Python.

In saying that a core stdlib should only consist of stuff that's hard to do cross-platform, like subprocess, multiprocessing, etc., I think you're overlooking something: TOOWTDI and standardisation. There are libraries which do not involve cross-platform issues and nevertheless are good to have in the stdlib, for example re. There are some who say that regular expression matching is suboptimal in PCRE, Perl, Python, Ruby, Java etc., and perhaps they're right, but for Python it's better to have a standard re module as part of the core stdlib, rather than having multiple options slug it out in the community.

Another issue is standardisation, which I'll illustrate with a scenario. Let's suppose that core stdlib module A is deprecated and moved to the CheeseShop, and that there's also a module B which does the same job, is also available from the CheeseShop, and which also has reasonable mindshare. Pretty soon, a library C will show up which has A as a dependency (because C's developers prefer A), and another library D comes along which depends on B (for an analogous reason). C and D are available on the CheeseShop, of course. Now a developer of library or application E, which makes use of C and D, has no option but to pull in A and B as well. Multiply this ad nauseam over time. Are you really saying that having this sort of situation is better than the current one? I would say, emphatically not. For me, standardisation/TOOWTDI is more important.

I'm not an advocate of unlimited bloat in the stdlib - far from it. But I think for certain "software infrastructure" types of functionality, these should be part of batteries included until such time (if ever) there is an actual, practical problem with maintenance. For example, some modules may not have an active maintainer and yet need hardly any maintenance.

Argument parsing is another area which would benefit from standardisation, and it's already a bit embarrassing to have getopt and optparse in the stdlib. To add argparse, good and worthy candidate though it is, seems crazy. I have a high regard for argparse and Steven Bethard has done a great job, but surely it would be better to ask him to take over maintenance of optparse and transmogrify it into argparse in a backward-compatible way, and do with the internal implementation as he wishes as long as the public API remains backward-compatible. (He originally tried to subclass and override bits of optparse, to add the new functionality, and because he couldn't do it, created argparse.)

Modulo licensing issues, if Steven is unwilling, it's even possible for someone (I believe Armin Ronacher volunteered on stdlib-sig) to upgrade optparse so that it gains argparse-like functionality. But it seems a mockery of TOOWTDI to have getopt, optparse and argparse all in the stdlib. Puh-lease!

I don't believe we have decent quantitative information about the existing state of the stdlib (in terms of maintenance time and effort required, distance from "best-of-breed" and so on) to be able (yet) to make good decisions about the way forward.

That said, I'm grateful that there's discussion around this issue, and now that I'm subscribed to stdlib-sig, hope to participate a little more.
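To make the getopt/optparse/argparse overlap raised in this comment concrete, here is the same small command line (a `-v` flag, an `-o FILE` option and one positional argument) parsed with each of the three modules. The function names and option names are illustrative only.

```python
import argparse
import getopt
import optparse

ARGV = ["-v", "-o", "out.txt", "input.txt"]


def with_getopt(argv):
    # getopt mirrors C's getopt(): a spec string, then manual dispatch.
    opts, args = getopt.getopt(argv, "vo:", ["verbose", "output="])
    verbose, output = False, None
    for flag, value in opts:
        if flag in ("-v", "--verbose"):
            verbose = True
        elif flag in ("-o", "--output"):
            output = value
    return verbose, output, args


def with_optparse(argv):
    # optparse declares options; leftover arguments come back as a list.
    parser = optparse.OptionParser()
    parser.add_option("-v", "--verbose", action="store_true", default=False)
    parser.add_option("-o", "--output")
    options, args = parser.parse_args(argv)
    return options.verbose, options.output, args


def with_argparse(argv):
    # argparse declares positional arguments as well as options.
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-o", "--output")
    parser.add_argument("inputs", nargs="*")
    ns = parser.parse_args(argv)
    return ns.verbose, ns.output, ns.inputs


if __name__ == "__main__":
    for parse in (with_getopt, with_optparse, with_argparse):
        print(parse.__name__, parse(ARGV))
```

All three return the same tuple; what changes is how much bookkeeping the caller does, which is the kind of duplication the comment objects to having in one stdlib.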
