2007-01-03

Why doesn't Python have more data format readers in the stdlib?

I just noticed the explosion of XML vs. JSON posts in the blogosphere (JSON, for those who don't know, is a simple data format that is a subset of literal JavaScript syntax and is defined by RFC 4627). Some would say that XML is the saviour of all when it comes to data. Personally I don't agree with that view, and yet Python's stdlib doesn't offer much else when it comes to data formats.

Uche Ogbuji makes the good point that "XML is much better suited to documents and text than records and data", and that is coming from someone who has worked on the 4Suite XML and RDF library. I think it is a very valid point. XML is great when you need to store semantically marked-up text in a form meant more for computers to read than for human beings. But for plain data I can't stand it. Having to define a schema for even the simplest things is a pain, even if you use something like the RELAX NG schema language. And Uche is not the only XML supporter to think JSON has its place.

But I know I have personally dismissed JSON before as just a JavaScript/AJAX thing. With my personal hatred of JS (okay, maybe not hatred, but I *really* don't like the language) I ignored it for a long time. But as Simon Willison points out, that is overlooking the point. JSON is simple, small, and gives most people exactly what they need for a language-/platform-neutral data exchange format. I mean just look at the grammar of JSON: the thing is already practically Python syntax as it is, especially if you don't worry about the order of members in an object.
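To make the "practically Python syntax" point concrete, here is a minimal sketch. The sample document and its field names are made up for illustration. It evaluates a piece of JSON text as a Python expression after mapping JSON's three bare literals onto their Python equivalents. This only demonstrates how close the two syntaxes are; eval() is not safe on untrusted input, and a few corners of JSON (such as the \/ string escape) do not carry over cleanly.

```python
# Hypothetical sample document; the keys and values are illustrative only.
json_text = '{"name": "example", "tags": ["python", "json"], "active": true, "score": 1.5, "parent": null}'

# JSON's bare literals are the main spelling difference from Python.
json_names = {"true": True, "false": False, "null": None}

# Illustration only: never call eval() on text you do not control.
data = eval(json_text, {"__builtins__": {}}, json_names)

print(data["tags"])    # a JSON array becomes a Python list
print(data["parent"])  # JSON null becomes None
```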

But this post is less about JSON's merits and more about the lack of modules in the stdlib geared towards data formats. We have a good amount of XML support (with DOM, SAX, and ElementTree interfaces along with the expat parser for fast parsing). But what other data formats do we support? ConfigParser's config file format? XML-RPC? Pickle? There really are not a lot.

Why is that? Well, I know no one has ever stepped forward on python-dev to offer a module for JSON or YAML parsing. I am sure people get a little intimidated at the idea of proposing to have their module added to the stdlib, especially now that there is the requirement of it having community use and support to begin with. But still, this might be something to strive to get into the stdlib.

Perhaps I will eat my own dog food and write my own JSON parser for fun some time with the goal of getting it into the stdlib some day. I have not written a recursive descent parser in a while and it would probably be good practice for me.
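For a sense of what such a recursive descent parser might look like, here is a rough sketch under stated assumptions, not a stdlib candidate. All names (parse_json, JSONSketchError, and the underscore helpers) are hypothetical. Each grammar rule gets one function that takes the text and a position and returns the parsed value plus the new position. To stay short it leaves out \uXXXX escapes and does not reject raw control characters inside strings.

```python
import re


class JSONSketchError(ValueError):
    """Raised when the text does not match the subset of JSON handled here."""


_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
_LITERALS = (("true", True), ("false", False), ("null", None))
_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
            "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def parse_json(text):
    """Parse a complete JSON text and return the matching Python object."""
    value, pos = _parse_value(text, _skip_ws(text, 0))
    pos = _skip_ws(text, pos)
    if pos != len(text):
        raise JSONSketchError("trailing data at offset %d" % pos)
    return value


def _skip_ws(text, pos):
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def _expect(text, pos, char):
    if not text.startswith(char, pos):
        raise JSONSketchError("expected %r at offset %d" % (char, pos))
    return pos + 1


def _parse_value(text, pos):
    # Dispatch on the first character, one branch per grammar rule.
    if text.startswith("{", pos):
        return _parse_object(text, pos + 1)
    if text.startswith("[", pos):
        return _parse_array(text, pos + 1)
    if text.startswith('"', pos):
        return _parse_string(text, pos + 1)
    for word, value in _LITERALS:
        if text.startswith(word, pos):
            return value, pos + len(word)
    match = _NUMBER.match(text, pos)
    if match is None:
        raise JSONSketchError("unexpected input at offset %d" % pos)
    token = match.group()
    is_float = any(c in token for c in ".eE")
    return (float(token) if is_float else int(token)), match.end()


def _parse_string(text, pos):
    # pos points just past the opening quote.
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(chars), pos + 1
        if ch == "\\":
            pos += 1
            if pos >= len(text) or text[pos] not in _ESCAPES:
                raise JSONSketchError("unsupported escape at offset %d" % pos)
            ch = _ESCAPES[text[pos]]
        chars.append(ch)
        pos += 1
    raise JSONSketchError("unterminated string")


def _parse_array(text, pos):
    # pos points just past the opening bracket.
    items = []
    pos = _skip_ws(text, pos)
    if text.startswith("]", pos):
        return items, pos + 1
    while True:
        value, pos = _parse_value(text, _skip_ws(text, pos))
        items.append(value)
        pos = _skip_ws(text, pos)
        if text.startswith("]", pos):
            return items, pos + 1
        pos = _expect(text, pos, ",")


def _parse_object(text, pos):
    # pos points just past the opening brace.
    members = {}
    pos = _skip_ws(text, pos)
    if text.startswith("}", pos):
        return members, pos + 1
    while True:
        pos = _expect(text, _skip_ws(text, pos), '"')
        key, pos = _parse_string(text, pos)
        pos = _expect(text, _skip_ws(text, pos), ":")
        value, pos = _parse_value(text, _skip_ws(text, pos))
        members[key] = value
        pos = _skip_ws(text, pos)
        if text.startswith("}", pos):
            return members, pos + 1
        pos = _expect(text, pos, ",")
```

With this sketch, parse_json('{"ok": [1, 2.5, null]}') returns {'ok': [1, 2.5, None]}, while malformed input such as '[1,]' raises JSONSketchError. Passing the position around explicitly, instead of slicing the string, keeps each rule a plain function and avoids copying the text.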