2006-02-09

Why Python doesn't need something like the LINQ Project

As part of my search for a dissertation topic, I ended up learning about the LINQ Project, which adds stream processing (they call it "querying") for disparate store types (DB, XML, etc.). The funny thing is that as I read the description of what it added, I just kept saying to myself, "Python already does that" (actually, if you look, a lot of what C# 3 adds to the language is there to make dealing with static typing easier, and thus is either already in Python or not needed by it, but that is another story).

So, how much of LINQ is already doable in Python? Well, LINQ works on the IEnumerable interface, which is the equivalent of an iterable in Python (the only difference is that there is no Reset() method in Python's iterator protocol, but you can always just get a new iterator for an object if you want to start over, so it's not really needed). So LINQ works on anything that implements the interface, as does Python. This means you can iterate over a DB, XML, whatever, as long as it implements the required interface.
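
To make the Reset() point concrete, here is a minimal sketch. The 'names' list is made up for the illustration; any iterable should behave the same way.

```python
# Hypothetical data; a list stands in for any iterable (file, DB wrapper, etc.).
names = ["Guido", "Anders", "Brett"]

it = iter(names)          # ask the iterable for an iterator
first_pass = list(it)     # consume the iterator completely
leftover = list(it)       # an exhausted iterator has nothing left to give

# No Reset() needed: just ask the iterable for a fresh iterator.
second_pass = list(iter(names))

print(first_pass, leftover, second_pass)
```

The exhausted iterator stays exhausted, but the underlying object is happy to hand out a new one, which covers what Reset() is for.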

To implement the methods used on IEnumerable-implementing classes and used by the new syntax added by LINQ (covered later), C# 3 introduced extension methods. You can think of these as methods you monkeypatch on to classes in Python. By importing an extension method into a namespace, it is automatically injected into the class in the current namespace. Python does not quite have that, although if you imported a metaclass as __metaclass__ at the global level of a module it could be made to do the same thing. So a point to LINQ and the simpler automation of injection thanks to typing.
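
For comparison, here is a rough sketch of what the monkeypatching version looks like. The NameList class and the where() function are both invented for this example; they are not from LINQ or any library.

```python
class NameList:
    """Hypothetical container, standing in for some class you did not write."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


def where(self, pred):
    """Lazily yield the items for which pred(item) is true."""
    return (item for item in self if pred(item))


# The monkeypatch: attach the function to the class after the fact.
NameList.where = where

names = NameList(["Guido", "Anders", "Brett"])
five_letter = list(names.where(lambda s: len(s) == 5))
print(five_letter)
```

One difference worth keeping in mind: the patch above changes the class for everyone who uses it, while a C# extension method is only visible where its namespace has been imported.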

But then again, this kind of injection is not needed. If you look at the syntax added by LINQ, an example form being

from itemName in srcExpr where predExpr orderby keyExpr select selExpr
there is nothing there that cannot be done with generator expressions and sorted(), and thus nothing that requires adding support to a class to work with some special syntax. Take the example

from s in names
where s.Length == 5
orderby s
select s.ToUpper()
What is this trying to do? Well, it is taking each item that is exactly 5 characters long, sorting the items by their own value, and converting each one to uppercase. Nothing complex.

But how would you do that in Python?

sorted(s.upper() for s in names if len(s) == 5)
All we are doing is passing sorted() a generator expression that keeps only the items of length 5 from the iterator for 'names' and calls upper() on each, and then sorting that result. Now that is not lazy because of the sorted call, but just wrap it in a lambda and call that later if you need that sort of laziness (leaving off the sorted call would make it entirely lazy thanks to the genexp). So, in general, no power is lost between the Python and C# version.
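
Here is a small sketch of the eager and lambda-deferred forms side by side. The sample data is made up. It also shows one subtlety: the one-liner sorts the strings after uppercasing them, while the query sorts the originals first, which can change the order for mixed-case input under Python's default string comparison.

```python
# Hypothetical sample data, mixed case on purpose.
names = ["smith", "Jones", "brown", "Lee", "Adams"]

# Eager: sorted() consumes the whole genexp and returns a list right away.
result = sorted(s.upper() for s in names if len(s) == 5)

# Deferred: nothing runs until the lambda is called, so later changes
# to 'names' are picked up.
query = lambda: sorted(s.upper() for s in names if len(s) == 5)
names.append("white")
deferred_result = query()

# Same clause order as the query (filter, order by s, then uppercase).
# Sorting the original strings puts the capitalized ones first.
strict = [s.upper() for s in sorted(s for s in names if len(s) == 5)]

print(result)
print(deferred_result)
print(strict)
```

For all-lowercase (or all-uppercase) names the two orderings agree, so the one-liner is usually all you need.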

But maybe I am just lucky with the first example from LINQ. Let's look at the full syntax:

from itemName in srcExpr
((from itemName in srcExpr) | (where predExpr))*
(orderby (keyExpr (ascending|descending)?)+)?
((select selExpr) | (group selExpr by keyExpr))
Let's take a quick look at the parts that are available. You can iterate over one or more iterables and filter the items they return (from and where). You can order the items based on a lambda expression (orderby), and then either transform them with another lambda expression (select) or group them together using a selection lambda expression and a key-extraction lambda expression (group).

There is nothing there that you can't do in Python with a generator expression and a call to sorted(). You can implement from, where, and select in a generator expression. sorted() takes care of orderby. And for group, you can use itertools.groupby(). So, using the same variable names as in the grammar above, here is roughly how to rewrite it in Python:

selExpr for itemName in srcExpr((if predExpr)? (for itemName in srcExpr)?)*
for the genexp, and then pass that to

sorted(genexp, key=keyExpr, reverse=(True if descending else False))
And for group, sort by the grouping key first and then pass the result through itertools.groupby(), since groupby() only groups consecutive items that share a key (see this recipe, by yours truly and in the dead-tree version of the Python Cookbook, on how to use itertools.groupby()).
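
To see the pieces working together, here is a rough sketch on made-up data. The 'people' list and its (name, team) layout are assumptions for the example only.

```python
from itertools import groupby

# Hypothetical data: (name, team) pairs.
people = [
    ("Brett", "python"),
    ("Anders", "csharp"),
    ("Guido", "python"),
    ("James", "java"),
    ("Raymond", "python"),
]

# from / where / select: one generator expression.
selected = (name.upper() for name, team in people if team != "java")

# orderby ... descending: sorted() with key and reverse.
ordered = sorted(selected, key=len, reverse=True)

# group ... by: groupby() only merges adjacent items with equal keys,
# so sort on the grouping key first.
team_of = lambda person: person[1]
by_team = sorted(people, key=team_of)
groups = {
    team: [name for name, _ in members]
    for team, members in groupby(by_team, key=team_of)
}

print(ordered)
print(groups)
```

sorted() is stable, so names within each team keep their original relative order.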

Basically, everything the LINQ syntax supports is already doable in Python using a much more general system. The biggest difference is the use of sorted() with a genexp and itertools.groupby() instead of a solution fully enclosed in syntax. That and the nicer way of doing monkeypatching on classes with extension methods are maybe the only perks to how LINQ does it. Otherwise there is nothing there Python can't already do. And I am not sure whether you can use the query syntax anywhere an expression is allowed, the way you can use a genexp anywhere in Python.

Go, Python, go! At least C# is becoming more tolerable. =) Too bad I can't say the same of Java.