Coder Who Says Py: October 2010

2010-10-31

PyCon 2011's call for proposals closes tomorrow!

If you have not yet gotten your proposal in for a talk, tutorial, or poster for PyCon US 2011, you have until tomorrow! If you have never submitted a talk, do not feel intimidated about presenting at PyCon. People are very friendly there, so you don't need to worry about a hostile presenting environment. And if you are not sure what to talk about, what did you discuss with your co-workers last? If they found it interesting then the PyCon crowd might as well.

2010-10-28

Viewing Python 3.2 as the successor to Python 2.7

Over on python-dev a discussion kicked up over what to do about backward-incompatible changes against Python 2.7 in the name of fixing consistency. The suggestion seemed to be for a Python 2.8, but that is simply not going to happen. I think the idea of Python 2.8 even came up because people in general don't realize how python-dev views the latest and upcoming releases of Python, so I just wanted to clarify this point for the general community.

For as long as I have been involved in Python's development (joined python-dev in June 2002), there has always been a maintenance version and an in-development version. When a version of Python is released it immediately becomes the maintenance version. This means no backwards-incompatible changes unless they fix what is deemed a severe bug. The next version of Python started after a release is the in-development version, which gets new features, fixes that break compatibility, etc. So when Python 2.6.0 was released it became the maintenance version and Python 2.7.0 became the in-development version. It's essentially all about one version succeeding another.

This is exactly the same for Python 2.7/3.2 from the perspective of python-dev. When Python 2.7.0 was released it became a maintenance version and Python 3.2 became/continued to be the in-development version (we also have Python 3.1 under maintenance, but that is inconsequential to this discussion). To python-dev the difference between the "2" and the "3" in "2.7" and "3.2", respectively, is superficial from a policy perspective; Python 2.7 is just another maintenance version of Python just as Python 3.2 is just another in-development version. Yes, the backwards-incompatible changes from Python 2.7 to Python 3.2 are bigger than what people are used to for Python, but our development policies are no different than if we were talking about Python 2.6/2.7 instead of 2.7/3.2. Sure, Python 2.7 has some consistency issues, bugs, etc. that would benefit from some backwards-incompatible fixes, but so did every other version of Python ever released. And those fixes are going into Python; they just happen to be going into a version of Python with the number "3" as the leading digit in the version number instead of a "2". If you simply can't get around the version number differences, just view Python 2.7 and 3.1 as two branches in version control that converge at Python 3.2.

Which is why a Python 2.8 coming from python-dev will never happen. Going backwards to insert an in-development version at this point between Python 2.7 and 3.2 is ludicrous and goes against our development process/policy. Yes, for a time we had 2.6/2.7/3.0/3.1 under development at once (and it was a horrendous pain in our collective ass to manage all of that), but that time has passed. We have moved on to only one in-development version and that's Python 3.2. We are not going back to that crazy parallel maintenance (it was such a time sink that I would wager we won't even if a Python 4 ever happens).

If people want to make changes to the Python 2.7 code base which are backwards-incompatible in the name of making it more consistent, fixing bugs, or whatever, then fork the code.

Yes I said "fork"; that "dirty" four-letter word that starts with "F" and ends in "K". In this case it's not a big deal as what is being discussed is simply bugfixes and not a divergence of features. Python-dev will never sanction a Python 2.8, but if someone decided to clean up some things and release a should-have-been-Python-2.8 that focuses solely on backwards-incompatible bugfixes and consistency issues then that's fine by us. As long as it is obviously not from python-dev then we have no problem with it.

A hypothetical version of Python that is just fixing bugs that python-dev will not fix for policy reasons is not going to fracture the community like a fork of an in-development branch that takes the project in a new direction. We will continue to add cool new things into Python 3 that people will want, so people will want to stick with python-dev's version of Python. Plus the version from python-dev happens to be Guido's version of Python which makes it the most Pythonic version =). Plus plenty of people will simply want to stick with the sanctioned versions of Python simply because they are as official as you can get. You also have all the other Python VMs (Jython, IronPython, PyPy) following what python-dev releases and not some fork. But for those that really love CPython 2.7 but want its kinks ironed out, your option will be to do a fork, and from python-dev's perspective that's okay as it won't cause the community to fall apart and have to choose sides.

But obviously I think everyone should just upgrade to Python 3.2 when it comes out instead of worrying about keeping Python 2.7 alive past its prime. =) Regardless, Python 2.8 will never happen.

2010-10-17

My thoughts on Clojure

While a lull in my thesis work allowed me to learn Go, I found myself with a couple more days where my thesis did not require a huge mental outpouring. As such I decided to see if I could learn Clojure before I had to start working on my slides for my defence on Monday.

So why learn Clojure? Having learned Scheme, Haskell, and OCaml (impressions) on top of treating Scala and JavaScript as functional languages, I already had familiarity with functional programming. In fact my functional programming experience stems from learning Scheme at Cal in their intro CS course along with being forced to use Common Lisp in an AI course there (although at that point I already knew C and Python). Having this much exposure to functional programming, I knew that I liked the style along with being comfortable with it. My dislike of Java but acceptance of the JVM (at least until Oracle sued Google) also has always motivated me to find JVM languages that can interface with Java easily.

It took me a total of three days to learn enough about Clojure to implement my core example scripts. In the end I liked the language, but not enough to want to put in the effort to do a semi-complicated example in the language.

First off, Clojure is a nice Lisp language. Scheme comes with too little in the language to use easily for much beyond teaching programming. CL is just like C++: way too much crap in a single language to be truly usable unless you make it your life's goal to master the language (and ignore everything bad about it). So compared to those two leading examples Clojure is great. The included data types are nice and the flexible metadata system is rather cool.

But (unfortunately) Clojure can't be viewed in isolation from Java. When you read the documentation for the language you will notice bits of Java method calls sprinkled throughout. For instance, to convert a string to an integer, the docs use (Integer/parseInt "42"). There is no Clojure-specific way of doing this very basic operation, so you have to realize Clojure is implemented on top of Java to get anything done. I found this slightly annoying as I wanted a language that could work with Java, but not that had to work with it. In other words Clojure is not simply a language implemented on top of the JVM but a language implemented on top of Java.

This connection to Java also turned out to be an issue with having clear error messages. For instance, I misread the docs for the ffirst function, thinking it took the second item from a sequence instead of being caar (if you don't know what caar is, don't worry about it). When I tried to do (ffirst [0 1]) to get at 1, the error message was like this:

Exception in thread "main" java.lang.IllegalArgumentException: Don't know how to create ISeq from: java.lang.Integer (temp.clj:0)
      at clojure.lang.Compiler.eval(Compiler.java:5440)
      at clojure.lang.Compiler.load(Compiler.java:5857)
      at clojure.lang.Compiler.loadFile(Compiler.java:5820)
      at clojure.main$load_script.invoke(main.clj:221)
      at clojure.main$script_opt.invoke(main.clj:273)
      at clojure.main$main.doInvoke(main.clj:354)
      at clojure.lang.RestFn.invoke(RestFn.java:409)
      at clojure.lang.Var.invoke(Var.java:365)
      at clojure.lang.AFn.applyToHelper(AFn.java:163)
      at clojure.lang.Var.applyTo(Var.java:482)
      at clojure.main.main(main.java:37)
Caused by: java.lang.IllegalArgumentException: Don't know how to create ISeq from: java.lang.Integer
      at clojure.lang.RT.seqFrom(RT.java:471)
      at clojure.lang.RT.seq(RT.java:452)
      at clojure.lang.RT.first(RT.java:540)
      at clojure.core$first.invoke(core.clj:53)
      at clojure.core$ffirst.invoke(core.clj:94)
      at user$eval1.invoke(temp.clj:1)
      at clojure.lang.Compiler.eval(Compiler.java:5424)
      ... 10 more

What line did the error happen at? If you don't know you need to ignore the traceback in the compiler that the error caused, you might think line 0 which is obviously wrong. But if you look at the triggering exception (the second one listed) you will notice it is in fact line 1, but only if you read down to the sixth line of the traceback. That gets really annoying really fast. You also need to realize that ISeq represents the sequence interface within Clojure along with knowing what java.lang.Integer is (which is obvious in this example, but not if it happened to be some obscure Java class).

And as with all new languages, documentation was an issue. While it was great having documentation that explained the underpinnings of the language from the perspective of the reader and thus got into the nitty-gritty, there was a lack of documentation explaining in more introductory terms how to get stuff done. I had to read through the wiki (which is a WikiBooks book) to learn how to simply do concurrent execution the "right way" (answer: agents along with send). But even after that I had to figure out on my own that to prevent my code from hanging I needed to call (shutdown-agents), as none of the examples I read to figure all of this out made that necessary function call.

Clojure feels like a very good Lisp veneer over Java. But the problem for me is that it feels like a veneer instead of a fully free-standing language. At least with Scala the separation feels much more seamless so that it comes off more like Scala + JDK compared to Clojure + Java (and even then Scala feels like it does a more thorough job of marginalizing the need to know Java).

2010-10-12

My thoughts on Go

When my thesis work hit a lull point -- I was waiting to hear back from my committee on the latest edits which were eventually cleared -- I decided I wanted to shift gears off of Oplop and doing so much JavaScript work (which had culminated in my blog post on transitioning from jQuery to Google Closure). Me being me, that meant learning a programming language from my laundry list of languages to learn. I decided to learn Go to continue with my Google theme as of late. As is true with all languages that I "learn", I didn't do any deep dive into Go beyond reading what I could (in this case the tutorial, language spec, and Effective Go) and doing some rather basic example code that I do for every language and one example that takes about 100 lines or more and varies almost every time based on what strikes my fancy. In other words I am in no way a Go expert, but I would like to think I have an inkling of what kind of "flavour" Go is in terms of a language.

First and foremost, Go is an opinionated language. This is made obvious by the fact the document Effective Go "gives tips for writing clear, idiomatic Go code". Now for some people, this is a bad thing. Variety is the spice of life for some programmers, and so they want a language that lets you do things in various ways based on personal taste (e.g., C++, Perl). Others, though, prefer a strong vision of how a language should work for more coherency (e.g., Python). The former approach has the drawback of making it hard to read other people's code, while the latter approach requires that you actually like the approach the language forces upon you. Go definitely falls into the latter camp of an opinionated language.

Consider formatting. Go takes the approach that any global variable in a package that starts with an uppercase letter is exported, while everything else is not. There is no public/private or static keywords to say what is and is not available outside of the package, just whether there is a capital letter. That clearly puts formatting into the realm of semantics.
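To make the capitalization rule concrete, here is a minimal sketch that leans on the standard library's go/ast.IsExported helper to report which names would be visible outside their package. The identifier names are made up purely for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
)

// visibility reports how Go treats a package-level identifier,
// based purely on whether its first letter is uppercase.
func visibility(name string) string {
	if ast.IsExported(name) {
		return "exported"
	}
	return "unexported"
}

func main() {
	// Hypothetical identifier names, chosen only for illustration.
	for _, name := range []string{"MaxSize", "maxSize", "Parse", "parse"} {
		fmt.Printf("%-8s %s\n", name, visibility(name))
	}
}
```

There is no keyword anywhere in that decision; renaming maxSize to MaxSize is what changes its visibility.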

Then there is gofmt. Go includes a tool that will reformat your source code for you to follow the Go team's style guide. Now you can control things like spaces over tabs (they prefer tabs for some sick reason) or how many spaces a tab represents, but that's kind of it. It means the entire standard library is formatted very consistently which is nice, but it also means that there is One True Way of formatting.
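As a rough illustration of what gofmt does, the sketch below feeds a deliberately sloppy (but valid) snippet through the standard library's go/format package, which produces canonical gofmt-style output. The helper name gofmtSource and the sample input are my own assumptions for the example.

```go
package main

import (
	"fmt"
	"go/format"
)

// gofmtSource formats src in canonical gofmt style using go/format.
func gofmtSource(src string) (string, error) {
	out, err := format.Source([]byte(src))
	return string(out), err
}

func main() {
	// Valid Go, but with spacing nobody would want to read.
	messy := "package main\nfunc   main( ){x:=1\n_ =x}\n"

	clean, err := gofmtSource(messy)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(clean)
}
```

Running already-formatted code through the formatter again should leave it unchanged, which is what makes it practical to apply to an entire code base.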

All of this really smacks you in the face when you try to do an if ... else statement. Take the example of:

if x {
    fmt.Println("true!")
} else {
    fmt.Println("false!")
}

Look at the placement of the braces. Now realize that they can't be on any other lines than the ones they are on. So you can't do Allman style where the opening brace is on its own line. And you can't start the else clause on a line separate from the preceding closing brace. While Go might more-or-less follow C-style formatting, it definitely is not as loose as other languages that use the same style when it comes to formatting rules.
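One way to check the brace rules without trusting my word is to hand a few variations to Go's own parser. This is a small sketch using go/parser; the helper name parses and the wrapper function it builds around each snippet are assumptions made for the example.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// parses reports whether body is accepted by Go's parser when placed
// inside a function that takes a bool named x.
func parses(body string) bool {
	src := "package p\nfunc f(x bool) {\n" + body + "\n}\n"
	_, err := parser.ParseFile(token.NewFileSet(), "example.go", src, 0)
	return err == nil
}

func main() {
	// Braces exactly where Go wants them.
	fmt.Println(parses("if x {\n} else {\n}"))
	// Opening brace on its own line.
	fmt.Println(parses("if x\n{\n}"))
	// else starting on the line after the closing brace.
	fmt.Println(parses("if x {\n}\nelse {\n}"))
}
```

Only the first form is accepted. The underlying cause is Go's automatic semicolon insertion: a line ending in an identifier or a closing brace is treated as a finished statement, so the brace or else on the following line has nothing to attach to.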

One thing I really appreciate in programming languages is consistency/pervasiveness. If a language provides some feature, I appreciate it when it is generalized in such a way that it is consistently used through the language, becoming pervasive throughout. In Python 3, think of how list comprehensions are now defined in terms of generator expressions. Think of how numbers in Python are just some type that anyone could implement (unlike some crappy languages which claim to be fully object-oriented and yet still have the concept of primitives vs. objects; I'm looking at you, Java). When the language plays by rules that you can also use it makes it easy to comprehend how the overall system works since there are no special cases to remember.

C as a systems language does this, but at a price. Take memory allocation on the heap. Calling malloc just gives you raw memory to do whatever you want with. This makes it a consistent story in terms of how to use heap memory, but it also means you can abuse heap memory. I mean it is a little warped that you can take an array and access the third item through arr[2] or *(arr+2).

Go as a systems language sacrifices consistency for safety. For instance, there are two kinds of memory allocators, new and make. The former is for memory which is type safe even when the memory is zero'ed out (think structs). For make, it's used when the underlying data needs to be formatted in a special way (think maps). In C there would not be this distinction, but then again if you screw up and forget to format your map you are going to have your program blow up in your face (or worse). This and other decisions means that to gain safety Go sacrificed consistency in some places. I understand the decision and I don't think there is a good solution around it, but it does sacrifice mental comprehension somewhat which always sucks for a programming language.
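The difference between the two allocators is easier to see in running code. The sketch below assumes a small point struct and a writePanics helper, both invented for the example. It shows that a struct from new is usable as-is, while a map obtained through new is nil and blows up on the first write unless make is used instead.

```go
package main

import "fmt"

// point is a hypothetical struct whose zero value is already usable.
type point struct{ X, Y int }

// writePanics stores a key in m and reports whether the write panicked.
func writePanics(m map[string]int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m["answer"] = 42
	return false
}

func main() {
	// new returns a pointer to zeroed memory, which is fine for a struct.
	p := new(point)
	fmt.Println(p.X, p.Y)

	// new on a map type yields a pointer to a nil map; writing to it panics.
	viaNew := new(map[string]int)
	fmt.Println(*viaNew == nil, writePanics(*viaNew))

	// make sets up the map's internal structure, so writes succeed.
	viaMake := make(map[string]int)
	fmt.Println(writePanics(viaMake), viaMake["answer"])
}
```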

I do have to admit I like goroutines. Now I know someone has already begun to say "but Erlang does a more thorough job of handling message passing concurrency thanks to its design to support recoverability from errors, etc." But you know what? Sometimes I just want an easy way to do factorial in parallel. Goroutines work rather nicely in those situations where something is embarrassingly parallel. And considering this is a systems language, it's a very nice solution.
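For the curious, a parallel factorial along those lines might look like the sketch below. It is not the code I wrote while learning the language. The function names and the way the range is split into chunks are assumptions made for illustration, and the result overflows uint64 once n goes past 20.

```go
package main

import "fmt"

// product multiplies the integers in [lo, hi] and sends the result on out.
func product(lo, hi uint64, out chan<- uint64) {
	result := uint64(1)
	for i := lo; i <= hi; i++ {
		result *= i
	}
	out <- result
}

// parallelFactorial computes n! by splitting 1..n into at most `workers`
// ranges, multiplying each range in its own goroutine, and combining the
// partial products as they arrive. Overflows uint64 for n > 20.
func parallelFactorial(n, workers uint64) uint64 {
	if n < 2 {
		return 1
	}
	if workers < 1 {
		workers = 1
	}
	if workers > n {
		workers = n
	}
	out := make(chan uint64)
	chunk := (n + workers - 1) / workers // ceiling division
	started := 0
	for lo := uint64(1); lo <= n; lo += chunk {
		hi := lo + chunk - 1
		if hi > n {
			hi = n
		}
		go product(lo, hi, out)
		started++
	}
	total := uint64(1)
	for i := 0; i < started; i++ {
		total *= <-out
	}
	return total
}

func main() {
	fmt.Println(parallelFactorial(10, 4))
}
```

Multiplication is commutative, so the order in which the goroutines report back does not matter, which is what makes this embarrassingly parallel.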

Being such a new language, the third-party documentation is rather lacking at the moment. That's unfortunate as finding solutions to common issues like how to convert a []byte to a string (answer: bytes.NewBuffer(arr).String()) requires figuring it out for yourself. Obviously if Go takes off then this issue will eventually rectify itself.
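Here is that conversion as a small runnable sketch. It also shows a plain type conversion, string(arr), which the language spec defines for byte slices and which appears to give the same result without the intermediate buffer. The helper names are mine.

```go
package main

import (
	"bytes"
	"fmt"
)

// viaBuffer converts using the bytes.Buffer approach from the post.
func viaBuffer(b []byte) string {
	return bytes.NewBuffer(b).String()
}

// viaConversion uses a direct type conversion, which copies the bytes
// into a new string.
func viaConversion(b []byte) string {
	return string(b)
}

func main() {
	data := []byte("gopher")
	fmt.Println(viaBuffer(data), viaConversion(data))
	fmt.Println(viaBuffer(data) == viaConversion(data))
}
```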

Being a language that is trying to appeal to the pre-existing C user base, Go tries to be innovative in certain areas, but not in others. This ends up making the language feel like it compromised in the name of appeasing C users in certain cases. For instance, the increment & decrement operators (x++ & x--) are in the language, but only in postfix notation. That's fine as C code typically became much harder to follow when prefix increment/decrement came into play as you had to start thinking about side-effects of expressions. But one of the handy things in C was using increment/decrement as expressions to tighten up your code. But in Go increment/decrement are statements, negating the usefulness of them; x += 1 is not exactly that much harder to type compared to x++.

Or take variable declarations. Let's say you have a variable you want to define and give an initial value. You can do that in the following three ways:

var x int
var x = 42
x := 42

OK, so the first two look like JavaScript-like declarations but with type support. The last version is obviously the shortest but doesn't have direct support for type declarations or assuming the default zero value. Why not try to unify around the short variable declaration and remove the different ways to declare a variable?:

x := 42
x int := 42
x int := _

And honestly, is having to specify a default value that big of a thing? If you simply require the declaration assignment then you don't have to come up with some zero value representation that is type-safe for everyone. How hard would it be for people to be forced to simply say x := 0 or x []byte := nil? That generalizes the syntax more and removes the cognitive overhead of having to remember the various ways to declare a variable, while still preventing the use of a variable that has been unassigned!
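For reference, the declaration forms Go actually accepts can be put side by side in runnable form, as in the sketch below. It also includes a fourth spelling, with both an explicit type and an explicit value, that the language allows as well. The function name declare is just a wrapper for the example.

```go
package main

import "fmt"

// declare uses each declaration form once and returns the resulting values.
func declare() (int, int, int, int) {
	var a int      // explicit type, implicit zero value
	var b = 42     // type inferred from the initializer
	c := 42        // short declaration, only allowed inside functions
	var d int = 42 // explicit type and explicit value
	return a, b, c, d
}

func main() {
	fmt.Println(declare())
}
```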

Considering Go is a systems language, I would want it to be as cognitively simple as possible. You already have to do more work compared to a language like Python anyway, so why not make sure that any extra cognitive work I must do has some real payoff? Like why even bother with having a switch statement if you are not going to go as far as functional languages do with pattern matching? Yes, Go does have a type switch which is nice, but why even support an expression switch? Now I have to remember both kinds of switches plus the syntax which is slightly non-standard for the declaration in order to support both kinds. It's like the Go designers are taking two steps forward by taking out the evil parts of C, but then they take a step back by supporting syntax that is marginally beneficial.
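To show the two kinds of switch next to each other, here is a minimal sketch. The functions sign and describe are hypothetical examples, not anything from the Go documentation.

```go
package main

import "fmt"

// sign uses an expression switch with no tag, so each case is a boolean test.
func sign(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	}
	return "positive"
}

// describe uses a type switch; v has the concrete type within each case.
func describe(value interface{}) string {
	switch v := value.(type) {
	case int:
		return fmt.Sprintf("int %d", v)
	case string:
		return fmt.Sprintf("string %q", v)
	default:
		return fmt.Sprintf("other %T", v)
	}
}

func main() {
	fmt.Println(sign(-3), sign(0), sign(7))
	fmt.Println(describe(42), describe("go"), describe(3.5))
}
```

The v := value.(type) header is the slightly non-standard declaration syntax in question: it only exists inside a switch.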

So what are my overall impressions? Whenever I learn a new language I try to compare it only to languages which I would use in the same situations, e.g., I would not compare Python to Java as they fill different niches. The ones I always have in my head are systems, compiled, and dynamic languages (although the "compiled" division is tenuous at best since Python fills that typical role as well thanks to having very good performance, but I have to recognize that sometimes a statically typed, compiled language has its place and C is just too low-level). Up until now if I was asked to list my favourite systems/compiled/dynamic languages list, it would have been C, Scala/OCaml, Python (still looking for a compiled language obviously). But now I would honestly slot Go in over C. Go provides the low-level access and lack of automated hand-holding needed in a systems language, while providing features that C would have provided if developed today (e.g., garbage collection and strong typing). The typical trip-ups I come across when coding in C are dealt with in Go. While I may have wished they simplified the syntax more and gone with a Python style and dropped the damned curly braces, it's a more-or-less minor complaint compared to what one gains in a systems language in using Go.

2010-10-08

My PyCon US 2011 talk submitted

For PyCon US 2011 I decided to submit a proposal for a VM panel like the one I organized for PyCon US 2009. The panel is slated to be made up of me plus:

Frank Wierzbicki : Jython
Dino Viehland : IronPython
Maciej Fijalkowski : PyPy
Jacob Kaplan-Moss : moderator

So some new faces, some old compared to the 2009 panel. Should be even more fun than 2009 now that we know what to expect.

The call for proposals for PyCon is open until November 1, so if you have not already, please consider submitting. Details for talks, tutorials, and posters are all up on the website. Process is painless and presenting is a lot of fun.

As for other talks by me, I have not decided if I want to give my import talk. I feel like the world is probably tired of hearing me talk about import at PyCon, but maybe I'm wrong.

2010-10-04

Lessons learned porting from jQuery to Closure

When I realized the next major feature leap for the Oplop project was going
to require adding drag-and-drop to the Chrome extension
(to become a web app once Google's web app store launches), I realized that my
use of jQuery would not be enough to ease the development of complex UI
interactions. That meant
either integrating jQuery UI into Oplop or switching to another JavaScript
library such as Google Closure or YUI.

In the end I chose to go with switching to Google Closure. I had two reasons
for this.
First, all of my serious JavaScript work has involved using jQuery, so I was
curious to use a different approach to JavaScript (even though I have enjoyed
the functional approach jQuery promotes). Second, with me starting work for Google
come early/mid 2011 I figured I should learn the JavaScript library my future
employer uses so I can possibly use my 20% time to contribute to
JavaScript-based projects.

It ended up taking me three days, but the work is done and has landed in
Oplop's code repository.
Along the way I learned a few things that might be useful to others who decide to
take the plunge and learn how to use Closure.

Jumping around in the docs

First thing I learned is that the Closure Library API has a slight
disconnect between types and functions. If you look under the Type Index tab
of the API docs you get a list of all the types in a certain namespace. That's
fine, but to get a list of functions in a namespace that are not tied to a
type, you need to look under File Index. That's a counter-intuitive naming
scheme. And after having used Python's documentation for so long, I have come to
expect to have both types and functions defined in the same namespace to be
listed in the same location. I mean if some types and some functions are
connected enough to be in the same namespace, why not list them in the same
place? This was an annoyance for me.

Use all of the tools

Closure is more than the Closure Library. There is also the Closure
Linter which makes sure that you follow a JavaScript style guide. There is
also the Closure Compiler which not only minifies your code greatly, but
will also perform sanity checks on your code. And I didn't even use Closure
Templates.

All of these tools somewhat feed into each other. For instance, the Closure
Linter makes sure that you have JSDoc type annotations. The
Closure Library lets you specify namespaces and what symbols should be exported
outside of the JavaScript code, e.g., what is called directly in your HTML. All
of this is used by the Closure Compiler to do type checking, proper
minification without bad symbol renaming, etc. By supporting one tool you end
up gaining benefits from the other tools, leading to one potentially caring
enough to use all of them and maximize their benefits.

I used all the tools for Oplop and I am glad I did. I caught a couple bad bits
of JavaScript code early on (where I couldn't do automated tests as testing
Chrome extensions is still a rather manual process). I have proper
documentation for everything. I even have JavaScript code that is
consistently formatted which makes it easier to read. I even got the
side-benefit of getting to minify the code easily which led to shaving 14K off
the zip file for the web app.

Supporting all of these aspects of Closure turned
out to be worth it. Yes it adds a compilation step to my JavaScript work, but
that is acceptable to me as JavaScript's leniency towards potential errors
makes type checking and such a useful thing (if automated testing of Chrome
extensions was easier I may be singing a different tune right now, but that
might be for another time where I try to tie Selenium or something into Chrome
page actions or something).

You are either in or you're out

Closure is an ecosystem. It is not like jQuery where you just use it
here and there to make accessing or manipulating the DOM easier. If you want to
truly use Closure, you have to make a commitment to truly use it. This comes
with some consequences.

Probably the biggest consequence is that if you use other third-party
JavaScript libraries you need to add support for them to work with the various
Closure Tools. You can obviously skip doing this, but then you either have to
start special-casing third-party code (e.g., not running the Closure Linter
over it) or you lose certain benefits (e.g., the Closure Compiler not being
able to detect all possible type errors). So to truly gain all benefits you
might want to retrofit third-party code with at least JSDoc markup (you can do
this externally using extern files). Some of
Google's APIs have extern files already created
along with jQuery 1.3.

Using more raw JavaScript

But probably the biggest shock out of all of this was having to shift to using
raw DOM objects. When one gets used to doing

$('#node').css('background-color', 'red');

it ends up feeling odd doing

var node = goog.dom.getElement('node');
node.style.backgroundColor = 'red';

The niceties that jQuery wraps around DOM nodes do spoil you.

But those niceties obviously carry a performance penalty. Something to keep in
mind with Closure is that this is the code that Google uses, the company
obsessed with performance. If there is a way to expose a function that quickly
does something the DOM does not easily or consistently expose, then there
will be a function for it. But for something as consistent as changing CSS
values, there is no such added support.

Initial verdict

So was it worth three days of coding to switch? Having not started the coding
yet for the fancy things I want to add to Oplop that drove me to even
considering switching, it's too early to give a definitive answer. But what I
can say is that I am not sitting here regretting anything. I feel like my
JavaScript is in better shape than it was thanks to Closure forcing me to
follow better coding practices. And the benefits from the compiler thanks to
following those coding practices also make me feel better that my code is in
reasonable shape.