2009-08-31

Compiling Python using Clang

[edit: added compilation timings]

Like many people (if Twitter is any indication), I upgraded to Snow Leopard and Xcode 3.2 this past weekend. One of the nice things that came with the new Developer Tools is Clang 1.0. I have been anticipating the stable release of this tool ever since I watched a video on it from the LLVM conference over a year ago. With its much improved warning output compared to gcc and its faster compilation time, I wanted to give it a try on CPython.

First off, though, credit needs to be given to the Unladen Swallow guys, and especially Jeffrey Yasskin, for working out, over the past year, some nasty bugs that used to prevent LLVM from compiling CPython. Without their fixes I would have just given up on using clang.

With CPython now cleanly compiling with clang, I decided to give it a spin. The clang-specific environment variables I ended up using were:
  • CC = clang
  • CFLAGS = -Qunused-arguments
  • CPPFLAGS = -Qunused-arguments
The "-Qunused-arguments" flag tells clang not to complain if it is given command-line arguments that are redundant or unused. If you don't set it you can end up with a ton of warnings about unneeded CPPFLAGS arguments. It goes in both CFLAGS and CPPFLAGS because otherwise it isn't picked up when setup.py runs (I don't think setup.py or distutils uses CFLAGS at the moment). Beyond that, CPython builds fine!

One other flag you might want to try when building CPython is "-Wno-unused-value". It turns out that the values returned by PyObject_INIT() and PyObject_INIT_VAR() are never used explicitly. This flag turns off the resulting warnings, which is worthwhile because there are a bunch of them and each one refers to two other code locations.

After I originally posted this I got one comment here and a couple on Twitter asking what the compilation timings were. I caved in and ran them with ``./configure --prefix=/dev/null --with-pydebug --with-computed-gotos --with-universal-archs="64-bit"``. With Clang the build took a total of 36 seconds, while with gcc it took 37 seconds. So the speed increase is minimal, but the important thing to remember is that the diagnostic output Clang spits out is far and away better than what gcc gives you. So while the performance difference is small, the two compilers' warnings and errors are not even close to being equal in terms of readability.
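Putting the pieces of this post together, here is a small Node.js sketch that assembles the clang-specific environment and runs CPython's configure with the options from the timing run. The names clangBuildEnv(), CONFIGURE_ARGS and runConfigure() are hypothetical, and putting -Wno-unused-value in CFLAGS only (not CPPFLAGS) is my assumption, since the post does not say where it goes.

```javascript
// Sketch only: builds the clang environment described above and runs
// CPython's ./configure with it. Run from the root of a CPython checkout.
const { spawnSync } = require("child_process");

// Returns a copy of baseEnv with the clang-specific variables set.
// silenceUnusedValue is a hypothetical option; adding -Wno-unused-value
// to CFLAGS only is an assumption, not something the post specifies.
function clangBuildEnv(baseEnv, silenceUnusedValue = true) {
  const common = "-Qunused-arguments";
  return {
    ...baseEnv,
    CC: "clang",
    CFLAGS: silenceUnusedValue ? `${common} -Wno-unused-value` : common,
    // Also set in CPPFLAGS so the flag is picked up when setup.py runs.
    CPPFLAGS: common,
  };
}

// The configure options used for the timing runs, one argv entry each
// (no shell quoting is needed because configure is spawned directly).
const CONFIGURE_ARGS = [
  "--prefix=/dev/null",
  "--with-pydebug",
  "--with-computed-gotos",
  "--with-universal-archs=64-bit",
];

// Runs ./configure synchronously; the returned object's status field
// holds configure's exit code.
function runConfigure(env, args = CONFIGURE_ARGS) {
  return spawnSync("./configure", args, { env, stdio: "inherit" });
}
```

Calling runConfigure(clangBuildEnv(process.env)) and then running make as usual should give a clang build configured the same way as the timing run.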

2009-08-27

Prioritizing my Python development time

Python has been my hobby since mid-2002. If I were to add up the number of hours I have put into the various things I do for Python development, I am sure it would overshadow every other hobby I have. Having such a large, open-ended hobby means I can easily fill all of my free time with Python stuff of various types. It also means that, me being me, I tend to over-extend myself with more than I can reasonably take on at the moment. And on top of everything else, I am lucky enough to be in some unique positions within the Python community, e.g. being a core developer. This leads to me occasionally having to stop, take a deep breath, assess what I can and want to do for Python, and then prioritize those goals and projects to make sure the most important things get done more expediently than the low-priority stuff.

There are several groupings of things I can work on at any one point. One is fixing bugs in Python that I have discovered. Lately that has meant incompatibilities in importlib, but it can also involve doc issues I come across, etc. The nice thing about these projects is that I know what needs fixing, so the only hold-up is me writing a fix.

Another type of project is trying to fix bugs reported by others on the Python issue tracker. This takes a bit more effort than fixing bugs I found, as I have to report everything I do in the issue tracker instead of just fixing the bug. It also requires working with someone else to clarify what the bug is, or to debug issues if it happens to be on a platform that I am not on. There is also an issue of motivation; sometimes I honestly just don't find a bug interesting enough to want to work on it. Other times I am motivated to work on an issue, but the turnaround time for getting help from the original poster (OP) kills my desire to fix the bug. But being a core developer means I have intimate knowledge of how Python works, putting me in a unique position to fix certain bugs that most people would have no clue how to start working on.

Sometimes I want to create new code. Importlib is a good example of this: I decided I wanted it to happen, so I coded it up. This is rare, though, since the standards for adding new features to Python mean I can't just add every piece of code I write.

Being a core developer means I can also approve patches. Python is lucky enough to have gained people over the years who are happy to help triage issues, so that core developers like me can simply look for issues that are staged as needing a commit review or a patch review, or that have the "needs review" keyword. Or if I want to work on less-vetted patches, I can look at issues that simply have a patch (as you can see, we are still experimenting with how to handle the issue workflow). Working on pre-existing patches has the (potential) perk of being more productive, as I don't have to write the code from scratch or come up with the inspiration for a solution. There is also the hope that approving someone's patch will inspire that person to contribute more, and potentially even become a core developer themselves.

I can also work on easing the development of Python. Because being a part of python-dev has given me so much, I am always trying to make it easier for people to get involved or at least give back to Python. Plus, easing development helps core developers as well. These are the reasons I helped us make the switch to Roundup, made the decision that we should switch to hg, and maintain the dev docs.

I do try to evangelize Python when I can. I go to my local Python user group, I give talks at PyCon, I answer emails people send to python-dev, etc. Evangelism tends to just come up and is not under my direct control. I can't just decide to travel to every user group and give a talk (although I have received some talk invitations which I had to turn down, as I can't take time off for extensive travel until November).

There is also the political aspect of Python. I am on the board of the Python Software Foundation, which helps protect Python's intellectual property (IP) and doles out the sponsor money we get (which we can always use more of). Typically the stuff I do for the PSF is not fun for me, as it is business-related and that is not my strong suit, but it needs to be done. Luckily it is rare for me to deal with, as my PSF board work has mostly been related to infrastructure stuff.

Before I go any further, I want to point out the amount of work suggested above and that it is all done in my (and others') spare time. Some people out there always forget that Python is a volunteer-driven project. The only people I know who get paid to work on anything close to Python itself are the lead implementors of Jython, IronPython, and Unladen Swallow. They are not paid to work on CPython or the language itself. That means bugs in the standard library, the CPython interpreter, docs, etc. are all handled by volunteers. So please, if you ever feel the urge to complain about how some bug has been lying around for a while or your patch has not been reviewed yet, realize that there are not many of us (fewer than 30 active at a time) and we are lucky if we get a couple of hours a week to deal with any Python stuff.

Anyway, with that out of the way, let's discuss how I should best use what little time I have for the benefit of Python. First and foremost, I need to keep myself happy. If I end up resenting my Python work I will do less of it or pull out altogether. My position as a core developer should also be considered, as it puts me in a unique position to help others and thus should not be frivolously ignored. And then there is the infrequency argument: something that comes up rarely should get priority when it does.

With all of that in mind, I think my Python development priorities should be:

  1. Evangelism (rare but fun)
  2. PSF (rare)
  3. Fix bugs I find (typically a fast fix and usually a rare thing)
  4. New code (fun)
  5. Easing development of Python (helps core developers and new people wanting to contribute)
  6. Patch approval (higher fixes/hour than coming up w/ my own code, might get someone more involved)
  7. Fix bugs discovered by others (still squashes bugs in Python)
This list of priorities does differ from how I currently do things (and thus I need to change). For one, I should not write any new code until all bugs known to me have been squashed. Applied to the present, that means I should not be fiddling with my sqlite3 importer until all importlib incompatibilities are fixed and the cProfile/profile merge has occurred. It also means that once I have done that I should help with the coding needed to get python-dev switched over to Mercurial. I should also get off my rear and write a doc on the types of communication used by Python developers, along with how to get a new module or language feature into Python. I should not troll the issue tracker for bugs to fix, but instead look at patches people have already submitted to fix bugs. To me all of this seems reasonable for maximizing both the productivity and the enjoyment of my Python development time.

2009-08-18

Testing JavaScript code (and releasing realStorage 1.2)

The motivation behind this blog post is the release of realStorage 1.2. Two main things have been introduced in this version. One is helper functions which handle the (un)serialization of JSON objects into localStorage. This was the last idiom I have personally come across in my thesis work, and thus it ends (for now) my desire to add convenience functions to realStorage. After this, what I most want is to get a Gears back-end going for those browsers that do not support localStorage. But that will have to wait until Chromium on OS X begins to support the plug-in.
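realStorage's own helper code is not shown in this post, so the following is only a sketch of the idiom. It assumes helpers named setJSON()/getJSON() (the names mentioned in the 1.1.0 post) that take any Storage-like object; MemoryStorage is a hypothetical in-memory stand-in so the example runs outside a browser. In a page you would pass localStorage instead.

```javascript
// Hypothetical helpers illustrating JSON (un)serialization on top of a
// Web Storage-style object; not realStorage's actual source.
function setJSON(storage, key, value) {
  // Web Storage only holds strings, so serialize first.
  storage.setItem(key, JSON.stringify(value));
}

function getJSON(storage, key) {
  const text = storage.getItem(key);
  // getItem() returns null for a missing key. A stored JSON null also
  // comes back as null, so this sketch cannot tell those two cases apart.
  return text === null ? null : JSON.parse(text);
}

// Minimal in-memory stand-in for localStorage: keys and values are
// coerced to strings, and missing keys yield null.
class MemoryStorage {
  constructor() {
    this.data = new Map();
  }
  getItem(key) {
    const k = String(key);
    return this.data.has(k) ? this.data.get(k) : null;
  }
  setItem(key, value) {
    this.data.set(String(key), String(value));
  }
}
```

For example, after setJSON(store, "prefs", { theme: "dark" }), the raw stored value is the string '{"theme":"dark"}' and getJSON(store, "prefs") returns a fresh object with the same contents.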

The other thing I added to this release was support for running the tests using JsTestDriver. This has stemmed from me trying to figure out how to do reasonable testing of both web applications and JavaScript libraries for my thesis work.

For JavaScript libraries I have been using QUnit, the testing framework for jQuery. It's worked out rather well. It is simple and has the proper concept of "sameness" for JS objects (if you have ever tried to do ``{a:42} === {a:42}`` you know what I mean). It also has XHR test support, which is handy. Plus the jQuery code uses it, so I can delve into their code to find example usage. And finally it has a nice HTML output page to look at failures. Tie that into the developer tools that come with WebKit browsers (e.g. Chrome and Safari) and my debugging situation is pretty good. I personally don't use Firebug, as I have found it to be buggy too often and I like the WebKit developer tools. The only thing Firefox has going for it is the Work Offline menu option, which is handy when you are developing offline web apps.
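To illustrate what "sameness" means here: === compares object identity, so two separately created objects are never equal, whereas a structural comparison walks both objects. The deepEqual() function below is a simplified sketch of that idea (plain objects, arrays and primitives only; no Dates, prototypes or cycles), not QUnit's own implementation.

```javascript
// Structural ("sameness") comparison for plain objects, arrays and
// primitives. A simplified sketch, not QUnit's code.
function deepEqual(a, b) {
  if (a === b) {
    return true;
  }
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    // Among primitives, only NaN is unequal to itself under ===.
    return Number.isNaN(a) && Number.isNaN(b);
  }
  // An array and a plain object with the same indexes are not the same.
  if (Array.isArray(a) !== Array.isArray(b)) {
    return false;
  }
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) {
    return false;
  }
  // Every own key of a must exist in b with a structurally equal value.
  return keysA.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && deepEqual(a[k], b[k])
  );
}
```

So ``{a:42} === {a:42}`` is false, while deepEqual({a: 42}, {a: 42}) is true, and nested arrays and objects are compared the same way.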

But having an HTML page for testing is a bit tiresome when you want to test against multiple browsers, which is where JsTestDriver comes in. It's a jar file that lets you "capture" multiple browsers to a server, which then runs your tests on all of the captured browsers. Although it was designed for continuous integration testing, I find it handy for quickly running all of my tests on multiple browsers while I am developing. Plus there is a QUnit adapter in their svn repository, which lets me write my tests in QUnit and run them simultaneously in multiple browsers when I am not actively debugging.

For testing the view of a web app I have found WebDriver works well. There you write Java code (icky I know, but you can use other JVM languages) which uses the Selenium Java bindings and launches browsers to test them. What's neat is that it simulates key input and mouse clicking, so if you use Firefox for the testing browser you can watch the test enter text as it runs (albeit rather quickly).

So that's my JavaScript testing toolchain that I have come up with. So far it has worked out well and helped to keep me sane when dealing with JS.

2009-08-13

PyCon 2010 Call for Proposals

Below is the CFP for PyCon 2010 which just went out. I am currently leaning towards doing a talk on custom importers and importlib myself (unless people want to hear something else from me; leave a comment if you have an opinion).



Call for proposals -- PyCon 2010

Due date: October 1st, 2009

Want to showcase your skills as a Python Hacker? Want to have hundreds of people see your talk on the subject of your choice? Have some hot button issue you think the community needs to address, or have some package, code or project you simply love talking about? Want to launch your master plan to take over the world with python?

PyCon is your platform for getting the word out and teaching something new to hundreds of people, face to face.

Previous PyCon conferences have had a broad range of presentations, from reports on academic and commercial projects, tutorials on a broad range of subjects and case studies. All conference speakers are volunteers and come from a myriad of backgrounds. Some are new speakers, some are old speakers. Everyone is welcome so bring your passion and your code! We're looking to you to help us top the previous years of success PyCon has had.

PyCon 2010 is looking for proposals to fill the formal presentation tracks. The PyCon conference days will be February 19-22, 2010 in Atlanta, Georgia, preceded by the tutorial days (February 17-18), and followed by four days of development sprints (February 22-25).

Online proposal submission is open now! Proposals will be accepted through October 1st, with acceptance notifications coming out on November 15th. For the detailed call for proposals, please see: http://us.pycon.org/2010/conference/proposals.

For videos of talks from previous years - check out: http://pycon.blip.tv.

We look forward to seeing you in Atlanta!

2009-08-10

realStorage 1.1.0 is out

Taking the "release early, release often" mantra almost too seriously, I have now released realStorage 1.1.0 a day after 1.0.0 came out. This new version adds two convenience functions: contains() and keysArray().

I added contains() to help prevent people from making the mistake of testing for key existence with != instead of !==. Since values are coerced to strings, checking for key existence with realStorage.getItem(key) != null would succeed if the value for a key was null. But if you use !== you will never get an incorrect result, as setting a value of null will always lead to "null" being returned.
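Here is a minimal sketch of the contains() idea, not realStorage's actual source. CoercingStorage is a hypothetical in-memory stand-in that coerces keys and values to strings the way realStorage does, so getItem() returns null only for a missing key and a strict !== null check is a reliable existence test.

```javascript
// Hypothetical stand-in that coerces keys and values to strings,
// mirroring realStorage's behavior (not its actual code).
class CoercingStorage {
  constructor() {
    this.data = new Map();
  }
  getItem(key) {
    const k = String(key);
    return this.data.has(k) ? this.data.get(k) : null;
  }
  setItem(key, value) {
    this.data.set(String(key), String(value));
  }
}

// Because stored values are always strings, getItem() yields null only
// when the key is absent, so a strict comparison answers "does it exist?".
function contains(storage, key) {
  return storage.getItem(key) !== null;
}
```

With this stand-in, storing null under a key actually stores the string "null", so contains() still reports the key as present.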

I added keysArray() because I needed a way to loop through all keys in the store while being able to delete or add keys inside the loop. Using realStorage.length and realStorage.key() to loop through keys only works if you don't add or remove keys, as a given index passed to key() only maps to the same key as long as the set of keys does not change. Creating an array of all keys up front solves the problem.
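The following sketch shows the problem and the fix side by side. MemoryStorage is a hypothetical in-memory stand-in that orders keys by insertion (real browsers may order key() results differently), and removePrefixedByIndex()/removePrefixed() are illustrative helpers. keysArray() here is my own version of the snapshot idea, not realStorage's source.

```javascript
// Hypothetical in-memory stand-in for localStorage; keys are kept in
// insertion order, which real browsers do not necessarily guarantee.
class MemoryStorage {
  constructor() {
    this.data = new Map();
  }
  get length() {
    return this.data.size;
  }
  key(index) {
    const keys = Array.from(this.data.keys());
    return index >= 0 && index < keys.length ? keys[index] : null;
  }
  setItem(key, value) {
    this.data.set(String(key), String(value));
  }
  removeItem(key) {
    this.data.delete(String(key));
  }
}

// Snapshot every key up front so later additions or removals cannot
// shift the positions being iterated over.
function keysArray(storage) {
  const keys = [];
  for (let i = 0; i < storage.length; i++) {
    keys.push(storage.key(i));
  }
  return keys;
}

// Buggy pattern: removing a key shifts the keys after it down one index,
// so the loop skips whichever key moves into the removed slot.
function removePrefixedByIndex(storage, prefix) {
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key.startsWith(prefix)) {
      storage.removeItem(key);
    }
  }
}

// Safe pattern: iterate over the snapshot instead of live indexes.
function removePrefixed(storage, prefix) {
  for (const key of keysArray(storage)) {
    if (key.startsWith(prefix)) {
      storage.removeItem(key);
    }
  }
}
```

With keys "tmp:a", "tmp:b", "tmp:c" and "keep" inserted in that order, the index-based loop leaves "tmp:b" behind, while the snapshot-based loop removes all three temporary keys.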

This brings realStorage up to parity with the code I forked it from. Looking at my personal needs, the only thing of great consequence left is adding getJSON()/setJSON() to simplify my own code. After that I have some interest in getting a Gears back-end working, especially if Gears in Chromium on OS X starts working before localStorage does.

2009-08-09

Introducing realStorage 1.0.0

A very nice perk of my PhD work involving web applications is that I am working against HTML5 and thus the cutting edge. That means no Internet Explorer issues, and I avoid most of the other incompatibilities people deal with in deployed web apps.

But working with cutting-edge specifications that are still under development means that I do get to deal with incompatibilities that no one else is really aware of. In my case I am using the W3C Web Storage spec heavily for my work. If you are not familiar with it, the spec defines a key/value store (both a persistent store and one that is only valid while the web page is loaded) and a SQL store. I am only using the key/value store, as there is talk of dropping the SQL store so as to not have to specify SQL for the browser.

It turns out that both Firefox 3.5 and Safari 4 do not follow the spec exactly. Both browsers raise an exception in a case where the spec clearly says null is to be returned. And Firefox 3.5 has an extra incompatibility where it does not coerce key and value arguments to strings, e.g. coerce null to "null".

Because of these incompatibilities, both against the spec and each other, I have started a new open source project called realStorage. Version 1.0.0 provides a compatibility wrapper around localStorage so that (at least) Firefox 3.5 and Safari 4 act the same. Version 1.1 will get some non-standard functions that I have found useful in my PhD work.
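A compatibility wrapper of this kind can be sketched as follows. The post does not say which call throws instead of returning null, so this sketch assumes it is one of the read calls and wraps both getItem() and key(). wrapStorage() is a hypothetical name, and this is not realStorage's actual source.

```javascript
// Hypothetical compatibility wrapper around a Web Storage backend such
// as localStorage. Assumption: the spec-violating exception comes from a
// read call (getItem or key), where the spec says null should be returned.
function wrapStorage(backend) {
  return {
    get length() {
      return backend.length;
    },
    key(index) {
      try {
        const k = backend.key(index);
        return k === undefined ? null : k;
      } catch (e) {
        return null; // spec: null for an out-of-range index
      }
    },
    getItem(key) {
      try {
        // Coerce the key so every browser sees the same string.
        const v = backend.getItem(String(key));
        return v === undefined ? null : v;
      } catch (e) {
        return null; // spec: null for a missing key
      }
    },
    setItem(key, value) {
      // Coerce both arguments, e.g. null becomes "null", as the spec requires.
      backend.setItem(String(key), String(value));
    },
    removeItem(key) {
      backend.removeItem(String(key));
    },
    clear() {
      backend.clear();
    },
  };
}
```

Code written against the wrapped object then sees spec-like behavior no matter which browser quirk the underlying localStorage exhibits.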

One interesting thing (at least to me) that I am trying with this project is how to handle versions within an hg repository. I am trying out the approach that has been discussed for Python: named branches for minor versions, with tags for each micro version. A nice thing about this is that people can clone the repository with a specific version in the URL, staying exactly as up-to-date and cutting edge as they want. So people can clone https://realstorage.googlecode.com/hg/#1.0 and get the latest 1.0.x release. Or they could clone #1.0.0 if they want the exact code I released (unminified) along with the history. Or you can simply drop the specific revision and work from the default branch directly (which should be stable, as I don't push to the repo until I am happy with the code). I suspect cloning a specific tag or branch will become more popular as the use of sub-repositories in Mercurial picks up and people begin to want the most stable minor version of some library.
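The naming scheme above can be expressed as a small mapping from a release number to its branch and tag. releaseRefs() and cloneUrl() are hypothetical helpers for illustration only; they rely on Mercurial's convention that an identifier after "#" in a repository URL names the branch, tag, or changeset to use.

```javascript
// Hypothetical helpers illustrating the versioning scheme: one named
// branch per minor version (e.g. "1.0") and one tag per micro release
// (e.g. "1.0.0").
const REPO_URL = "https://realstorage.googlecode.com/hg/";

function releaseRefs(version) {
  const parts = version.split(".");
  if (parts.length !== 3 || !parts.every((p) => /^\d+$/.test(p))) {
    throw new Error(`expected MAJOR.MINOR.MICRO, got "${version}"`);
  }
  // A micro release is tagged on its minor version's named branch.
  return { branch: `${parts[0]}.${parts[1]}`, tag: version };
}

// Appending "#name" tells Mercurial which branch, tag, or changeset to
// clone; with no name you get the repository and its default branch.
function cloneUrl(ref) {
  return ref ? `${REPO_URL}#${ref}` : REPO_URL;
}
```

For release 1.0.0, releaseRefs() gives the branch "1.0" (always the latest 1.0.x) and the tag "1.0.0" (that exact release), and cloneUrl() turns either one into a URL to pass to hg clone.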

2009-08-08

'IronPython in Action' review

[Full disclosure: Michael Foord, one of the co-authors of IronPython in Action, had Manning send me a free copy of the book for review purposes and because I asked Michael for a copy since he is a friend]

To the point: if you need to program for Windows and you want to use Python, you should get IronPython in Action. The book does a good job of walking you through examples covering all the major APIs and tools a Windows programmer will end up using for whatever project they are working on.

I actually read this book while I ate breakfast most mornings. Now that's nothing special, but the fact that I continued to read/skim this book even though I have not actively used a Windows box since 2001 should tell you something. This book is clearly written, and does a good job of pointing out, through examples, gotchas you might run into. But it also does a good job of not overloading you with extraneous info that you could get from other reference sources (every computer book should have something like Appendix C, which is nothing more than a list of URLs for reference material). And as an added perk the authors try to be humorous when possible and are even willing to poke fun at Windows.

Now I am not suddenly going to start using Windows because of this book. It turns out Guido ended up with the same feelings towards Windows programming after reading this book as I did. But if I were to be forced to code for Windows, I would be glad I had this book on my shelf.

2009-08-01

PyCon 2010 site is live

The official site for PyCon 2010 is now up and running. For those of you who don't know, PyCon 2010 will be in Atlanta, Georgia in mid/late February.

As I have said before, PyCon 2009 was the best PyCon ever, and I have been to every single one of them. There is no reason why 2010 should not be as good as 2009.