2009-08-31

Compiling Python using Clang

[edit: added compilation timings]

Like many people (if Twitter is any indication), I upgraded to Snow Leopard and Xcode 3.2 this past weekend. One of the nice things that came with the new Developer Tools is Clang 1.0. I have been anticipating a stable release of this tool ever since I watched a video on it from the LLVM conference over a year ago. With its much-improved warning output compared to gcc, and its faster compilation times, I wanted to give it a try on CPython.

First off, though, credit needs to be given to the Unladen Swallow guys, and especially Jeffrey Yasskin, for working out, over the past year, some nasty bugs that used to prevent LLVM from compiling CPython. Without those fixes I would have just given up on using clang.

With CPython now compiling cleanly with clang, I decided to give it a spin. The clang-specific environment variables I ended up using were:
  • CC = clang
  • CFLAGS = -Qunused-arguments
  • CPPFLAGS = -Qunused-arguments
The "-Qunused-arguments" flag tells clang not to complain when it is given command-line arguments that are redundant or unused. Without it you can end up with a ton of warnings about unneeded CPPFLAGS arguments. The flag goes in both CFLAGS and CPPFLAGS because otherwise it isn't picked up when setup.py runs (I don't think setup.py or distutils uses CFLAGS at the moment). Other than that, CPython builds fine!
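For a POSIX shell, the three settings above could be exported like this before running configure. This is a minimal sketch; keeping any flags already present in CFLAGS and CPPFLAGS (the trailing expansions) is an assumption about what you want, not something the settings themselves require.

```shell
# Use clang as the C compiler for configure and make.
export CC=clang
# Stop clang from warning about unused command-line arguments.
# The flag goes in both variables so it is also seen when setup.py runs;
# any flags that were already set are kept after it.
export CFLAGS="-Qunused-arguments ${CFLAGS:-}"
export CPPFLAGS="-Qunused-arguments ${CPPFLAGS:-}"
```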

One other flag you might want to try when building CPython is "-Wno-unused-value". It turns out that the return values of PyObject_INIT() and PyObject_INIT_VAR() are never actually used, and this flag turns off the resulting warnings; there are a bunch of them, and each one refers to two other code locations.
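If these settings live in a shell profile that gets sourced more than once, appending flags blindly makes them pile up. The sketch below uses a hypothetical helper, append_flag, that adds a flag to a variable only when it is not already there; the helper is an illustration, not part of CPython's build.

```shell
# Hypothetical helper: append_flag VAR FLAG appends FLAG to the
# environment variable named VAR unless it is already present, so
# re-sourcing a shell profile does not pile up duplicate flags.
append_flag() {
    eval "_append_flag_cur=\${$1:-}"
    case " $_append_flag_cur " in
        *" $2 "*) ;;  # flag already present: leave VAR unchanged
        *) eval "export $1=\"\${_append_flag_cur:+\$_append_flag_cur }\$2\"" ;;
    esac
}

# Silence the flood of unused-value warnings triggered by
# PyObject_INIT() and PyObject_INIT_VAR().
append_flag CFLAGS -Wno-unused-value
```

Calling it twice leaves a single copy, so `CFLAGS="-O2"` followed by two `append_flag CFLAGS -Wno-unused-value` calls yields `-O2 -Wno-unused-value`.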

After I originally posted this, I got one comment here and a couple on Twitter asking what the timings were. I caved in and ran them with ``./configure --prefix=/dev/null --with-pydebug --with-computed-gotos --with-universal-archs="64-bit"``. With Clang the build took a total of 36 seconds, while with gcc it took 37 seconds. So the speed increase is minimal, but the important thing to remember is that the warning and error output Clang spits out is far and away better than what gcc gives you. So while the performance difference is small, the two are not even close to being equal in terms of the readability of their diagnostics.
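One rough way to script a comparison like this is sketched below, assuming a POSIX shell run from the top of a CPython checkout. timed_build is a hypothetical helper; it times configure and make together and reports whole seconds, which is an assumption, since the post does not say exactly what the 36 and 37 seconds covered.

```shell
# Hypothetical helper: rebuild from a clean tree with the compiler given
# as $1 and print the elapsed wall-clock time in whole seconds.
timed_build() {
    make distclean >/dev/null 2>&1   # ignore failure on an already-clean tree
    start=$(date +%s)
    CC="$1" ./configure --prefix=/dev/null --with-pydebug \
        --with-computed-gotos --with-universal-archs="64-bit" >/dev/null &&
        make >/dev/null || return 1
    end=$(date +%s)
    echo "$1: $((end - start)) seconds"
}

# Only attempt the comparison inside a source tree with a configure script.
if [ -x ./configure ]; then
    for compiler in gcc clang; do
        timed_build "$compiler"
    done
fi
```

A single run per compiler is noisy at this scale; repeating each build a few times gives a better sense of whether a one-second gap means anything.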

11 comments:

Larry Hastings said...

Dude! You forgot the benchmarks!

Brett said...

@Larry I don't want to. =) But I can tell you that it feels faster.

Chris Mulligan said...

Brett - very cool. I'd be interested in a more complete write-up of (or a link to) exactly how you did it. I've been running into a few issues, but I'm sure it's just my own unfamiliarity with the whole build process for Python on OS X. Thanks!

Brett said...

@Chris Nothing special. On Snow Leopard, with the Developer Tools installed and added to PATH, I set the environment variables ``export CC='clang'; export CFLAGS="-Qunused-arguments -arch x86_64 -mmacosx-version-min=10.6 $CFLAGS"; export CPPFLAGS="-Qunused-arguments $CPPFLAGS"; export LDFLAGS="-Qunused-arguments $LDFLAGS"`` in my .zprofile and then ran configure/make as normal (after a ``make distclean``). This is against py3k.

Chris Mulligan said...

Thanks. I got it working with your advice and this useful ticket http://bugs.python.org/issue6802.

Ideal said...

Thank you for the article; this is really nice.

Some stats here, from ``time emerge`` on Gentoo:

gcc 4.4.1
real 2m52.851s
user 2m21.667s
sys 0m34.368s

clang-svn
real 2m17.097s
user 1m25.201s
sys 0m32.938s
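Converting the "real" readings above to seconds makes the gap easier to compare. A small sketch, assuming a POSIX shell with awk; to_seconds is a hypothetical helper that only handles the `XmY.Zs` format that `time` prints.

```shell
# Hypothetical helper: convert a time(1)-style duration like "2m52.851s"
# into seconds by splitting on "m" and stripping the trailing "s".
to_seconds() {
    echo "$1" | awk -Fm '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock ("real") times from the gcc 4.4.1 and clang-svn runs above.
gcc_real=$(to_seconds 2m52.851s)
clang_real=$(to_seconds 2m17.097s)

# Speedup factor and share of wall-clock time saved by the clang build.
# prints: gcc 172.851s, clang 137.097s, 1.26x faster, 20.7% less time
awk -v g="$gcc_real" -v c="$clang_real" \
    'BEGIN { printf "gcc %.3fs, clang %.3fs, %.2fx faster, %.1f%% less time\n", g, c, g / c, 100 * (g - c) / g }'
```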

Ideal said...

Some more benchmarks: pybench with gcc (-O3) ~6 sec, pybench with clang ~14 sec.

Brett said...

I just want to say upfront that I would not trust pybench for any benchmarking numbers. To get good numbers you should run the Unladen Swallow benchmarks.

Chris Mulligan said...

I've actually started running the Unladen Swallow benchmarks, and both my GCCed and my clanged 2.6.2 are losing to Apple's 2.6.1. Gotta duplicate all the switches they're using, as something is missing.

cce3 said...

Clang may simply not be generating faster code yet. Google around for "clang benchmarks" and I don't see much improvement being reported... CouchDB showed some bad results for clang, too.

Marco said...

./configure --prefix=/dev/null --with-pydebug --with-computed-gotos --with-universal-archs="64-bit" --enable-universalsdk

returns after some output:

checking size of int... configure: error: cannot compute sizeof (int)
