Coder Who Says Py: March 2011

2011-03-23

importlib: doing it, and doing it, and doing it well

Earlier today I pushed a change to CPython which allows importlib to pass Python's test suite as the implementation of __import__ (sans failures caused by tests that don't expect __loader__ to be set on every module). This means there are no known compatibility issues standing between me and making importlib CPython's implementation of __import__.

So what was the final hurdle? It basically boils down to code objects and their immutability. When you import from bytecode you are essentially loading a code object which represents the module (basically using marshal.loads()). The issue is that code objects embed what file they were created from (the co_filename attribute). But what happens if you relocate a .pyc file? Should co_filename point to the place the file was originally created at or at its current location? Turns out Python thinks it should be the latter.
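You can see the embedding by round-tripping a code object through marshal. This is roughly what loading a .pyc does once its header has been skipped. The sketch below assumes a made-up path:

```python
import marshal
import types

# Compile module source, recording a made-up path as where it was "created".
code = compile("def f():\n    return 1\n", "/old/location/mod.py", "exec")

# Serialize and deserialize, roughly what reading a .pyc does after its header.
loaded = marshal.loads(marshal.dumps(code))

# The filename baked in at compile time comes back unchanged, both for the
# module's code object and for the nested function's code object.
nested = next(c for c in loaded.co_consts if isinstance(c, types.CodeType))
print(loaded.co_filename)  # /old/location/mod.py
print(nested.co_filename)  # /old/location/mod.py
```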

The issue with updating the co_filename attribute on a code object is that it can only be done by C code; attributes on a code object are immutable from Python. My original plan was to modify marshal.loads() to take a filename argument which represented where the marshal data came from. That way marshal could fix co_filename while keeping the attribute immutable. But this didn't work.
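Python code cannot simply rebind the attribute. In CPython, the assignment below raises AttributeError (the path is illustrative):

```python
code = compile("x = 1", "/old/location/mod.py", "exec")

try:
    # Code object attributes are read-only from Python code.
    code.co_filename = "/new/location/mod.py"
except AttributeError as exc:
    error = exc
    print("cannot rebind co_filename:", exc)
else:
    error = None

# The original filename is untouched either way.
print(code.co_filename)  # /old/location/mod.py
```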

It turns out that the algorithm __import__ uses to fix co_filename is not as thorough as the recursive solution I came up with for marshal.loads(). There is a loop that looks for code objects to fix, but as soon as it finds one that is already accurate, __import__ stops the search. Rather than replicate this somewhat odd solution in marshal, I simply exposed a private API in imp that lets me mutate a code object's co_filename attribute. Not an elegant solution, but since I am taking a PyPy view of import (i.e., it's my job to make importlib as compatible as possible, not to tweak __import__ to fit my needs) I didn't have much of a choice.
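For comparison, here is a sketch of the thorough, recursive fix written in today's Python. It is neither what __import__ does nor the private imp hook. It relies on CodeType.replace(), which was added in Python 3.8, long after this post. That method builds new code objects rather than mutating existing ones, and this version visits every nested code object instead of stopping early:

```python
import types


def with_filename(code: types.CodeType, filename: str) -> types.CodeType:
    """Return a copy of *code* whose co_filename, and that of every nested
    code object found in co_consts, is *filename*."""
    consts = tuple(
        with_filename(const, filename) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=filename, co_consts=consts)


def all_filenames(code: types.CodeType) -> set:
    """Collect co_filename from *code* and all of its nested code objects."""
    names = {code.co_filename}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= all_filenames(const)
    return names


source = "def outer():\n    def inner():\n        pass\n    return inner\n"
original = compile(source, "/old/mod.py", "exec")
relocated = with_filename(original, "/new/mod.py")

print(all_filenames(relocated))  # {'/new/mod.py'}
print(all_filenames(original))   # {'/old/mod.py'}
```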

Regardless, it feels very satisfying to have an implementation of import that passes the test suite now. This means I can work towards bootstrapping importlib as the implementation of import. I am planning to go with some bytecode-freezing solution for CPython where a build rule recreates the bytecode any time importlib._bootstrap is modified. That will remove the worry of having to rely upon some external file for import to work (otherwise I would have to come up with some import shim that did enough to import importlib directly). And with importlib's performance already acceptable, I feel fairly confident that this bootstrapping will happen in Python 3.3 (I have started work in the bootstrap_importlib branch of my personal repo).
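The post does not show the build rule itself. The sketch below captures the freezing idea under two assumptions: a plain mtime comparison decides when to regenerate, and the output uses a hypothetical C symbol name. It compiles the module, marshals the code object, and renders the bytes as a C array that could be compiled into the interpreter:

```python
import marshal
import os


def is_stale(source_path: str, output_path: str) -> bool:
    """True if the frozen output is missing or older than its source."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)


def freeze(source: str, filename: str, symbol: str) -> str:
    """Compile *source* and render its marshalled bytecode as a C array.

    *symbol* is a hypothetical name; a real build would pick its own.
    """
    data = marshal.dumps(compile(source, filename, "exec"))
    body = ",".join(str(byte) for byte in data)
    return f"const unsigned char {symbol}[] = {{{body}}};\n"


frozen = freeze("x = 42\n", "<frozen importlib._bootstrap>", "frozen_bootstrap")
print(frozen[:45])
```

Because the array is regenerated from the source, the frozen copy can never silently drift from importlib._bootstrap, provided the build rule runs.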

2011-03-20

PyCon 2011 wrap-up

tl;dr version: it was awesome! Read on to find out what I did at the sprints and two themes that I found at the conference.

Secret site goes live: Python 3 Support on PyPI

When the Python 3 Wall of Shame launched a couple of weeks ago, I (along with others) noticed discrepancies (discussed in the comments of the announcement blog post) between the real world and what the list said (e.g., docutils was listed as not supporting Python 3). A similar issue came up with On Python 3 Yet? Both websites seemed to be accidentally spreading FUD about Python 3 support because projects on PyPI were not listing their Python 3 support properly. While in no way intentional, it does cause issues when people use these sites as "proof" that Python 3 still lacks support from the community.

So how did this happen? Well, projects on PyPI are listed as supporting Python 3 when they set the proper Python 3 trove classifier (instructions on how to fix this are in the Python 3 porting HOWTO). I suspect that projects which do support Python 3 but lack the proper classifier either (a) are not aware they should set it or (b) don't care or are too lazy to. Either way it leads to a misrepresentation of which projects do and do not support Python 3.
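Declaring support comes down to listing classifier strings in a project's metadata. The check a site like these might run is little more than a prefix test. The sketch below uses a hypothetical function name and made-up classifier lists:

```python
PYTHON3 = "Programming Language :: Python :: 3"


def declares_python3(classifiers):
    """True if any classifier names Python 3, e.g. '... :: 3' or '... :: 3.2'."""
    return any(
        c == PYTHON3 or c.startswith(PYTHON3 + ".") or c.startswith(PYTHON3 + " ::")
        for c in classifiers
    )


# A made-up project that only declares Python 2.
without_py3 = [
    "Programming Language :: Python",
    "Programming Language :: Python :: 2",
]
print(declares_python3(without_py3))                     # False
print(declares_python3(without_py3 + [PYTHON3 + ".2"]))  # True
```

A project that supports Python 3 but never adds one of these strings is invisible to any check like this. That blind spot is the one described above.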

To deal with this, I created a website to measure Python 3 support on PyPI. What differentiates this website from the others is that I am personally curating the list of projects that support Python 3. So while PyPI says docutils does not support Python 3, my website does. Same goes for projects which have forks that support Python 3 (e.g., setuptools "supports" Python 3 through distribute). I have even begun to flag projects which have working support in their version control system as "maybe" supporting Python 3. I completely understand why the creators of the other websites don't do this; it takes work. But since I have more invested in Python 3 than most people I am willing to put the work in.
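The curation rules described above can be sketched as a small decision function. The function and parameter names are hypothetical. The docutils and setuptools/distribute entries come from the text, and "example-project" is a placeholder:

```python
def support_status(name, declares_py3, curated, forks, in_vcs):
    """Return 'yes', 'maybe', or 'no' for a project (hypothetical rules).

    declares_py3: whether PyPI metadata has a Python 3 classifier.
    curated:      names hand-verified to support Python 3.
    forks:        maps a project name to a fork that supports Python 3.
    in_vcs:       names with working support only in version control.
    """
    if declares_py3 or name in curated or name in forks:
        return "yes"
    if name in in_vcs:
        return "maybe"
    return "no"


curated = {"docutils"}
forks = {"setuptools": "distribute"}
in_vcs = {"example-project"}  # placeholder name

for project in ("docutils", "setuptools", "example-project", "other"):
    print(project, support_status(project, False, curated, forks, in_vcs))
```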

If you go to the home page you will notice that I list a project as either supporting, not supporting, or maybe supporting Python 3. The "maybe" category exists because a ton of projects do not even specify whether they support Python 2! My hope is that projects begin to properly set their trove classifiers. This not only benefits the community by letting people know whether a project supports Python 3, it also shows how far back a project supports Python. Take Django 1.2.5 as an example. It does not have trove classifiers set to let me know that it supports Python 2.4 through Python 2.7. Heck, I don't even know if they have tested against Python 2.7 yet. But if they set their trove classifiers this would be known both visibly and programmatically.
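The "programmatically" part can be sketched as follows. Version classifiers are enough to recover a supported range. The classifier list below is hypothetical, the kind of list the post wishes Django 1.2.5 declared, and not its actual metadata:

```python
PREFIX = "Programming Language :: Python :: "


def python_versions(classifiers):
    """Return declared X.Y versions, oldest first, e.g. ['2.4', '2.7']."""
    versions = set()
    for c in classifiers:
        if c.startswith(PREFIX):
            major, _, minor = c[len(PREFIX):].partition(".")
            # Skip bare majors ('3'), '3 :: Only', implementation names, etc.
            if major.isdigit() and minor.isdigit():
                versions.add((int(major), int(minor)))
    return [f"{a}.{b}" for a, b in sorted(versions)]


# Hypothetical classifiers for a project supporting 2.4 through 2.7.
declared = [PREFIX + v for v in ("2.7", "2.4", "2.6", "2.5")] + [PREFIX + "2"]
versions = python_versions(declared)
print(versions[0], "through", versions[-1])  # 2.4 through 2.7
```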

I am asking the community for help combating this problem of projects not listing their Python 3 support properly. If you know of a project which supports Python 3 somehow but that fact is not known on this new website, please either list it over on the Convore discussion on this topic, leave a comment here, or leave a comment on Google Buzz, along with proof of the Python 3 support (e.g., a link to an official page stating the support). If you tack a project's PyPI name onto http://py3ksupport.appspot.com/pypi/ (e.g., http://py3ksupport.appspot.com/pypi/docutils) you can see whether the project's support has been picked up yet. As I said, forks qualify for support, as do projects which have functioning support in version control but have not done a release yet. This is all rather important for the top projects listed on the home page since that is what people will really pay attention to when gauging Python 3 uptake.

To give people a sense of how much of a difference this curation makes, consider the top 50 projects by downloads across all releases. 14 of the top 50 (i.e., 28%) support Python 3 somehow. But if you went by trove classifiers alone you would think only 9 projects (18%) supported Python 3. Even hard-coding modules absorbed into Python's stdlib would leave off 2 of the projects. If you look at latest-release downloads, both the support and the discrepancy are even bigger: 16 out of 50 projects support Python 3, but 8 of those I had to flag manually (4 of which are not from the stdlib). Percentage-wise this manual effort makes a real difference and gives a much better indication of how much support there already is for Python 3.
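The percentages follow directly from the counts. For the latest-release list, the sketch below assumes that the 8 manually flagged projects are exactly the ones a classifier-only count would miss:

```python
TOP = 50


def share(count, total=TOP):
    """Percentage of the top projects."""
    return 100 * count / total


# Top 50 by downloads across all releases: curated vs. classifiers only.
print(share(14), share(9))       # 28.0 18.0
# Top 50 by latest-release downloads: 16 curated, 8 of them flagged by hand.
print(share(16), share(16 - 8))  # 32.0 16.0
```

So curation doubles the visible share for the latest-release list (32% vs. 16%), and it raises the all-release figure by 10 percentage points.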

So this is what I have been working on for the past two weeks. I learned a lot about App Engine and creating a scalable website. The whole thing syncs project metadata changes from PyPI every 20 minutes, and I update download totals every day. Long term I hope to add more metrics to gauge what the top projects are (e.g., listed dependencies, Google Code searches, etc.). I also want to develop a Chrome extension which will tell a user viewing PyPI when a project supports Python 3 even if the project doesn't say so (maybe even to the point that people can notify me with a button click that a project actually does support Python 3). I also want an API so that other websites that use different metrics can have access to the same data I use to mark a project as supported or not. But as of right now, I'm ready to take the weekend off. =)
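The post does not describe the sync code. The sketch below shows the incremental idea. It models change records as hypothetical (name, version, timestamp, action) tuples. That shape is loosely based on PyPI's old XML-RPC changelog, but treat it as an assumption. Each 20-minute batch reduces to a set of projects to refresh plus a new high-water mark:

```python
from typing import Iterable, Set, Tuple

# (name, version, timestamp, action): an assumed record shape.
Change = Tuple[str, str, int, str]


def plan_refresh(changes: Iterable[Change], since: int) -> Tuple[Set[str], int]:
    """Return the projects changed after *since* and the newest timestamp seen."""
    names: Set[str] = set()
    newest = since
    for name, _version, timestamp, _action in changes:
        if timestamp > since:
            names.add(name)
            newest = max(newest, timestamp)
    return names, newest


# Made-up records; several changes to one project collapse into one refresh.
changes = [
    ("example-a", "1.0", 100, "new release"),
    ("example-a", "1.0", 105, "add source file"),
    ("example-b", "2.3", 90, "update classifiers"),
]
names, mark = plan_refresh(changes, since=95)
print(sorted(names), mark)  # ['example-a'] 105
```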

2011-03-03

PSF core grant, the last days: the website works (so far)!

The last couple of days I have been working hard to hack together (and I mean hack; it is not the most elegant code or solution I have ever written) the remaining bits of the website. I am currently waiting for the task queues to finish doing their thing so that I have a complete data set before I go "public" by asking the community for some help with something. But so far things are working out and hopefully I will wake up in the morning to no errors and the data set complete!

The biggest challenge I have run into over the last couple of days is App Engine's 10K limit on task queue data. That ain't much data. So I had to rework my data-syncing workflow to use the datastore as temporary storage. It worked out, but I'm sure I am doing something silly. At least I made sure everything happens in a transaction so that if anything fails the task queue will simply retry until it works (hopefully the failure is transient and doesn't require me to fix some code =).
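The general pattern can be sketched without any App Engine APIs. A plain dict stands in for the datastore, and the 10 KB figure comes from the post and is treated as an assumption. Small payloads travel inline, while large ones are parked in storage and only a key goes through the queue. The stored entry is deleted only after the task succeeds, so a retried task can still find its data:

```python
import json
import uuid

PAYLOAD_LIMIT = 10 * 1024  # bytes; the post's "10K" limit, assumed


def enqueue_payload(data, store):
    """Return a small task payload, spilling large data into *store*."""
    encoded = json.dumps(data)
    if len(encoded.encode("utf-8")) <= PAYLOAD_LIMIT:
        return {"inline": encoded}
    key = uuid.uuid4().hex
    store[key] = encoded  # stand-in for a datastore write
    return {"key": key}


def load_payload(payload, store):
    """Recover the original data; safe to call again if the task is retried."""
    if "inline" in payload:
        return json.loads(payload["inline"])
    return json.loads(store[payload["key"]])


def finish_payload(payload, store):
    """Delete spilled data only once the task has succeeded."""
    store.pop(payload.get("key"), None)
```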

2011-03-01

Introducing mnfy

Remember back in November when I did a blog post about an AST-based minifier for Python code? Remember how I said I gave up on it out of frustration over how the AST represented things in a way that didn't mirror the syntax? Yeah, that "giving up" part didn't sit well with me, so I created mnfy to help minify/obfuscate Python 3 code.

It's basically a little project I created to help with minifying code. I got over my issues with the AST by simply thinking it through and figuring out exactly which AST nodes needed special treatment (e.g., how to detect an 'elif' clause). Once I did that I was able to work on some of the transforms, which is where the fun is.
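The 'elif' case is a good example of the AST not mirroring the syntax. There is no Elif node, only an If whose orelse holds a single If. A minimal check might look like the sketch below (this is not mnfy's actual code). An else block containing only a nested if produces the identical tree. That is harmless for a minifier, since both spellings mean the same thing:

```python
import ast


def is_elif_chain(node: ast.If) -> bool:
    """True if *node*'s else branch can be written as an 'elif'."""
    return len(node.orelse) == 1 and isinstance(node.orelse[0], ast.If)


elif_src = "if a:\n    pass\nelif b:\n    pass\n"
nested_src = "if a:\n    pass\nelse:\n    if b:\n        pass\n"
plain_src = "if a:\n    pass\nelse:\n    x = 1\n"

for src in (elif_src, nested_src, plain_src):
    print(is_elif_chain(ast.parse(src).body[0]))  # True, True, False
```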

The code currently doesn't do anything fancy that human beings who like to muck about with minifying/obfuscating Python don't already do:

Eliminates unneeded whitespace
Uses minimal parentheses (mostly)
Uses hexadecimal numbers when it will save characters
Combines consecutive imports into a single line
Drops unused constants (including docstrings)
Converts functions into lambda definitions
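The hexadecimal rule in the list above can be implemented like this. This is a sketch, not mnfy's code. It assumes non-negative integers, since a negative literal is a unary minus applied to a positive one, and it keeps decimal on ties:

```python
def shortest_int_literal(n: int) -> str:
    """Return the shorter of the decimal and hex spellings of non-negative *n*.

    Ties keep decimal, since hex is only worth using when it saves characters.
    """
    if n < 0:
        raise ValueError("negative numbers are a unary minus applied to a literal")
    decimal, hexadecimal = str(n), hex(n)
    return hexadecimal if len(hexadecimal) < len(decimal) else decimal


print(shortest_int_literal(255))     # 255
print(shortest_int_literal(10**12))  # 0xe8d4a51000
```

The "0x" prefix costs two characters, so hex only wins for fairly large numbers.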

As I said, nothing nuts, but it at least takes some of the tedium out. The transformations are also separated into safe and unsafe transforms. I consider a transform safe if it is semantically equivalent to the original code in 99% of cases (i.e., if you muck with execution frames you are on your own). Unsafe transforms will work in 90% of cases, but are not exactly semantically equivalent (e.g., a lambda is not entirely equivalent to a function defined with def). Thus users can choose their exposure to transforms which may or may not break their code. You can also skip transforms altogether and simply go with the elimination of whitespace and unneeded formatting characters (e.g., parentheses). In that case I am able to minify the entire standard library and still get back the exact same AST.
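Turning a single-return function into a lambda is a good example of an unsafe transform. The sketch below is not mnfy's code, and it assumes Python 3.9+ for ast.unparse. The rewritten function behaves the same when called, but its __name__ changes. That kind of observable difference is what makes the transform unsafe:

```python
import ast


class FunctionToLambda(ast.NodeTransformer):
    """Unsafe transform: `def f(a): return expr` becomes `f = lambda a: expr`.

    Only plain single-return functions without decorators, annotations, or
    type parameters are rewritten; everything else is left alone.
    """

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        self.generic_visit(node)
        args = node.args
        all_args = args.posonlyargs + args.args + args.kwonlyargs
        all_args += [a for a in (args.vararg, args.kwarg) if a is not None]
        if (
            node.decorator_list
            or node.returns is not None
            or getattr(node, "type_params", None)
            or any(a.annotation is not None for a in all_args)
            or len(node.body) != 1
            or not isinstance(node.body[0], ast.Return)
        ):
            return node
        value = node.body[0].value or ast.Constant(value=None)
        target = ast.Name(id=node.name, ctx=ast.Store())
        assign = ast.Assign(targets=[target], value=ast.Lambda(args=args, body=value))
        return ast.copy_location(assign, node)


tree = FunctionToLambda().visit(ast.parse("def add(a, b):\n    return a + b\n"))
minified = ast.unparse(ast.fix_missing_locations(tree))
namespace = {}
exec(minified, namespace)
print(minified)                   # add = lambda a, b: a + b
print(namespace["add"](2, 3))     # 5
print(namespace["add"].__name__)  # <lambda>, not 'add'
```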

Anyway, this was just a fun little project that I don't plan to take that seriously (i.e., don't expect massive churn on the code any time soon =). If either minification/obfuscation or working with Python's AST interests you, there is a talk at PyCon on using Python's ast module and another on obfuscating Python code that you can attend.

PSF core grant, day 39: darn scalability

The server component for the secret website is nearly complete, but when I started to test the site through App Engine's dev appserver, some issues arose which were not triggered by the unit tests. So I spent Sunday and yesterday (counted as one day since I worked in bursts, on and off) fixing those issues.

Otherwise, the only other news is that the Python-Dev In a Box code is up. Please realize that it will not work for you! It is currently geared towards the Hg test repo, which is not ready for general use (e.g., it's missing some commits that landed in svn). I am also assuming that a patch I submitted to coverage.py concerning a __main__.py file for the repo directory gets accepted, so until that happens I manually copy a file over. In other words, if you decide to play with the code, realize that it probably won't work, so bug reports about it not running are probably premature. =)
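The coverage.py patch itself isn't described here. As background, Python can run a directory that contains a __main__.py (`python some_dir` does this), and the stdlib's runpy.run_path exposes the same behavior. The sketch below uses a throwaway directory, not the actual Python-Dev In a Box code:

```python
import os
import runpy
import tempfile

with tempfile.TemporaryDirectory() as repo_dir:
    # A stand-in for the repo directory, with a __main__.py at its top level.
    with open(os.path.join(repo_dir, "__main__.py"), "w") as f:
        f.write("result = 'ran __main__.py'\n")
    # run_path() accepts a directory and executes the __main__.py inside it,
    # returning the resulting module globals.
    namespace = runpy.run_path(repo_dir)

print(namespace["result"])  # ran __main__.py
```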