2011-02-26

PSF core grant, days 37 & 38: making Python-Dev In a Box a reality

I got enough positive feedback on the Python-Dev In a Box idea from my last blog post that I decided to implement it. Over yesterday and today I have been writing some scripts to help bootstrap people into contributing. I will make the code and repository public as soon as I can get the remote repository at hg.python.org created.

So the goal of this project was to make it easy for people to jump in and start contributing at a sprint or on their way home from a conference. That meant providing the code, tools, and information needed to contribute. It also meant doing as much as possible for people upfront, so that no one wastes time on steps everyone would end up doing anyway, and not requiring a network connection. And all of this needs to fit in, at most, 700 MB on a CD.

First, there is a script called make_a_box.py which creates a Python-Dev In a Box instance. It will prompt you for what you want to include in the box:

When reasonable it will build what is provided. In the case of CPython and the various bits of documentation, the docs get built. For coverage.py it runs it so you have a basic coverage report for people to work off of.

For users, I included two extra scripts. One simply builds CPython (on UNIX). While building is very straightforward (and documented in the devguide), I needed the functionality for generating the coverage report. Since I needed the code anyway I figured I would make it a stand-alone script. About the only real perk to it is that it uses multiprocessing.cpu_count() to run the Makefile with as many cores as you have (although CPython compiles in a fairly linear fashion so extra cores don't do that much).
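As a rough illustration of the multiprocessing.cpu_count() trick (not the actual script; the function names and the configure-then-make layout are assumptions), a build helper could look something like this:

```python
import multiprocessing
import subprocess


def make_command(jobs=None):
    """Return the argv for a parallel ``make`` run (hypothetical helper)."""
    if jobs is None:
        # Default to one make job per core reported by the OS.
        jobs = multiprocessing.cpu_count()
    return ["make", "-j{}".format(jobs)]


def build(source_dir):
    """Configure and build a checkout on UNIX (assumed directory layout)."""
    subprocess.check_call(["./configure"], cwd=source_dir)
    subprocess.check_call(make_command(), cwd=source_dir)
```

Calling build() on a checkout directory would run configure followed by a parallel make; make_command(4) simply returns ["make", "-j4"].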

I also provided a script that runs the test suite in the fastest, most rigorous way possible. Once again, while this is documented in the devguide, people can get a little lazy when it comes to running the entire test suite, so hopefully having a script that uses the "best" options will lead to more rigorous testing overall.
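Which options count as "best" is not spelled out here, so the following is only a sketch under assumptions: it picks regrtest's -j (run tests in parallel), -u all (enable all resources), and -r (randomize test order), and the function name is made up for illustration.

```python
import multiprocessing
import sys


def regrtest_command(python=sys.executable, jobs=None):
    """Build an argv for a fast, thorough test run (assumed flag choice)."""
    if jobs is None:
        jobs = multiprocessing.cpu_count()
    return [
        python, "-m", "test",
        "-j", str(jobs),  # spread the tests across processes
        "-u", "all",      # turn on every optional test resource
        "-r",             # randomize the order tests run in
    ]
```

The resulting list can be handed to subprocess.call() from the root of a built checkout, passing the freshly built interpreter as the first argument.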

My plan is to create a Box for the PyCon sprints so that people attending the sprint can simply copy from a CD or flash drive everything they need to get going. This should speed up people getting started, leading to more productivity.

2011-02-24

PSF core grant, day 36: maintaining a website is a pain, Hg, and Python-Dev In a Box

I'm this close to having the data maintenance aspect of the "secret" website done, but came up short when I forgot that entities in App Engine cannot be stored in a transaction unless they are in the same entity group. Since I have tens of thousands of entities, putting them all in a single entity group seems stupid, so I will simply have to break out another worker to be called from a task queue in order to process each entity one by one (at least I can still batch the external network calls). Bah. So close!
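The shape of that plan, one task per entity but with the external calls still batched, can be sketched in plain Python. Nothing here touches App Engine; fetch_batch, the task dictionaries, and the batch size are all hypothetical stand-ins.

```python
def chunked(items, size):
    """Yield successive lists of at most ``size`` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_tasks(entity_keys, fetch_batch, batch_size=100):
    """Make one external call per batch, then emit one task per entity."""
    tasks = []
    for batch in chunked(entity_keys, batch_size):
        results = fetch_batch(batch)  # a single network round-trip per batch
        for key in batch:
            # Each entity gets its own unit of work, so no transaction
            # ever needs to span more than one entity.
            tasks.append({"key": key, "payload": results[key]})
    return tasks
```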

While nothing to do directly with me, on behalf of himself and Georg Brandl, Antoine Pitrou announced a test Mercurial repository for (C)Python. I'm obviously excited about this since I spearheaded this whole debacle, er, movement to switch from svn to hg back in late 2008, making the decision of hg with Guido at PyCon 2009. It looks like there is a slight chance we might actually get this done in the near future (since Antoine and Georg are pushing this the last little distance I can't speak to whether they will get it done by PyCon 2011, but it would be neat if they did).

A rather nice side-effect of switching to Mercurial is the smaller amount of space that a Mercurial repo of Python takes up compared to Subversion. This led to a slight discussion on #python-dev about creating a Python-Dev In a Box. The idea is that to help spur contributors one could put the Mercurial repo, the devguide (pre-built, but also along with the hg repo), Mercurial itself (including TortoiseHg), pre-build the docs (so that people have a copy of Sphinx and such which can also be used to build , and anything else I could think of (PEPs checkout? Visual Studio Express installer? coverage.py along with coverage results?) on a CD or flash drive so that people have everything they need to start hacking on Python. And by putting it on physical media it makes it easy to bootstrap everyone in the room quickly without slamming the network. It even gives everyone the ability to get going while offline (e.g., "Can't stay for the sprint? Take a CD and start on the plane home (if you are not on Windows as that requires downloading Visual Studio Express)!"). Hell, think if this became a bag stuffer at PyCon! I wonder if that would increase contributions?

2011-02-23

PSF core grant, days 33, 34, & 35: Python 3.2.0, fighting personal feature creep

First off, for those of you who don't know, Python 3.2.0 was released on February 20th (which serendipitously happens to correspond to the 20-year anniversary of Python 0.9, the first release of Python). To celebrate Python being a score old, why don't you help port a project to Python 3.2.0? =)

With Python 3.2.0 out the door, the development branch of Python was open to commits again. I was able to commit all but one of the patches in the queue I have been sitting on for over a month. This means that the in-development branch of Python can now run the test suite under coverage.py without any special patches. I also fixed a bunch of static analysis warnings found by Clang. I am ignoring the time burned on trying to fix test_zlib failing under OS X as people much wiser than I at mmap solved that one. I also fixed the new crypt module changes to be PEP 8 compliant and simplified the API a little.

As for the secret project, I am working on not letting feature creep get the better of me. I scaled it back slightly so that it won't start out life with as much stuff since it didn't need everything. But then I also thought of a cool way to get more people to participate with the website through a Chrome extension. Out with one feature, in with another.

But I should be able to launch a private beta in about a week. I don't see why I won't have this ready to go for public consumption (in a rather rough state) by PyCon.

2011-02-20

PSF core grant, days 30, 31, & 32: lots of testing, moving on to data maintenance

In a previous blog post I mentioned my refactoring/testing strategy for the website I am developing. Friday was all about implementing that strategy. Even without touching the network, my tests were good enough to lead to only two very shallow bugs that needed fixing. At this point I am basically caught up in my testing regimen.

That means new features! But yesterday and today have been a back-and-forth battle between trying to do a good job and worrying too much about rare edge cases and features that are not needed. Since the data only needs to portray details at a rough level and only for a subset of the available data, I realized I need to relax and not sweat every single little detail that could come up. It's just as easy to run an occasional sanity check to rectify any data drift that may come up.

And a tip to those using nose-gae: if you get an odd error about not being able to delete a datastore file, it probably means you have a syntax error somewhere.

And a tip for those using Google Chrome: be aware that if you are manually triggering things through a GET request, Chrome's prefetching of pages can cause you to accidentally request twice. This was a problem for me as I was doubling the stuff I was putting on task queues and it took me a while to realize why.
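One general way to sidestep this class of problem is to keep GET requests free of side effects and do the enqueueing only on POST. The handler below is a hypothetical, framework-free sketch of that idea; the class name and the list standing in for a task queue are assumptions for illustration.

```python
class EnqueueHandler:
    """Hypothetical handler: GET only reports, POST does the work."""

    def __init__(self, queue):
        self.queue = queue  # any list-like stand-in for a task queue

    def get(self):
        # Safe to prefetch or repeat: nothing is enqueued here.
        return "{} task(s) pending".format(len(self.queue))

    def post(self, item):
        # The side effect lives only behind POST, so a speculative or
        # repeated GET cannot double up the queued work.
        self.queue.append(item)
        return "queued"
```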

2011-02-17

Register for PyCon *NOW* if you actually plan to attend

A trusted chair of PyCon US 2011, Van Lindberg, did a post explaining some of the behind-the-scenes issues that must be worked out when organizing the conference. It's an interesting post that's worth reading.

But one of the key take-aways from that post is that the hotel space is going quickly. We have already taken 105% of our allotted space. It's already projected that we will sell out PyCon this year. Make sure you register if you have not already, or else you may very well find yourself unable to attend no matter how badly you want to come.

PSF core grant, days 28 & 29: learning the hard way that you need tests

My little skunkworks website got cleared to be worked on, so I have shifted my focus to it. I think I am going to keep it a secret until PyCon and announce it during a lightning talk for fun. But just because the subject of my work is a secret doesn't mean the process of creating the website needs to be kept from the public (even if it means I have to admit to what I am doing early; being secretive is not exactly critical).

I must admit that most of my experience in creating websites is limited to either simple stuff or more JavaScript-heavy work. This is turning out to be the first website where there is a serious back-end that has offline workers, constantly updating data, etc. In other words it is a learning experience for me.

The first lesson is that I can't be sloppy with this website. Back on Sunday I started to slap together the website on Google App Engine on my machine, simply coding away until I thought I reached a point where I could actually examine some data. I was using task queues for the first time and trying to coordinate all of this communication with various queues running at various rates with different workers, etc. In other words I over-engineered.

But even worse is that I was not doing any proper testing. I figured I might just do some hand-driven tests on a subset of data to make sure things were working out. But my slap-dash coding quickly became a hindrance as I ran into several import errors thanks to me not paying attention and failing to add the proper import statements. With the turn-around of launching a page which populated a task queue being long, I knew that I just couldn't keep going like this.

And so I knew I had to recode for testing. But how the heck do you test a website, let alone an App Engine website? I mean I know how to write proper unit tests, functional tests, etc., but I didn't know the best way to handle that for a website that has to run under some heavy infrastructure that has to be available.

This called for some research to even get tests executing. I stumbled across nose-gae which seemed to give me what I needed to get the tests executed if I was willing to use nose (which I was). I even tried to be a good little developer and work under virtualenv, but I ran up against odd path issues involving App Engine that I simply did not want to try to diagnose after an attempt to use gaeunit left me wanting (not enough traceback info to debug well). Luckily nose and nose-gae were the only things I felt the need to actually install so I didn't totally screw up my Python 2.5 install.

With tests runnable, I then had to decide how the heck I was going to do this. And that's when I decided I needed to refactor the hell out of my code. Being an App Engine app that is (at least currently) using no web framework, every URL resolves to a request handler class. That means there is at least a get() or post() method on each class. So I made the decision that all request handling (e.g., getting what is in some POST argument but not even decoding the JSON) and all response handling (e.g., writing back out any HTML) would be handled in the get/post method, but nothing more; essentially they are gutted to the point of simply stitching together method calls, as one typically should when there is I/O involved. I then broke code down into helper methods as necessary in order to think about the functionality in the way I needed to. I also put all transaction code in its own method since I am purposefully keeping that work to a computational minimum. This allows me to mock out things like task queue usage in order to verify what I want to happen is going on without worrying about App Engine actually using some task queue and having to clean that up manually later (handling actual datastore work is no big deal so I am happy to let that actually occur with proper test cleanup).
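To make the shape of that refactoring concrete, here is a minimal sketch rather than the real site's code. It assumes a modern Python with unittest.mock (not the Python 2.5 and nose setup described above), and every name in it (parse_payload, summarize, UpdateHandler, the enqueue callable) is hypothetical.

```python
import json
from unittest import mock


def parse_payload(raw_body):
    """Pure helper: decode the JSON body handed over by the handler."""
    return json.loads(raw_body)


def summarize(data):
    """Pure helper holding the real logic; testable with no infrastructure."""
    return {"name": data["name"], "count": len(data["items"])}


class UpdateHandler:
    """Hypothetical request handler that only stitches helper calls together."""

    def __init__(self, enqueue):
        self.enqueue = enqueue  # callable that would add a task to a queue

    def post(self, raw_body):
        summary = summarize(parse_payload(raw_body))
        self.enqueue(summary)       # task queue usage, mocked out in tests
        return json.dumps(summary)  # response handling stays in post()


def check_handler_enqueues_summary():
    """Verify the queue interaction without any real queue existing."""
    fake_enqueue = mock.Mock()
    handler = UpdateHandler(fake_enqueue)
    body = json.dumps({"name": "spam", "items": [1, 2, 3]})
    response = handler.post(body)
    fake_enqueue.assert_called_once_with({"name": "spam", "count": 3})
    return json.loads(response)
```

Because the queue is just a callable passed in, the test can assert on what would have been enqueued and there is nothing to clean up afterwards.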

All of this leads to me writing abstracted, modular code where I can test non-network, non-App Engine code quickly and easily and then purposefully clean up or mock out as needed when I do need to work with App Engine's assets. At this point I have not written tests for the actual website page requests since the get()/post() methods are dirt-simple and they are just being called by App Engine task queue events (but eventually I will once I decide how I want to handle that specific case; webtest?).

What is the lesson in all of this? Don't be sloppy. Assume you will have to test your code eventually, forcing you to write code in a fashion that makes the code easy to test. And then actually bother to test your code. Dealing with the network or APIs that have no way to undo or cancel an event programmatically (e.g., something in a task queue) should not stop you from writing tests as there are ways to isolate these things such that they are not a burden.

2011-02-14

PSF core grant, day 27: little details, a secret, and a progress report

Started out the day creating a patch to link from the issue tracker to the devguide. There have been a couple people who have asked me to have the issue tracker fields link to the devguide to give a better explanation. Plus updating the docs is easier than the tracker.

But what really took up most of my time today (and yesterday, but I had errands throughout today so I am counting this as a day's work) is a little skunkworks project I started working on related to Python 3 and combating some (accidental) FUD that seems to keep coming up that really needs to get squashed.

Anyway, I also figured I would summarize what the heck I have done as part of my grant as I have finished day 27 of 41 (and I need to summarize for the PSF board this week). I completed the devguide, which I have wanted to do for literally years. There is also a branch of it already prepared for when the switch to Hg occurs. This also led to various patches that make sure the stdlib can have its test coverage measured, fix a bunch of static analysis warnings from Clang, and stop import failures from implicitly representing a skipped test.

I wrote a HOWTO on how to port a Python 2 project to Python 3. Now the community finally has a single document to read to at least get started in porting instead of having to pull up various blog posts on the subject.

All of this and I am three days behind "schedule" of ending my grant come March 1 (unless my math is wrong and this really isn't the 30th weekday since January 4th, inclusive). Overall I'm happy with what I have done and I am hoping the remaining 14 days are just as productive.
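The math can be double-checked with a few lines of Python. This sketch assumes "weekday" simply means Monday through Friday, with no holidays taken out, and that both end dates are counted inclusively.

```python
from datetime import date, timedelta


def weekdays_between(start, end):
    """Count Monday-Friday dates from start to end, both inclusive."""
    count = 0
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0-4 are Monday through Friday
            count += 1
        day += timedelta(days=1)
    return count


grant_start = date(2011, 1, 4)
print(weekdays_between(grant_start, date(2011, 2, 14)))  # 30
print(weekdays_between(grant_start, date(2011, 3, 1)))   # 41
```

So February 14th is indeed the 30th weekday, three more than the 27 grant days worked, and 41 weekdays run from January 4th through March 1st.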

2011-02-11

PSF core grant, days 25 & 26: unexpected skips & python.org/dev/ no more

The theme for the past two days has been about removing things. Yesterday was about issue 10966 and finishing the patch to remove the concept of (un)expected skipped tests. Today was about gutting http://www.python.org/dev/ to properly represent the fact that almost all of its content has been subsumed by http://docs.python.org/devguide/. Once the website rebuilds, proper redirects will be in place to point from the old content to the devguide.

2011-02-09

PSF core grant, days 23 & 24: prepping for hg, cleaning up test skipping

Sorry for being behind on updates. Monday was final paperwork on the PhD (unless the university tells me otherwise I am now done with everything), yesterday it just got too late to worry about posting, and then this morning I just got in a groove and didn't want to stop to post. So today's post covers work done both today and yesterday.

To prepare for the transition from svn to hg, several other core developers and I have a preliminary draft of the devguide ported over. It currently suggests using mq, but that may change to feature clones if it turns out that the mq instructions are too hard to follow.

For cleaning up test skipping, I have begun work on issue 10966. I am basically trying to make it an error when an extension module fails to import on a platform where it is expected to be compiled. This has been an issue in the past as compilation errors have gone unnoticed because tests were flagged as skipped instead of as a failure simply because the import failed. This will also do away with the idea of expected and unexpected test skips as the test code itself will state on what platforms a test is required/optional.
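The idea can be sketched as a small import helper. This is not the patch for issue 10966; the helper name and the required_on parameter are hypothetical, chosen only to show a test declaring where a module must exist.

```python
import importlib
import sys
import unittest


def import_or_skip(name, required_on=()):
    """Import ``name``; skip the test only where the module is optional.

    ``required_on`` is a hypothetical tuple of ``sys.platform`` prefixes on
    which a failed import should count as a real failure, not a skip.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        if sys.platform.startswith(tuple(required_on)):
            raise  # the module is expected here, so surface the error
        raise unittest.SkipTest(str(exc))
```

A test module could then start with something like import_or_skip("_ctypes", required_on=("linux", "darwin")), where the platform list is purely illustrative, so that a broken build on those platforms shows up as a failure rather than a quiet skip.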

2011-02-05

PSF core grant, day 22: Python 2/3 porting HOWTO goes live, starting to doc for Hg transition

As promised the other day, the Python 2/3 porting HOWTO is now live. Yesterday I made some fixes based on people's feedback, which I committed today (and others have contributed their own fixes as well).

Working on the HOWTO in svn has reminded me how much I prefer hg thanks to all of my work on the devguide. Related to that, I started a branch of the devguide which outlines using hg instead of svn once the transition occurs. I have done all of the easy updates so far, leaving the tough one of outlining a basic workflow which includes back/forward-porting changes.

This whole bit of work stemmed from a massive python-committers thread on the topic. Basically people are trying to decide how best to structure the workflow of python-dev. In svn the way things work is that everything is committed in py3k unless the branch is frozen for cutting a release. If we are in an RC then you have to get approval to commit changes, otherwise you sit on your change until py3k opens up again. As for porting changes between versions, you commit in py3k and then use svnmerge to backport to e.g., release-31maint.

For hg the thinking is to tweak this. It's quite possible that when an RC is reached, a branch will be created that no one but the release manager can touch. When there is a fix in py3k that should be in the RC, the RM (release manager) can cherry-pick the commit and pull it in. While this might seem to create more work for the RM by having them need to execute some cherry-picking command, it does mean they don't need to watch for commits that need to be reverted and should also make committers think twice about what they want to bug the RM about.

As for porting, we will most likely switch to a forward-porting strategy. So people would apply a fix in release-31maint and then pull into py3k. The amount of work should be no worse or better than it is with svnmerge, but the DAG from hg should allow for making more sense of what exactly is going on in the history. Plus it makes doing a blanket pull much easier. Also, people will have to make the decision upfront as to whether something needs backporting. Granted, some people might still refuse to put the effort in to backport, but for those of us who do, it should be at least equal to, if not better than, using svnmerge.

But I need to write all of this up and get the resident hg experts to agree on all of this first.

2011-02-03

PSF core grant, day 21: new Python 2/3 porting HOWTO, fixing static analysis warnings

Big news today (at least for me) is that Georg let me check in the Python 2/3 porting HOWTO into py3k so it will be included with Python 3.2. Unfortunately I have no link to give out at the moment as the docs have not done their daily rebuild yet.

The other thing that I did was re-run Clang's static analyzer over CPython, using their newest release and with certain functions annotated as 'noreturn'. In the end this led to a bunch of dead code being removed and one bug being found in sqlite3.

2011-02-02

PSF core grant, day 20: writin' them words, killin' them assignments

Since my dissertation was formally accepted by UBC yesterday, I spent part of yesterday just trying to grasp the fact that I finished my PhD, and part of today dealing with paperwork from my department to prove that I don't owe them anything.

But between the two days I managed to get some stuff done. I finished the initial draft of my Python 2/3 porting guide. I am waiting to hear whether I can sneak it into Python 3.2 or not.

I also managed to fix all of the dead assignments detected by Clang's static analyzer.