2010-01-09

Where the Hg transition stands

[edit 2010-01-09: links to mailing list archives containing latest discussion]

At PyCon 2009 it was announced that python-dev planned to move Python development from svn to hg. Well, just because we chose our distributed version control system (DVCS) does not mean that we were ready to flip the switch. For one, I took a three-month sabbatical from python-dev to get my PhD thesis proposal finished (which I did, thank science). Luckily Dirkjan Ochtman stepped in with PEP 385 and volunteered to handle the transition. At that point we thought that it would be a matter of creating a new sys.mercurial attribute (which we still need to code up), writing up new developer docs on the workflow we expect to use, doing a high-fidelity conversion of the revision history to hg, and then flipping the switch.

But then bloody line endings reared their ugly heads. While I was writing PEP 374 and evaluating the three leading DVCSs I was under the impression that the win32text extension for hg did what we needed. No one ever spoke up saying otherwise while the PEP was out for discussion, so I simply didn't worry about it.

But then Mark Hammond came forward and said we had a problem. Obviously Mark has experience working under Windows, but he also has experience with hg thanks to his work with Mozilla. From Mark's experience it seemed that no matter how careful people were, the line endings would get messed up in the repo, and that just isn't acceptable. Martin v. Löwis then came forward and also pointed out that this was not acceptable. It turned out that win32text didn't properly protect against mistakes, as it is configured per user, not per repo. This was not what we wanted; svn's svn:eol-style property is really handy and has turned out to be great to have around.

So this led to a long discussion over what an hg extension that mimics svn:eol-style would look like. Out of that came the idea of the hgeol extension. In a nutshell, we would end up with an extension that reads a version-controlled .hgeol file. That file would specify how files should be checked out (e.g. native for the OS, \n, \r\n, or binary), and the extension would make sure that no checkin can introduce bad line endings. The design can be found in the Mercurial wiki (be aware it is a wiki page so some people have simply dumped ideas in there). The latest discussions on various Mercurial mailing lists can be found here and here (search for [eol] to find the relevant threads).
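To make the idea concrete, here is a rough Python sketch of the kind of conversion such an extension would have to do. This is not Martin's code and not the real .hgeol syntax; the RULES table and the names style_for, checkout_bytes and checkin_bytes are made up for illustration, and it assumes the repo stores text files with \n.

```python
import fnmatch
import os

# Hypothetical rules in the spirit of a version-controlled .hgeol file.
# NOT the real .hgeol syntax; just an ordered pattern -> style table.
RULES = [
    ("*.png", "binary"),
    ("*.bat", "crlf"),
    ("*.sh", "lf"),
    ("*", "native"),
]


def style_for(path, rules=RULES):
    """Return the line-ending style of the first rule matching path."""
    for pattern, style in rules:
        if fnmatch.fnmatch(path, pattern):
            return style
    return "binary"  # no rule matched: leave the bytes alone


def to_lf(data):
    """Normalize \\r\\n to \\n (lone \\r is left untouched)."""
    return data.replace(b"\r\n", b"\n")


def checkout_bytes(path, stored, native=os.linesep.encode("ascii"), rules=RULES):
    """Convert repo content to what should land in the working copy."""
    style = style_for(path, rules)
    if style == "binary":
        return stored
    eol = {"lf": b"\n", "crlf": b"\r\n", "native": native}[style]
    return to_lf(stored).replace(b"\n", eol)


def checkin_bytes(path, working, rules=RULES):
    """Convert working-copy content back to the assumed \\n repo form."""
    if style_for(path, rules) == "binary":
        return working
    return to_lf(working)
```

The point of keeping the rules in the repo rather than in each user's config is that everyone gets the same answer from style_for, no matter how their own machine is set up.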

Martin Geisler, a Mercurial contributor, in the end picked up the torch and went a good distance. He has his in-development code at bitbucket. But the work is not finished. Martin has to work on his PhD thesis, so he has stopped active development for a few months. That means help from anyone who is motivated would be greatly appreciated. At this point what is really needed is making sure the code is robust and that it works as desired. That means making sure the tests work and the results are as expected on both UNIX (this includes OS X) and Windows. It also means making sure that the test suite is thorough enough to cover all the possible problems that might come up during development.

This inherently helps test that the design covers what is needed. One of the reasons this entire line ending problem has not been solved before is that most of the Mercurial dev team either is not on Windows or uses editors that know how to handle line endings properly (I'm looking at you with the evil eye, Visual Studio). So while we think the current design works, we don't have any real-world usage yet. Some pounding on the extension with a repository that someone actually uses would be great to make sure we didn't miss something.

In other words, we would appreciate help pounding the heck out of the extension. Running the tests, making sure the tests are thorough, and using the extension with an actual repository that gets used on a regular basis would all be highly appreciated.

Dirkjan is coming to PyCon 2010 so I would expect at least a lightning talk on this. Dirkjan and I also hope that we can see this transition happen in the first half of this year, but that really depends on the hgeol extension getting into a good enough place that we are not isolating Windows developers.