2007-07-23

Another (unscientific) comparison of Mercurial and Bazaar

After my rant about how many version control systems (VCSs) there are and the lack of good Eclipse plug-ins for them, I looked at how many visits my very unscientific comparison of Mercurial and Bazaar from November 2006 had received. With over 5,300 visits to that page, and after a very rough month personally that left me needing something "relaxing" to do, I decided to do another review.

I took a different approach to implementing the tests this time. The biggest change was running any needed server locally and connecting through 127.0.0.1. Using a local sshd with public/private keys made the tests run much faster than last time. I also generated shell scripts from a template to play back the steps, instead of using a Python script built on subprocess; that made debugging as easy as reading the generated shell script.

Here are the steps every VCS went through:
  1. Initialize a Main repository.
  2. Add an empty file named 'A'.
  3. Commit 'A'.
  4. Add a single line to 'A' and commit.
  5. Create a directory named Local holding a new repository, Checkout, made by remotely branching/cloning Main (for some tests Local was also used as a shared repository; discussed later).
  6. Remotely clone/branch Main to create OtherPerson.
  7. Add a second line to 'A' in OtherPerson and push to Main.
  8. Pull the change to 'A' from Main into Checkout.
  9. Edit the first line of 'A' in Checkout, generate a patch, and then push the change to Main.
  10. Locally branch from Checkout to create Branch (also in the Local directory).
  11. Create a new file named 'B' in OtherPerson and push to Main.
  12. Edit the second line of 'A' in Branch.
  13. Pull 'B' into Checkout.
  14. Merge 'B' into Branch.
  15. Push the change to 'A' from Branch to Checkout and then to Main.
  16. Pull the change to 'A' into OtherPerson.
That makes six revisions in total (according to Bazaar, at least).
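The playback scripts were generated from a template rather than driven from Python. A minimal sketch of that idea, using the standard library's ``string.Template``, might look like the following; the template text, the ``COMMANDS`` table, and the ``render`` helper are hypothetical and cover only steps 1 to 4, not the original generator.

```python
from string import Template

# Hypothetical template covering steps 1-4 only; $init, $add and $commit
# are filled in with the VCS-specific commands from COMMANDS.
SCRIPT = Template("""#!/bin/sh
set -e
$init Main
cd Main
touch A
$add A
$commit -m "Add empty file A"
echo "first line" >> A
$commit -m "Add a line to A"
""")

# Hypothetical per-VCS command table.
COMMANDS = {
    "hg": {"init": "hg init", "add": "hg add", "commit": "hg commit"},
    "bzr": {"init": "bzr init", "add": "bzr add", "commit": "bzr commit"},
}


def render(vcs):
    """Return the generated shell script for one VCS as a string."""
    return SCRIPT.substitute(COMMANDS[vcs])


if __name__ == "__main__":
    print(render("hg"))
```

Because every command ends up spelled out in a plain shell script, a failing step can be found just by reading (or hand-running) the generated file.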

All tests were run under Python 2.5.1 (not a pydebug build). Both hg and bzr were checked out from their stable development branches (hg at changeset 5cbdea5735f4 and bzr at revision 2644). I ran each test once without recording it, to generate any needed .pyc files, then ran it three times in a row and recorded the fastest real time reported by /usr/bin/time.
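The measurement scheme (one untimed warm-up run, then the fastest of three timed runs) can be sketched in modern Python 3 as below. This is not the original harness, which used generated shell scripts and /usr/bin/time; ``time_step`` and ``best_of`` are hypothetical names, and ``time.perf_counter`` stands in for the "real" (wall-clock) time. A step is a list of commands so that a two-command step, such as Mercurial's pull plus update, is timed as one total.

```python
import subprocess
import sys
import tempfile
import time


def time_step(commands, cwd):
    """Run each argv list in order and return total wall-clock seconds."""
    start = time.perf_counter()
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def best_of(commands, runs=3):
    """Discard one warm-up run (e.g. to create .pyc files), then return
    the fastest of `runs` timed runs."""
    def one_run():
        # Fresh scratch directory so commands such as "hg init Main" can repeat.
        with tempfile.TemporaryDirectory() as scratch:
            return time_step(commands, cwd=scratch)

    one_run()  # untimed warm-up
    return min(one_run() for _ in range(runs))


if __name__ == "__main__":
    # Hypothetical usage: python time_step.py hg init Main
    print("%.3f" % best_of([sys.argv[1:]]))
```

Each run gets its own scratch directory; a step that depends on earlier ones (say, a push) would need that setup replayed inside ``one_run`` before the timer starts.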

To measure disk space, I ran ``du -c -h`` on the Local, Main, and OtherPerson repositories and recorded the reported sizes.
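For anyone reproducing the disk numbers from Python, a rough equivalent of ``du -c`` sums the blocks allocated to every file and directory in each tree and adds a grand total. This is a sketch, not what was used here: it relies on the Unix-only ``st_blocks`` field (512-byte units), does not de-duplicate hard links the way ``du`` does, and reports whole KiB instead of ``du -h``'s human-readable sizes. The function names are hypothetical.

```python
import os


def disk_usage(path):
    """Bytes allocated to `path` and everything under it (Unix only)."""
    total = os.lstat(path).st_blocks
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_blocks
    return total * 512  # st_blocks counts 512-byte units


def du_c(paths):
    """Per-path usage in KiB (rounded up) plus a grand total, like du -c."""
    sizes = {p: -(-disk_usage(p) // 1024) for p in paths}
    sizes["total"] = sum(sizes.values())
    return sizes


if __name__ == "__main__":
    for name, kib in du_c(["Local", "Main", "OtherPerson"]).items():
        print("%dK\t%s" % (kib, name))
```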

I didn't touch my machine while the tests were running, but I did not try to minimize background processes either. I can only do so much in the name of fun. =)

The test configurations were:
hg
Mercurial over SSH.

bzr
Bazaar using the SFTP protocol for remote pushes and pulls. Apart from HTTP, this seems to be the most prominent way to connect securely to a remote bzr branch, judging by the Bazaar manual.

bzr checkout
Bazaar using a checkout over SFTP. This seems intended for more svn-like development.

bzr ssh
Bazaar using the bzr+ssh protocol.

bzr serve
Bazaar using the bzr protocol and a server running as ``bzr serve --allow-writes``.

bzr shared
Bazaar with a shared repository.

The timed operations are mostly self-explanatory. I call remote cloning/branching "cloning" even though that is not necessarily the command used. Where a step needs two commands (e.g., Mercurial requires a pull and an update to apply changes from a remote branch), the combined time of both is reported.

All times are in seconds.
  • init
    • hg : 0.231
    • bzr : 0.891
    • bzr checkout : 0.891
    • bzr ssh : 0.873
    • bzr serve : 0.834
    • bzr shared : 0.860
  • add
    • hg : 0.294
    • bzr : 0.701
    • bzr checkout : 0.711
    • bzr ssh : 0.696
    • bzr serve : 0.686
    • bzr shared : 0.703
  • commit
    • hg : 0.375
    • bzr : 0.975
    • bzr checkout : 0.973
    • bzr ssh : 0.976
    • bzr serve : 0.958
    • bzr shared : 0.991
  • remote clone
    • hg : 1.202
    • bzr : 1.990
    • bzr checkout : 1.981
    • bzr ssh : 2.791
    • bzr serve : 1.664
    • bzr shared : 2.005
  • remote push
    • hg : 0.859
    • bzr : 1.723
    • bzr checkout : 1.560
    • bzr ssh : 2.193
    • bzr serve : 1.000
    • bzr shared : 1.671
  • remote pull
    • hg : 1.226
    • bzr : 1.713
    • bzr checkout : 1.891
    • bzr ssh : 2.253
    • bzr serve : 1.052
    • bzr shared : 1.772
  • local clone
    • hg : 0.399
    • bzr : 1.217
    • bzr checkout : 1.131
    • bzr ssh : 1.200
    • bzr serve : 1.223
    • bzr shared : 1.131
  • merge
    • hg : 0.691
    • bzr : 0.964
    • bzr checkout : 1.018
    • bzr ssh : 0.980
    • bzr serve : 0.934
    • bzr shared : 0.928
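To make the numbers easier to compare, this Python snippet (an illustration added here, not part of the original tests; all names are hypothetical) copies the timings above and, for each operation, picks the fastest Bazaar configuration and its ratio to Mercurial.

```python
# Timings in seconds, copied from the results above.
VARIANTS = ["hg", "bzr", "bzr checkout", "bzr ssh", "bzr serve", "bzr shared"]
TIMES = {
    "init":         [0.231, 0.891, 0.891, 0.873, 0.834, 0.860],
    "add":          [0.294, 0.701, 0.711, 0.696, 0.686, 0.703],
    "commit":       [0.375, 0.975, 0.973, 0.976, 0.958, 0.991],
    "remote clone": [1.202, 1.990, 1.981, 2.791, 1.664, 2.005],
    "remote push":  [0.859, 1.723, 1.560, 2.193, 1.000, 1.671],
    "remote pull":  [1.226, 1.713, 1.891, 2.253, 1.052, 1.772],
    "local clone":  [0.399, 1.217, 1.131, 1.200, 1.223, 1.131],
    "merge":        [0.691, 0.964, 1.018, 0.980, 0.934, 0.928],
}


def fastest_bzr(op):
    """Return (variant, seconds) for the quickest Bazaar setup on `op`.

    Ties go to whichever variant is listed first in VARIANTS.
    """
    pairs = list(zip(VARIANTS, TIMES[op]))[1:]  # skip the hg column
    return min(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for op, row in TIMES.items():
        variant, secs = fastest_bzr(op)
        print(f"{op:13} hg {row[0]:.3f}s  best bzr: {variant} {secs:.3f}s "
              f"({secs / row[0]:.2f}x hg)")
```

Run as-is, it shows ``bzr serve`` as the quickest Bazaar setup for six of the eight operations, and remote pull over ``bzr serve`` (1.052 s) as the one case where a Bazaar configuration beats Mercurial (1.226 s, a figure that includes the separate update).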

And here are the disk usage stats:
  • Local
    • hg : 96K
    • bzr : 168K
    • bzr checkout : 164K
    • bzr ssh : 168K
    • bzr serve : 168K
    • bzr shared : 136K
  • Main
    • hg : 40K
    • bzr : 72K
    • bzr checkout : 72K
    • bzr ssh : 72K
    • bzr serve : 72K
    • bzr shared : 72K
  • OtherPerson
    • hg : 52K
    • bzr : 84K
    • bzr checkout : 80K
    • bzr ssh : 84K
    • bzr serve : 84K
    • bzr shared : 84K
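Summing the three ``du`` figures above per configuration (again an illustrative addition with hypothetical names) gives each setup's total footprint:

```python
# Sizes in KiB from the du output above: (Local, Main, OtherPerson).
SIZES_K = {
    "hg":           (96, 40, 52),
    "bzr":          (168, 72, 84),
    "bzr checkout": (164, 72, 80),
    "bzr ssh":      (168, 72, 84),
    "bzr serve":    (168, 72, 84),
    "bzr shared":   (136, 72, 84),
}

# Total footprint per configuration, in KiB.
totals = {name: sum(sizes) for name, sizes in SIZES_K.items()}

if __name__ == "__main__":
    for name, kib in sorted(totals.items(), key=lambda item: item[1]):
        print("%-13s %4dK  (%.2fx hg)" % (name, kib, kib / totals["hg"]))
```

The shared repository saves 32K in Local (136K versus 168K), which makes it the smallest Bazaar setup at 292K in total, still roughly 1.55 times Mercurial's 188K.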

Once again Mercurial is faster in almost every operation (the exception being a remote pull against ``bzr serve``) and uses less space than Bazaar. If you do use Bazaar, a shared repository and a smart server seem to be the best approach (assuming there are no security issues with doing so; I don't know whether the server has any authentication mechanism in place for pushes).

But one thing to consider is the robustness of the various operations. Mark Shuttleworth's four blog entries on VCSs suggest that Bazaar may be as slow as it is, and take up as much space as it does, because it tries to be lossless. I don't know how Mercurial's approach differs, so I can't say whether this is a valid defence of Bazaar's performance.

My biggest knock against either VCS is probably Bazaar's documentation. The official docs are two versions behind the current release; even if nothing changed, at least bump the version number. Finding the docs for the various remote protocols was also non-obvious. And lastly, the install docs say Paramiko (probably the reason SSH performance is so bad) does not require PyCrypto, which is simply false: you can't even import Paramiko without PyCrypto.

Both projects are great in terms of friendliness. I have interacted with core developers on both teams and they have always been helpful. The Bazaar team has been especially good, taking the time to comment on my blog whenever I have had issues with Bazaar.

And the usual caveat must be made that neither project is at version 1.0 yet.

In the end, if your project is small you can probably live with either VCS. Bigger projects, though, might want to take a look at Mercurial. But obviously do your own testing: import your code into both VCSs and see how they perform for your common needs.