2007-05-23

My first (real) experience with XML (through ElementTree) and Django Templates (through Jinja)

After asking people for advice about how to handle open source work in one's curriculum vitae, I wrote an app to help me manage my CV. I decided to use XML as the data format and then transform it into various other formats (text, HTML, PDF through LaTeX). Nothing fancy, although it was the first time I had used XML seriously with a schema of my own design.

To start, I wrote a rough XML schema using RELAX NG Compact syntax. This was not too bad, although I did battle with myself over when to use an attribute and when to use a full-blown tag. I also cheated in some spots and didn't properly specify things such as when tags have no required order. But overall it was not as painful as I expected.

I did write the XML document by hand. I didn't want to go through the hassle of trying to come up with a GUI to help me enter the information. Plus I have done enough HTML by hand that it didn't bother me. But if I do continue to improve this app I will give it a GUI, as I don't recommend writing XML by hand when it can be avoided.

I decided from the start that I was not in the mood to use SAX for reading the XML document. If I was going to load the entire XML doc into memory anyway, I figured ElementTree was my best option. The biggest issue I had was remembering to normalize the whitespace whenever I read the 'text' attribute of an element, to get rid of the extraneous newlines introduced by formatting the XML in a human-readable fashion. Otherwise ElementTree was very nice and easy to use. Thanks to /F for developing the package!
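For the curious, the whitespace cleanup can be sketched roughly like this. It is a minimal illustration and not the actual code from my app: the `normalized_text` helper and the sample XML are made up for the example, and it uses the `xml.etree.ElementTree` module that now ships with Python.

```python
import xml.etree.ElementTree as ET


def normalized_text(element):
    """Collapse every run of whitespace in element.text into one space."""
    # element.text is None for an empty element, so fall back to "".
    return " ".join((element.text or "").split())


# Hypothetical CV fragment, indented the way one would write it by hand.
xml_source = """
<cv>
  <summary>
    Wrote an app
    to manage my CV.
  </summary>
</cv>
"""

root = ET.fromstring(xml_source)
summary = root.find("summary")

print(repr(summary.text))        # raw text still carries newlines and indentation
print(normalized_text(summary))  # prints: Wrote an app to manage my CV.
```

Calling `split()` with no arguments splits on any whitespace and drops the leading and trailing bits, so joining the pieces with a single space handles newlines, tabs, and indentation in one go.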

For templating I decided to go with Jinja. Georg Brandl's Python documentation redo uses it, so I knew it at least worked. Plus it uses Django templating syntax, which is great since I was planning on learning it anyway. The hardest thing for me to grasp was how template inheritance worked. At first I wanted my skeleton template to have various blocks that were defined in individual files that all extended the skeleton. That didn't work. Then I wanted my skeleton to extend the individual files like a mixin. That didn't work since multiple inheritance doesn't exist. I finally figured out how I was supposed to think about template inheritance and ended up with a single skeleton that was extended by a single file that defined all of my blocks. When I do more than one output format I will need to refactor some common things out into macros and other blocks.
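The shape I ended up with, one skeleton extended by one child that fills in every block, looks something like the sketch below. Treat it as an illustration under assumptions: it is written against the current `jinja2` package rather than the Jinja release I was using, and the template names, block names, and sample data are all hypothetical.

```python
from jinja2 import DictLoader, Environment

# Templates kept in a dict so the example is self-contained; in a real
# project they would be files on disk. All names here are hypothetical.
templates = {
    # The skeleton only declares where the blocks go.
    "skeleton.txt": (
        "{% block header %}{% endblock %}\n"
        "----\n"
        "{% block body %}{% endblock %}\n"
    ),
    # A single child extends the skeleton and defines every block.
    "cv.txt": (
        '{% extends "skeleton.txt" %}'
        "{% block header %}{{ name }}{% endblock %}"
        "{% block body %}"
        "{% for job in jobs %}* {{ job }}\n{% endfor %}"
        "{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))

# Render the child, never the skeleton: inheritance flows from the child up.
rendered = env.get_template("cv.txt").render(
    name="A. Author",
    jobs=["Job one", "Job two"],
)
print(rendered)
```

The thing that tripped me up is visible here: a template names exactly one parent with `extends`, and the parent never pulls in its children. So several files cannot each contribute one block to the same rendered skeleton, which is why everything ends up in one child.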

It definitely helped that I was willing to refactor my XML schema while writing the templates. I quickly learned that using attributes as something to filter on is not the best way to use XML. I ended up using grouping tags for several things, which made my life easier. I also think there are a couple of other places where I could simplify the template code without sacrificing semantic information in the XML file, which I might try.
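To make the attribute-versus-grouping point concrete, here is a small sketch of the two layouts. The tag names, the `kind` attribute, and the project names are invented for the example and are not my real schema.

```python
import xml.etree.ElementTree as ET

# Layout 1 (hypothetical): every project carries an attribute to filter on.
flat = ET.fromstring("""
<cv>
  <project kind="volunteer">Project A</project>
  <project kind="paid">Project B</project>
  <project kind="volunteer">Project C</project>
</cv>
""")

# Layout 2 (hypothetical): projects sit inside a grouping tag instead.
grouped = ET.fromstring("""
<cv>
  <volunteer>
    <project>Project A</project>
    <project>Project C</project>
  </volunteer>
  <paid>
    <project>Project B</project>
  </paid>
</cv>
""")

# With attributes, every consumer has to loop over everything and test.
volunteer_flat = [
    p.text for p in flat.findall("project") if p.get("kind") == "volunteer"
]

# With grouping tags, a single path selects exactly the wanted elements.
volunteer_grouped = [p.text for p in grouped.findall("volunteer/project")]

print(volunteer_flat)
print(volunteer_grouped)
```

Both give the same list, but in a template the first layout means a conditional inside every loop, while the second lets the template simply loop over one group.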

In the end it was a pleasant experience. Overall the whole thing went smoothly and I would recommend any of the libraries I used to others.