2006-02-19

clarification between the AST compiler and compiler package

There seems to be some confusion about how the AST compiler differs from the compiler package. Hopefully this post can clarify the differences and how they relate to each other.

The compiler package was introduced to provide an AST that people can use to analyze Python code and to give more direct control over bytecode generation. The way the Python compilation process originally worked was parser -> CST (concrete syntax tree) -> bytecode. The compiler package replaced the last two steps with Python code in the stdlib.

The internal CST -> bytecode steps were not exposed to the Python level because there was no good way to do it. The internal compiler worked directly off of the CST, which was messy, so there was no real AST to expose for people to work with. Working off of just the CST is painful.

But now there is the AST compiler. This was implemented from scratch to introduce an AST to the internal compiler. So now Python's internal compiler goes parser -> CST -> AST -> bytecode, which mirrors the basic steps taken by the compiler package and is a more traditional compiler structure. This was all done to make the internal compiler easier to maintain and use.
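To make the source -> AST -> bytecode steps concrete, here is a minimal sketch. It assumes a later Python in which the AST is reachable from Python code through the ast module and the built-in compile() accepts an AST. Neither was available when this was written, so treat it as an illustration of the pipeline and not of the internals discussed here. The source string and the "<example>" filename are made up for the example.

```python
import ast
import dis

source = "x = 1 + 2\n"

# Step 1: source text -> AST (the parser and CST stay hidden behind ast.parse).
tree = ast.parse(source, filename="<example>", mode="exec")
print(ast.dump(tree))

# Step 2: AST -> code object holding the bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Step 3: look at the generated bytecode.
dis.dis(code)

# Running the code object binds x in the given namespace.
namespace = {}
exec(code, namespace)
```

The point is only that each stage is a separate, inspectable value: a tree first, then a code object.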

So how do the internal AST compiler and the compiler package differ? Well, the structure of the nodes is different. You can see the internal AST structure in svn compared to the one used in the compiler package. They were also implemented separately. And of course the language used to implement them is different (C for the internal compiler, Python for the compiler package). So while they provide similar functionality, they differ in the specific execution.


Where does this put each of them? The internal AST compiler is in slight flux since there is still the question of whether it will stay as-is or be replaced by the approach in the ast-objects branch. As for the compiler package, it replicates functionality now implemented by Python internally. If the internal AST could be exposed through the stdlib then perhaps the compiler package could use that instead of its own custom AST.

And that is a possibility. Keeping two different ASTs is redundant. And the internal AST will probably be better maintained since it is more critical than the compiler package. This means that I expect the compiler package's custom AST and bytecode compiler to be ripped out and replaced with the internal AST and bytecode compiler. This would be done by exposing the internal AST to Python code and providing a function that takes an AST and returns the code object for that AST.
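As a rough sketch of what "a function that takes an AST and returns the code object" could look like, the example below builds a tiny AST by hand and compiles it. It assumes a later Python with the ast module and a compile() that accepts AST objects. The helper name code_from_ast is hypothetical and only there for illustration. It is not an API from the compiler package or from the internal compiler.

```python
import ast


def code_from_ast(tree, filename="<ast>", mode="exec"):
    """Hypothetical helper: take an AST and return the code object for it."""
    # Hand-built nodes have no line/column info, which compile() requires.
    ast.fix_missing_locations(tree)
    return compile(tree, filename, mode)


# Build the AST for the expression `2 + 3` directly, with no source text involved.
tree = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value=2),
        op=ast.Add(),
        right=ast.Constant(value=3),
    )
)

code = code_from_ast(tree, mode="eval")
result = eval(code)
```

Because the tree is constructed in Python code, a tool could generate or rewrite nodes before handing them to the bytecode compiler, which is the kind of control the compiler package was meant to give.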

When this will happen I don't know. It is very possible the compiler package won't be switched over until Python 3. Regardless, it still requires getting the internal AST exposed and an API established. That might not happen until Python 2.6, but it will probably be the next big thing for the AST compiler.

So hopefully this clarifies things for people. Obviously none of this is set in stone and most of it is based on my opinion, which makes it even more likely not to happen. =)