2008-12-24

importlib hits alpha

As of now I am willing to declare importlib (bzr repository) an alpha-quality replacement for __import__. Do note that importlib is 3.0 code, as that is what I test against. There is probably some 3.0-specific code in there, but I can't think of any off the top of my head short of some print statements from the test driver.

Currently there are two things I need to fix. One is which exceptions get raised by importlib. For instance, if some code cannot be decoded as UTF-8 then __import__ raises SyntaxError. Importlib, on the other hand, raises UnicodeDecodeError. The same goes for when a parent package cannot be imported for a submodule; __import__ raises SystemError while importlib raises ImportError. Rather minor stuff that just requires some extra code to raise the proper exception.
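To make the exception mismatch concrete, here is a minimal sketch of the kind of "extra code" involved. The function name decode_source and its path parameter are made up for illustration; this is not importlib's actual code, just one way the translation could look.

```python
def decode_source(source_bytes, path="<unknown>"):
    """Decode source bytes as UTF-8 (hypothetical helper, not importlib API).

    A decoding failure is reported as SyntaxError, the exception type
    that __import__ raises for source it cannot decode.
    """
    try:
        return source_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Translate the exception type so callers see the same error
        # they would get from __import__.
        raise SyntaxError(
            "cannot decode {0!r} as UTF-8: {1}".format(path, exc))
```

The parent-package case would presumably get the same treatment: catch the exception importlib raises today and re-raise the one __import__ uses.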

The second thing is entirely the fault of the 'compile' built-in. It turns out that 'compile' just doesn't do as much work as __import__ does when processing a string or bytes object. In the case of strings, if you pass in code that has an encoding declared then 'compile' actually tries to decode the string, since it works off the buffer interface to the string. Well, that doesn't work since the string has already been decoded to UTF-8 by the time I give it to 'compile'. This is reported as issue 4626. Probably the best solution would be to change it so that if a string is passed in, the decoding step is skipped.
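For anyone who wants to poke at this, the following is a small reproduction sketch. The helper name run_text and the sample source are invented for illustration. What you actually see depends on the interpreter: the behavior described above is what I get on 3.0, and an interpreter that skips the decoding step for strings should simply run the code.

```python
def run_text(source_text, path="<example>"):
    """Compile and run already-decoded source text (hypothetical helper).

    Returns the namespace the code executed in.
    """
    code = compile(source_text, path, "exec")
    namespace = {}
    exec(code, namespace)
    return namespace


# Source text that is already a str but still carries an encoding
# declaration, which is the situation importlib ends up in.
SOURCE = "# -*- coding: latin-1 -*-\nname = 'caf\u00e9'\n"
namespace = run_text(SOURCE)
```

If the declaration is honored a second time, the non-ASCII character in name is where the damage would show up.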

The issue with bytes is universal newline support. It turns out 'compile' will decode a bytes object properly, encoding declaration and all, but it won't support universal newlines, as that feature relies completely on fgets on files at the C level. This is reported as issue 4628. To solve this, universal newlines need to be supported for bytes somehow.
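Until that happens, one workaround is to normalize the line endings yourself before handing the bytes to 'compile'. This is only a sketch under the assumption that the source uses an ASCII-compatible encoding; normalize_newlines is a made-up name, and this is not how the parser would handle it at the C level.

```python
def normalize_newlines(source_bytes):
    """Convert \\r\\n and bare \\r line endings to \\n (hypothetical helper)."""
    # Replace the two-byte Windows ending first so its \r is not
    # converted on its own and turned into an extra blank line.
    return source_bytes.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Source using a mix of Windows and old Mac line endings.
RAW = b"x = 1\r\ny = 2\rz = x + y\n"
code = compile(normalize_newlines(RAW), "<example>", "exec")
namespace = {}
exec(code, namespace)
```

Note that each replace call builds a new bytes object, so the whole source sits in memory more than once while this runs.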

Unfortunately, solving either issue is not necessarily simple, as this dives into the interface with the parser. That chunk of code tends to do its fancy tricks only when it is working with files. Solving the last bug I found in the parser was painful enough that I do not particularly want to hack on it again. This means I am only going to solve one of the issues. And while issue 4626 is probably the easier of the two to solve, I would rather see if I can get issue 4628 fixed, as that cuts down on stat calls and has a better chance of being useful outside of my use case (although it potentially has a higher memory cost).

It is one of my New Year's resolutions for next year to get importlib finished up to the point that it is ready to act as a replacement for __import__. Currently that means ignoring how to expose the code publicly beyond __import__ itself. This will not be the case forever, but I have some refactorings I want to do that might tweak the APIs somewhat. And since any API that becomes public needs to be supported for a while, I do not want to rush this. All of this is spelled out in the NOTES file for those of you who are curious about what the exact plans are and the rough order in which I plan to address them.