2008-08-18

Found out why my bootstrapping of importlib has been failing (probably)

In my last update on importlib nearly two months ago, I mentioned I had three failing tests; two were seemingly BOM related while the latter was just a lack of a feature. The feature got implemented, leaving just the BOM issues.

Having a failure seem to be related to the BOM made me think that some how I was not handling the encoding of source files properly when they defined a specific encoding. I thought that somehow the parser handled things differently enough between reading from files and using a string that I was being prevented from reading from a file directly and feeding that info to compile().

Well, today I decided to dig into the issue after having done some other compile() work yesterday. Turns out that the problem was not in any of my code, but the parser's API for finding out the encoding of a source file. If you look at Parser/tokenizer.c:PyTokenizer_FindEncoding() you will discover that at some point open() is called, but with a path of NULL. Oops.

And because of the one place where PyTokenizer_FindEncoding() didn't use PyErr_Occurred() to see if an error occurred, the exception went unnoticed. I bet at some point PyErr_Clear() is blindly called and that hid the exception.

But in my bootstrapping of importlib, the call path is not the same, and at some point PyErr_Occurred() is checked and the exception exposes itself in an odd place. Anyway, a few hours with gdb and I figured all of this out.

So it's quite possible that my bootstrapping of importlib is actually ready to go as soon as this pre-existing bug is fixed!