2007-10-19

Don't trust a file descriptor grabbed from a file pointer on OS X

I just spent three hours chasing down a rather obscure platform issue for issue 1267. For some reason, the file object returned by imp.find_module() yielded only part of the file's contents on OS X, while Linux was fine. The patch in question fixed imp.find_module() to properly detect the encoding of source files so that they are opened correctly (an issue I will be dealing with when bootstrapping importlib in Py3K as soon as I finish my 'warnings' work).

So why would a file opened on OS X have its seek position moved forward by a fixed amount, but not on Linux? Granted, the patched code did read from the file, and that was new, but rewind() was called on the file pointer (FILE *) afterwards. Gdb and ftell() both agreed the file was at position 0 before the function returned, and nowhere was the file read from after that. And yet, if the file was over some size, it was consistently at position 4096 when imp.find_module() returned it.

But then a printf() statement in _fileio._FileIO.__init__() showed that the file descriptor (int) it had been passed was already at position 4096. What the hell was going on? I knew there were no read calls. And I knew that the rewind() call was being reported as successful (the same held when I used fseek() instead).

Well, notice how the terminology changed between where I seeked on the file and where I discovered it had moved forward 4096 bytes: "file pointer" became "file descriptor". Interesting, eh?

It turns out that OS X does not update the position of a file descriptor obtained by calling fileno() on a file pointer. Ever. I called rewind() and fseek() on the file pointer and the file descriptor was never repositioned. I called fflush() and fpurge() after the rewind() call, and still nothing. You would think that at least a call to fileno() would force the state stored by the file pointer to be reflected in the file descriptor.
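
To make the mismatch concrete, here is a minimal sketch of the kind of probe that exposes it. It is not the code from the patch: probe_positions() and struct positions are hypothetical names invented for illustration. The helper reads one byte through stdio (which may pull a whole buffer's worth from the descriptor), calls rewind(), and then asks both the file pointer and the file descriptor where they think they are. The sketch deliberately does not claim what the two numbers will be, since that depends on the platform's stdio implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: both views of "where is this file?" */
struct positions {
    long  stream_pos;  /* what ftell() reports for the FILE * */
    off_t fd_pos;      /* what lseek() reports for the descriptor */
};

/*
 * Read one byte through stdio, rewind the stream, then query both the
 * stream position and the underlying descriptor's position.
 * Returns 0 on success, -1 on failure.
 */
int probe_positions(FILE *fp, struct positions *out)
{
    assert(fp != NULL && out != NULL);

    /* A single fgetc() may make stdio read a full buffer from the fd. */
    if (fgetc(fp) == EOF)
        return -1;

    /* Ask stdio to go back to the start of the file. */
    rewind(fp);

    out->stream_pos = ftell(fp);
    /* lseek(fd, 0, SEEK_CUR) reports the fd's offset without moving it. */
    out->fd_pos = lseek(fileno(fp), 0, SEEK_CUR);

    if (out->stream_pos < 0 || out->fd_pos == (off_t)-1)
        return -1;
    return 0;
}
```

Calling probe_positions() on a file larger than the stdio buffer and printing both fields is enough to see whether the two handles agree. If stream_pos is 0 while fd_pos is not, you are looking at the behaviour described above.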

I have filed a bug report with Apple. I also suggested in the commit that Python standardize on file pointers or descriptors to prevent something like this from happening again. But regardless, if you are on OS X and doing any C stuff, don't mix your file pointers and file descriptors.
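
If you really must hand a descriptor taken from a file pointer to other code, one defensive option is to position the descriptor yourself rather than trusting stdio to have done it. The following is only a sketch under stated assumptions, not what the Python commit did: fd_at_offset() is a hypothetical helper, and it assumes the stream has only been read from (no pending buffered writes) and that you stop using the file pointer for I/O afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the descriptor behind fp, explicitly
 * positioned at `offset` from the start of the file.
 * Returns the descriptor on success, -1 on failure.
 *
 * Assumptions: the stream has no unflushed writes, and the caller will
 * not read or write through fp again, because stdio's buffer no longer
 * matches the descriptor's position after this call.
 */
int fd_at_offset(FILE *fp, off_t offset)
{
    assert(fp != NULL);

    int fd = fileno(fp);
    if (fd < 0)
        return -1;

    /* Set the descriptor's offset directly instead of relying on
     * rewind()/fseek() on the stream to propagate to it. */
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;

    return fd;
}
```

With this, a caller would use fd_at_offset(fp, 0) in place of a bare fileno(fp) after a rewind(). It is still mixing the two kinds of handle, so sticking to one of them throughout remains the safer choice.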