2006-09-26

The two problems with import I would like to see fixed

So my last post basically got responses of, "simplify, don't complicate" almost across the board. In order to simplify, though, I figured I should start from simpler beginnings and discuss what the current import mechanism doesn't do well that I would like to see improved.

Right now, to keep packages in a place outside of site-packages you either need to add the various directories above your packages and modules to PYTHONPATH or toss a .pth file into site-packages that handles adding your desired directories to sys.path. This basically works, but using .pth files really hurts Python startup under NFS since it has to get a directory listing for every path added by site.py. And PYTHONPATH is not fun for people on Windows (or at least I think so; I am not a Windows user so I could be wrong, but last I checked it required editing the Registry).
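
To make the .pth mechanism concrete, here is a minimal sketch using the stdlib's site.addsitedir(), which processes a directory the same way site.py treats site-packages. The temporary directory, the extra.pth file name, and the my_packages directory are made up for the example.

```python
import os
import site
import sys
import tempfile


def add_dir_with_pth(site_dir):
    """Process site_dir like a site directory and report new sys.path entries."""
    before = list(sys.path)
    # addsitedir() appends site_dir itself, then lists the directory looking
    # for *.pth files; that listing is the cost mentioned above.
    site.addsitedir(site_dir)
    return [entry for entry in sys.path if entry not in before]


# Demo layout (hypothetical names): <tmp>/extra.pth points at <tmp>/my_packages.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "my_packages"))
with open(os.path.join(demo_dir, "extra.pth"), "w") as pth:
    # Relative entries are resolved against the directory holding the .pth file.
    pth.write("my_packages\n")

added = add_dir_with_pth(demo_dir)
print(added)
```

Only entries that exist on disk get added, which is why the sketch creates my_packages before calling the function.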

This is why I had suggested a package registry. That solves the NFS issue since it is a single file to check and allows Windows users to more easily add packages by just using setup.py for the package or some flag on the interpreter to add an entry to the package registry. Another possible solution that is simpler is to replace the package registry with a single .pth file that is kept in a specific location and have everyone just append to that file the paths they want added, relative and absolute.
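
A rough sketch of the single-file idea, assuming a format I am inventing here: one path per line, blank lines and lines starting with # ignored, and relative paths resolved against the directory that holds the file. The function name read_path_file and the file name packages.pth are hypothetical.

```python
import os
import tempfile


def read_path_file(path_file):
    """Return the directories listed in one shared path file (assumed format)."""
    base = os.path.dirname(os.path.abspath(path_file))
    entries = []
    with open(path_file) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # os.path.join() keeps absolute entries as they are and resolves
            # relative ones against the directory of the path file.
            entries.append(os.path.normpath(os.path.join(base, line)))
    return entries


# Demo: one relative entry and one absolute entry in a single file.
demo_dir = tempfile.mkdtemp()
path_file = os.path.join(demo_dir, "packages.pth")
with open(path_file, "w") as handle:
    handle.write("# shared path file\nvendor/libs\n\n" + demo_dir + "\n")

entries = read_path_file(path_file)
print(entries)
```

The point is that startup reads exactly one file in a known location, no matter how many packages have registered themselves in it.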

The other problem I would like solved is handling packages that are distributed in alternative packaging schemes (zip, tarball, etc.). Right now zip files are not a huge issue since you just add the zip file to sys.path somehow. That allows the zipimporter in sys.path_hooks to pick up the entry and handle it. But what about an importer module that is not part of the base installation? As Fredrik pointed out in a comment on my last blog post, there needs to be a way to add alternative importers so they can be used by installed packages that are added through PYTHONPATH and other ways that are just entries tossed into sys.path.
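
The zip case is easy to show. This is a small sketch with made-up names (bundle.zip, zipped_demo): it builds a zip file holding one module, puts the zip file's path on sys.path, and lets the stock zipimporter do the rest.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(zip_path, module_name):
    """Put a zip file on sys.path and import a module stored inside it."""
    sys.path.insert(0, zip_path)
    return importlib.import_module(module_name)


# Build a throwaway archive containing a single top-level module.
archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo.py", "GREETING = 'hello from a zip file'\n")

zipped_module = import_from_zip(archive, "zipped_demo")
print(zipped_module.GREETING)
print(type(zipped_module.__spec__.loader).__name__)
```

Nothing in the package had to register anything, because zipimporter already sits in sys.path_hooks. That is exactly what a third-party importer does not get for free.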

I believe the current way to have a package use a non-stdlib importer would be to have the package be a directory containing an __init__.py file and the alternative file store. The __init__.py file for the package would then add the alternative importer to sys.path_hooks and clear sys.path_importer_cache. That way any subsequent imports from that package can use the alternative importer (this might require adding an entry to sys.path for the package as well).
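
Here is roughly what that trickery looks like, as a sketch rather than a recipe. The file store is a format invented for the example (a .jsonpkg file mapping module names to source text), and JsonStoreImporter and install_store are hypothetical names. To stay runnable on current interpreters the importer uses the find_spec()/exec_module() protocol from later Python 3 releases instead of find_module().

```python
import importlib
import importlib.util
import json
import os
import sys
import tempfile


class JsonStoreImporter:
    """Hypothetical importer: one JSON file mapping module names to source."""

    def __init__(self, path):
        # A path hook says "this entry is not mine" by raising ImportError.
        if not (path.endswith(".jsonpkg") and os.path.isfile(path)):
            raise ImportError("not a .jsonpkg store: %r" % (path,))
        self.path = path
        with open(path) as handle:
            self.sources = json.load(handle)

    def find_spec(self, fullname, target=None):
        if fullname not in self.sources:
            return None
        return importlib.util.spec_from_loader(fullname, self, origin=self.path)

    def create_module(self, spec):
        return None  # let the import system create a plain module object

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


def install_store(store_path):
    """What the package's __init__.py would have to do by hand."""
    if JsonStoreImporter not in sys.path_hooks:
        sys.path_hooks.append(JsonStoreImporter)
    sys.path_importer_cache.clear()  # forget cached finder lookups
    if store_path not in sys.path:
        sys.path.append(store_path)


# Demo: write a store, install the hook, import a module out of the store.
store = os.path.join(tempfile.mkdtemp(), "demo.jsonpkg")
with open(store, "w") as handle:
    json.dump({"stored_demo": "ANSWER = 42\n"}, handle)

install_store(store)
stored_module = importlib.import_module("stored_demo")
print(stored_module.ANSWER)
```

Everything in install_store() is bookkeeping that has nothing to do with the package itself, which is the complaint.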

But it is a pain in the rear to have to do that just to use an alternative file store for packaging. Plus, if only sys.path is used for importer objects and one ditches sys.path_importer_cache and sys.path_hooks, you then have the issue of adding even zipimporters, since there is no longer a way to have PYTHONPATH/.pth file entries be picked up by alternative importers like they are now.

This was the other reason I had a package registry that associated top-level namespaces with the importer required to import them. This might also be a reason why putting importer objects directly in sys.path without an equivalent of sys.path_hooks is a bad idea. And having to keep sys.path_hooks if we can put importer objects on sys.path seems like a waste just so we can handle this bootstrap problem.

How can we have a package that is picked up from PYTHONPATH be able to say, "I can't use the default importer, use this importer instead" without requiring package writers to do a bunch of sys.path trickery in their __init__.py file for their package to get a third-party importer to be used? This one might just require a module in the stdlib to provide helper code to do just this exact thing and that has to be the way it is.

One option is to have a set location for importer objects to be kept (e.g., site-importers or PYTHONIMPORTERS, etc.), have those objects continue to implement the find_module() method, and only use that method once during interpreter startup. This would kill NFS if site-importers just contained random importers and PYTHONIMPORTERS did not point directly at the importer modules themselves, which is unfortunate.

All I know is that I would like to be able to bundle up a package into a self-contained tarball or zip file and not require having it be kept in a directory with an __init__.py file that holds some code that messes with imports, next to the zip/tarball of the rest of the code. But that just might have to be life (and if it is, there should be something as simple as a function call from a stdlib module to handle the required import tweaks).

I am all for simple. I like the idea of having sys.path contain all importer objects (and thus be a merging of the existing sys.path, sys.meta_path, and sys.path_importer_cache). Having strings use some default importer object that is set in the sys module is fine with me to allow PYTHONPATH and .pth files to continue to be a usable system. I just want to make sure the two issues I pointed out here are somehow handled nicely with any changes that are made to the import machinery.
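
For what it is worth, the merged-path idea can be sketched in a few lines. None of this is an existing API: find_on_mixed_path, default_importer, and InMemoryImporter are names I am making up, and I am using importlib's FileFinder as a stand-in for the "default importer" that string entries would get. Only the lookup is shown, so the in-memory importer returns a spec without a loader.

```python
import importlib.machinery
import os
import tempfile


def default_importer(entry):
    """Stand-in for the default importer used for plain string entries."""
    details = (importlib.machinery.SourceFileLoader,
               importlib.machinery.SOURCE_SUFFIXES)
    return importlib.machinery.FileFinder(entry, details)


class InMemoryImporter:
    """Hypothetical importer object that could sit directly on the path."""

    def __init__(self, names):
        self.names = set(names)

    def find_spec(self, fullname, target=None):
        if fullname in self.names:
            return importlib.machinery.ModuleSpec(fullname, None, origin="in-memory")
        return None


def find_on_mixed_path(fullname, entries):
    """Walk a path holding both strings and importer objects, in order."""
    for entry in entries:
        # Strings fall back to the default importer; objects are used as-is.
        importer = default_importer(entry) if isinstance(entry, str) else entry
        spec = importer.find_spec(fullname)
        if spec is not None:
            return spec
    return None


# Demo: one importer object followed by one ordinary directory.
plain_dir = tempfile.mkdtemp()
with open(os.path.join(plain_dir, "on_disk.py"), "w") as handle:
    handle.write("VALUE = 1\n")

mixed_path = [InMemoryImporter(["virtual"]), plain_dir]
virtual_spec = find_on_mixed_path("virtual", mixed_path)
disk_spec = find_on_mixed_path("on_disk", mixed_path)
missing_spec = find_on_mixed_path("missing", mixed_path)
```

The sketch keeps PYTHONPATH-style strings working, but notice that it has no answer for how a string entry would ever pick a non-default importer, which is the bootstrap problem from above.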