2007-08-25

A little rant on __path__

What is the __path__ attribute on modules used for? One use is to distinguish a package from a plain module. Another is as an import optimization, since __path__ replaces sys.path when importing within a package. Lastly, because it acts as a sys.path replacement, it lets packages tweak their own search path.
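As a quick illustration of the first use, here is a minimal sketch that checks for __path__ on two standard library imports. The choice of ``email`` (a package) and ``textwrap`` (a plain module) is just for the example.

```python
import email      # a package in the standard library
import textwrap   # a plain, single-file module

# Only packages carry a __path__ attribute.
print(hasattr(email, "__path__"))     # True
print(hasattr(textwrap, "__path__"))  # False

# __path__ is a list of locations searched for submodules of the package.
print(type(email.__path__).__name__, len(email.__path__))
```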

Now why does this third use of __path__ slightly irk me? Well, while re-implementing zipimport I discovered that it sets __path__ to ``zipfile_path/path_within_zip``. In and of itself that is harmless, but when you have to treat that value as a pseudo entry on sys.path, you now need to figure out that it names a package inside a zip file. Obviously that path does not work as a plain file system path, so sys.path_hooks is consulted and (hopefully) a new importer can be found for it.

But that is slightly annoying, as you now have multiple importers for one zip file: one for the top-level zip file and then another for every package contained within it. This just seems like a waste. Now consider a database where the primary key is just the full module name. Having a unique __path__ value for every package is not really useful there. All you need is for __path__ to point back to the database, not to some unique path location, since the DB can easily resolve where to find 'pkg.module' without having ``db_path/pkg`` for __path__.
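To make the zip file case concrete, here is a rough sketch that builds a throwaway zip containing one package, imports from it, and then looks at what ended up in sys.path_importer_cache. The names ``demo.zip``, ``zpkg`` and ``mod`` are made up for the example, and the exact cache contents reflect how a recent CPython behaves, which may differ from other versions.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a temporary zip file holding a package with one submodule.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "demo.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("zpkg/__init__.py", "")
    zf.writestr("zpkg/mod.py", "VALUE = 1\n")

# Put the zip file itself on sys.path and import the submodule.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
import zpkg.mod

# The package's __path__ is the zip path plus the path within the zip.
print(zpkg.__path__)

# Every cache key under the zip path has its own importer: one for the
# zip file itself, one more for the package's __path__ entry.
zip_importers = sorted(
    p for p in sys.path_importer_cache
    if isinstance(p, str) and p.startswith(zip_path)
)
print(zip_importers)
```

Each additional package in the zip file would add another entry to that list once something is imported from it.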

Basically I am just thinking out loud about the idea of treating __path__ less as some kind of directory path and more like a sys.path entry. This means that one should not necessarily expect modification of __path__ to work, just as there is no guarantee that an entry on sys.path will find an importer from sys.path_hooks. This would allow for importers that you only need one instance of but that don't belong on sys.meta_path.
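Here is a minimal sketch of what such an importer might look like. It is only an illustration under several assumptions: the "database" is a plain dict called ``FAKE_DB``, the sys.path entry ``fake-db://demo`` is an invented string, and the ``find_spec``/``exec_module`` protocol it uses comes from later versions of Python than the one this post was written against. The point is that every package gets the same __path__ value, so one importer serves the whole database.

```python
import importlib.machinery
import sys

# Hypothetical "database": full module name -> (is_package, source code).
FAKE_DB = {
    "dbpkg": (True, "NAME = 'dbpkg'"),
    "dbpkg.module": (False, "VALUE = 42"),
}
DB_ENTRY = "fake-db://demo"  # made-up sys.path entry, not a real scheme


class DBImporter:
    """A single finder and loader for everything stored in FAKE_DB."""

    def find_spec(self, fullname, target=None):
        if fullname not in FAKE_DB:
            return None
        is_package, _ = FAKE_DB[fullname]
        spec = importlib.machinery.ModuleSpec(
            fullname, self, origin=DB_ENTRY, is_package=is_package
        )
        if is_package:
            # __path__ points back at the database, not at "db_path/pkg".
            spec.submodule_search_locations.append(DB_ENTRY)
        return spec

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        _, source = FAKE_DB[module.__name__]
        exec(source, module.__dict__)


def db_hook(path_entry):
    """Path hook that only claims the database entry."""
    if path_entry != DB_ENTRY:
        raise ImportError("not a database entry")
    return DBImporter()


sys.path_hooks.insert(0, db_hook)
sys.path.append(DB_ENTRY)

import dbpkg.module

print(dbpkg.__path__)
print(dbpkg.module.VALUE)
```

Because the package's __path__ is the same string as the sys.path entry, the lookup for ``dbpkg.module`` reuses the importer already cached in sys.path_importer_cache instead of asking sys.path_hooks for a new one.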