2011-07-16

How to import a module from just a file path

Dr. Brown asked on Twitter whether there was a single expression (i.e., no semicolons or abuses of 'and' or 'or') that could import a module from a file path, either through a stdlib function call or something constructed from scratch, e.g., mod = import_from_path('some_file.txt'). Because it involved import, I got cc'ed on the tweet while various people tried to come up with a solution. In the end people realized that it's not possible in Python 2 (but it is in Python 3). But how hard could it be, right? Right?!?

Well, the devil is always in the details. While there are some functions in the stdlib that come close (e.g., execfile() and runpy.run_path()), none of them return a module: execfile() executes the file in a globals dict and returns None, while runpy.run_path() returns the module's globals dict. Other people tried to come up with something using 'exec', but in Python 2 it is a statement rather than an expression, so you have to create what you planned to use as the module ahead of time (that's two lines total if you weren't counting along).
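
To make the difference concrete, here is a small sketch (the file name and its contents are made up for illustration) showing that runpy.run_path() hands back a plain dict rather than a module object:

```python
import os
import runpy
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module file with made-up contents.
    path = os.path.join(tmp, 'some_file.py')
    with open(path, 'w') as f:
        f.write('answer = 42\n')
    # run_path() executes the file and returns its globals as a dict.
    result = runpy.run_path(path)

print(type(result).__name__)  # dict
print(result['answer'])       # 42
```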


In Python 3, though, this is all totally doable thanks to 'exec' becoming a function instead of a keyword:


mod = [exec(open('path', 'r').read(), __import__('sys').modules.setdefault('name', __import__('types').ModuleType('name')).__dict__), __import__('sys').modules['name']][1]


This is obviously a complete hack and not something you should use in the real world. But it is a single expression (meeting Titus' restriction) and it even puts the module into sys.modules so that future imports will not have to re-import the module. This hack, though, does assume that you are only importing modules and never packages.
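
To try the hack without hard-coding 'path' and 'name', here is a sketch that wraps the same kind of expression in a lambda. Because exec() returns None, the module has to be fetched back out of sys.modules as the second element of a list. The names load_hack and hack_demo are hypothetical, and types.ModuleType stands in for imp.new_module (the imp module was removed in Python 3.12).

```python
import os
import sys
import tempfile

# The single-expression hack, parameterized by a lambda (load_hack is a
# hypothetical name). __import__ is kept inside the expression to stay
# faithful to the one-expression form. exec() returns None, so the list's
# second element fetches the module back out of sys.modules.
load_hack = lambda name, path: [
    exec(open(path).read(),
         __import__('sys').modules.setdefault(
             name, __import__('types').ModuleType(name)).__dict__),
    __import__('sys').modules[name],
][1]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'hack_demo.py')
    with open(path, 'w') as f:
        f.write('greeting = "hello"\n')
    mod = load_hack('hack_demo', path)

print(mod.greeting)                     # hello
print(sys.modules['hack_demo'] is mod)  # True
```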

So what would it take to write a proper function that takes (at least) a file path and ends up importing that file? The first question is whether you want any arbitrary path to be usable, or only paths for properly named Python modules (e.g., ending in .py, with packages named __init__.py). If you go with the former, you need to specify the module name and whether it is a package (and, if it is, what the value of __path__ should be); otherwise you can glean all of that information from the path itself.
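
For the second option, gleaning the information from a conventionally named path could look roughly like the following sketch. The helper infer_module_info is hypothetical, and it only looks at the last path component, so it does not reconstruct dotted names for submodules.

```python
import os

def infer_module_info(path):
    """Hypothetical helper: derive (name, package_path) from a path.

    Assumes conventional naming: 'foo.py' is module 'foo', and
    'pkg/__init__.py' is package 'pkg' whose __path__ is [its directory].
    Only the last path component is used; dotted names are not rebuilt.
    """
    directory, filename = os.path.split(os.path.abspath(path))
    stem, ext = os.path.splitext(filename)
    if ext != '.py':
        raise ValueError('expected a .py file, got {!r}'.format(path))
    if stem == '__init__':
        # A package: named after its directory, which becomes __path__.
        return os.path.basename(directory), [directory]
    # A plain module: no __path__.
    return stem, None

print(infer_module_info('spam.py'))  # ('spam', None)
print(infer_module_info(os.path.join('eggs', '__init__.py')))
```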

After that, the next step is to create the module object and populate it. The bare-bones approach is to create the object and set the __name__ and __file__ attributes, along with __path__ if it applies. This has to happen before the module's code runs, because a module might use those values as a side effect of being imported (and remember, folks: never launch a thread as a side effect of an import! Deadlock awaits those who don't heed that warning).
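
A minimal sketch of this step, using types.ModuleType (the module name 'example' and the path are made up), shows that code executed in the module's namespace can already rely on __file__:

```python
import types

# Create the module object; ModuleType sets __name__ (and __doc__) for us.
mod = types.ModuleType('example')
mod.__file__ = '/tmp/example.py'  # made-up path; the file need not exist
# A package would also get its __path__ list here, e.g.:
# mod.__path__ = ['/tmp/example_pkg']

# Module code that, like many real modules, reads __file__ at import time.
exec("import os\nhere = os.path.dirname(__file__)\n", mod.__dict__)

print(mod.__name__)  # example
print(mod.here)      # /tmp
```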

Third, you create the code object. You could technically just read the file and pass the source string to 'exec', but then the code object records '<string>' instead of the file path, so any traceback raised while the module's code executes will not point at the actual file.
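
Here is a small sketch of why the filename argument to compile() matters; the path 'some_file.py' is hypothetical and the file does not need to exist:

```python
import traceback

source = "def boom():\n    raise RuntimeError('oops')\nboom()\n"
path = 'some_file.py'  # hypothetical path; only used as a label

# Passing the path as compile()'s filename stamps it on the code object,
# so tracebacks report the right file and line number.
code = compile(source, path, 'exec', dont_inherit=True)
try:
    exec(code, {})
except RuntimeError as exc:
    last = traceback.extract_tb(exc.__traceback__)[-1]
    print(last.filename, last.lineno)  # some_file.py 2
```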

Fourth, you actually execute the module code and finish initializing your module.

Now, all of this ignores sys.modules entirely: it neither checks whether the module has already been imported nor stores it there so that future imports can reuse it instead of repeating all of this work. But if you honestly want to import a module by specifying a path, you probably don't care about any of that anyway. And because you ignore sys.modules, you also get to ignore the import lock.
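
If you did care, a variant that consults and updates sys.modules might look like this sketch. The function name import_from_path_cached is hypothetical, and it deliberately still skips the import lock, so two threads racing on the same name could both execute the file.

```python
import os
import sys
import tempfile
import types

def import_from_path_cached(path, name):
    """Hypothetical variant that also uses sys.modules (no import lock)."""
    if name in sys.modules:
        # Already imported (by us or anyone else): reuse it.
        return sys.modules[name]
    mod = types.ModuleType(name)
    mod.__file__ = path
    with open(path, 'rb') as file:
        code = compile(file.read(), path, 'exec', dont_inherit=True)
    # Register before executing so a circular import of `name` finds the
    # partially initialized module, as the regular import system does.
    sys.modules[name] = mod
    try:
        exec(code, mod.__dict__)
    except BaseException:
        # Don't leave a half-initialized module behind on failure.
        sys.modules.pop(name, None)
        raise
    return mod

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cached_demo.py')
    with open(path, 'w') as f:
        f.write('value = 99\n')
    first = import_from_path_cached(path, 'cached_demo')
    second = import_from_path_cached(path, 'cached_demo')

print(first is second)  # True: the second call was served from sys.modules
```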

So, taking all of this together, here is a largely untested function that does roughly what Titus wanted (it works in Python 3; adapting it for Python 2 just means using the 'exec' statement instead of the function):


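import types

# Step 1: this version takes an explicit name (and package_path) rather than
# deriving them from the path, so arbitrary file names can be imported.
def import_from_path(path, name='', package_path=None):
    """Import a module from a specified file path.

    If the module is a package, set package_path to a list of directories that
    is to become __path__. The module is not added to sys.modules.

    """
    # Step 2: create the module object and set the attributes its code may
    # rely on while it executes (types.ModuleType replaces the removed
    # imp.new_module).
    mod = types.ModuleType(name)
    mod.__file__ = path
    if package_path is not None:
        mod.__path__ = package_path
    # Step 3: compile with the real path so tracebacks point at the file.
    # Reading bytes lets compile() honour any PEP 263 coding declaration.
    with open(path, 'rb') as file:
        code = compile(file.read(), path, 'exec', dont_inherit=True)
    # Step 4: execute the module's code in its own namespace.
    exec(code, mod.__dict__)
    return mod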
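
For comparison, on Python 3.5 and later the standard library's importlib.util can perform essentially the same steps. This is a swapped-in technique rather than the function above, and import_from_path_importlib and the demo file are hypothetical names. One caveat: without an explicit loader, spec_from_file_location() only recognizes the usual source suffixes, so a path like 'some_file.txt' yields None.

```python
import importlib.util
import os
import tempfile

def import_from_path_importlib(path, name):
    """Sketch of the same steps via importlib.util (Python 3.5+).

    Like import_from_path(), this does not touch sys.modules.
    """
    # The spec records the name and file location; returns None when no
    # loader matches the file's suffix.
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None:
        raise ImportError('no loader for {!r}'.format(path))
    # module_from_spec() creates the module and sets __name__, __file__,
    # __spec__, __loader__ (and __path__ for a package's __init__.py).
    module = importlib.util.module_from_spec(spec)
    # exec_module() compiles the source with its path and runs it in the
    # module's namespace.
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'importlib_demo.py')
    with open(path, 'w') as f:
        f.write('value = 3\n')
    demo = import_from_path_importlib(path, 'importlib_demo')

print(demo.value)     # 3
print(demo.__name__)  # importlib_demo
```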