
2012-02-06

How I bootstrapped importlib

If you have been reading this blog over the past five years, I am sure you have read a post or five about my desire to bootstrap importlib into Python as the implementation of __import__. Well, as of today I'm willing to say that the difficult technological hurdles have been cleared! At this point the only things holding me back from taking my code from https://hg.python.org/sandbox/bcannon#bootstrap_importlib and making importlib drive import statements are some small compatibility issues, better integration into the build process, a code review, and python-dev sign-off. In other words, all of the interesting problems have been solved, so I'm finally ready to write a blog post discussing how I pulled it off.

So how exactly do you import __import__? To begin, as with any bootstrap challenge, you need to figure out what is available to you so you know what your design parameters are. In my case I knew I couldn't import anything that required filesystem access, since half of import is handling the search for a module (the other half is the actual importing); if I wanted to import a file, I would essentially need to write half of import in C for it to work properly. This restriction also has unexpected side effects, e.g. you can't rely on open() because it is part of the io module, which is a Python module.

That meant I could only rely on built-in modules. If you look at sys.builtin_module_names you will discover what is available directly within the CPython binary. The question then becomes whether that is enough, and it turns out that it is: those built-in modules are enough to perform an import. OK, so you have the bare minimum of modules required to do an import, but how the heck do you get those built-in modules into the global scope of the module that imports modules when you can't use an import statement?
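To see that list on your own interpreter, a couple of lines suffice. This is just a small sketch around the standard `sys.builtin_module_names` tuple; the helper name `builtin_modules()` is made up, and the output differs between builds and platforms, so no sample output is shown.

```python
import sys


def builtin_modules():
    """Return the sorted names of modules compiled into this interpreter."""
    return sorted(sys.builtin_module_names)


for name in builtin_modules():
    print(name)
```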

This is when Python's dynamism comes in handy. Since the import statement doesn't do much more than get the module object and bind it to a name in the importing module's global scope, I just needed to get the module object for importlib and assign the built-in modules I needed into its __dict__. It turns out that sys and imp are enough to let importlib handle importing the rest of the built-in modules needed for import to work, so that kept this bit of code short.
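As a rough illustration of that trick, the sketch below builds a module with `types.ModuleType`, binds `sys` into its namespace by hand, and only then runs the module's code, which uses `sys` without ever importing it. The module name `toy`, its source, and `make_module()` are hypothetical. importlib received `sys` and `imp` this way, but `imp` no longer exists in current Python (it was removed in 3.12), so only `sys` is injected here.

```python
import sys
import types

# Hypothetical module source: it uses `sys` but never imports it.
TOY_SOURCE = """
def interpreter_version():
    return sys.version_info[:2]
"""


def make_module(name, source, injected):
    """Create a module, pre-bind some globals, then execute its source."""
    module = types.ModuleType(name)
    # An import statement mostly binds a module object to a global name,
    # so setting the attribute directly has the same effect.
    for global_name, value in injected.items():
        setattr(module, global_name, value)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


toy = make_module("toy", TOY_SOURCE, {"sys": sys})
print(toy.interpreter_version())
```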

But this brings up the next quandary: how do I create a module object for importlib? If I had to search for importlib on sys.path, I would end up implementing a decent chunk of import itself. So how could I get the module object? This is where frozen modules come into play.

A frozen module is just a C array containing the marshaled code for a module (which is what a .pyc file is, minus the magic number, timestamp, and now the file size of the source). Since marshal is a built-in module, frozen modules can be loaded without issue. That means you can load a frozen module without using import (much like importing built-in modules).
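A minimal way to mimic a frozen module in pure Python: compile some source, marshal the resulting code object to bytes (the part CPython bakes into a C array), then unmarshal it and exec it into a fresh module object. The module name `demo`, its source, and `load_frozen()` are hypothetical. marshal's format is specific to the Python version, which is fine here because frozen modules ship inside the very interpreter that reads them.

```python
import marshal
import types

# Hypothetical module source to "freeze".
SOURCE = "GREETING = 'hello from a frozen module'\n"

# Freezing: compile the source and marshal the code object to bytes.
# CPython stores bytes like these in a C array inside the interpreter binary.
FROZEN_BYTES = marshal.dumps(compile(SOURCE, "<frozen demo>", "exec"))


def load_frozen(name, data):
    """Rebuild a module from marshaled code without using import."""
    code = marshal.loads(data)
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module


demo = load_frozen("demo", FROZEN_BYTES)
print(demo.GREETING)  # hello from a frozen module
```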

And those are all the parts needed to import importlib without import. =) To summarize, you get importlib set as __import__ by doing the following:

  1. Import the frozen module (i.e. read in a C array of a marshaled module object and unmarshal it)
  2. Import sys and imp (built-in modules, so done in C code by calling key C functions which return module objects) and set them on the module object
  3. Call Python code to import the rest of the built-in modules using sys and imp
  4. Set Python-based __import__ on the builtins module
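Putting the four steps above together, here is a toy Python sketch of the sequence. Everything in it is hypothetical: `TOY_IMPORTLIB_SOURCE` stands in for importlib, `load_builtin()` plays the role of the C function that hands back built-in modules, and the driver code is allowed to use `import` because it stands in for the C side. The new `__import__` merely records each request and delegates to the original one, and the demo restores `builtins.__import__` afterwards.

```python
import builtins
import marshal
import sys
import types

# Hypothetical stand-in for importlib: no import statements anywhere.
# `sys` is injected by the host; other built-ins arrive through _setup().
TOY_IMPORTLIB_SOURCE = """
_original_import = None
imported_names = []

def _setup(load_builtin):
    global _original_import, marshal
    marshal = load_builtin("marshal")  # step 3: fetch another built-in
    _original_import = sys.modules["builtins"].__import__  # uses injected sys

def __import__(name, globals=None, locals=None, fromlist=(), level=0):
    imported_names.append(name)  # record the request, then delegate
    return _original_import(name, globals, locals, fromlist, level)
"""

# In CPython a marshaled blob like this would be compiled into the binary.
FROZEN = marshal.dumps(compile(TOY_IMPORTLIB_SOURCE, "<toy_importlib>", "exec"))


def load_builtin(name):
    """Stand-in for the C call that returns a built-in module object."""
    if name not in sys.builtin_module_names:
        raise ImportError(f"{name!r} is not a built-in module")
    return sys.modules[name] if name in sys.modules else builtins.__import__(name)


def bootstrap(frozen):
    module = types.ModuleType("toy_importlib")
    exec(marshal.loads(frozen), module.__dict__)  # 1. load the frozen code
    module.sys = sys                               # 2. inject sys
    module._setup(load_builtin)                    # 3. Python pulls in the rest
    builtins.__import__ = module.__import__        # 4. install __import__
    return module


saved_import = builtins.__import__
try:
    toy = bootstrap(FROZEN)
    import json  # this statement now goes through toy.__import__
finally:
    builtins.__import__ = saved_import  # undo the demo
print(toy.imported_names)
```

Because the replacement is installed on the `builtins` module, every import statement executed after step 4 goes through it, including the imports `json` performs internally if it was not already loaded.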
And voila! __import__ ends up implemented in pure Python code. Now I just need to clean up the code, fix the compatibility issues, rip out the old C code, and get python-dev to sign off. =) Hopefully I will get far enough to give a lightning talk at PyCon with benchmark numbers showing this is actually all a good thing (including ripping out a ton of C code, especially if I can re-implement chunks of imp in pure Python =).

2009-09-02

Intersection of built-in modules between CPython, Jython and IronPython

[EDIT: updated for IronPython 2.6b2; made it clearer which VMs are missing what modules that importlib relies upon]

It has been a big goal of mine to make importlib the default implementation of import for CPython. But an even bigger goal has been to make it the default implementation for ALL full-featured implementations of Python once they implement Python 3. Not only would it ensure that all VMs have consistent import semantics, but it would also save every VM from having to re-implement import itself.

But using importlib as import imposes a bootstrapping problem. How do you import, well, import? First off, you need to find the source code, compile it into a code object, and create a module object using that code object. That part is actually easy: since you know what you are looking for, you can simply look for the file on sys.path, compile the source with the built-in compile() function, and finally create a module and initialize it with exec(). This is essentially what importlib does at a rudimentary level.
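The rudimentary version described above fits in a few lines of Python. This is a sketch, not importlib's actual code: `rudimentary_import()` is a hypothetical name, it ignores `sys.modules`, packages, bytecode files and encoding declarations (it assumes UTF-8), and it uses `open()`, which a real bootstrap cannot rely on.

```python
import os
import sys
import types


def rudimentary_import(name):
    """Find name.py on sys.path, compile it, and exec it into a new module."""
    for directory in sys.path:
        # An empty entry means the current directory; the relative path
        # produced by os.path.join("", ...) resolves against it.
        path = os.path.join(directory, name + ".py")
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as source_file:
                source = source_file.read()
            code = compile(source, path, "exec")  # source -> code object
            module = types.ModuleType(name)       # empty module object
            module.__file__ = path
            exec(code, module.__dict__)           # run the code to fill it in
            return module
    raise ImportError(f"no module named {name!r} found on sys.path")
```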

But import obviously goes beyond the rudimentary. There is bytecode to read and write, packages to deal with, warnings to raise, etc. And all of that requires code from some module in the standard library. But if you are trying to bootstrap import without having a full-featured import, what do you do? You rely on built-in modules, that's what you do.

By relying on built-in modules, you could have the VM inject any built-in module into the created importlib module and have importlib start using it. Because of this, I was curious which built-in modules CPython 3.1, Jython 2.5, and IronPython 2.6b2 have in common. The results are:
  • _codecs
  • _functools
  • _sre
  • _weakref
  • errno
  • gc
  • imp
  • sys
Not a whole lot. Importlib itself relies upon:
errno
Everyone has this.

io
IronPython's _bytesio probably has what I need (importlib only uses io.FileIO). Jython does not yet cover 2.6, so there is hope.

imp
Everyone has this.

marshal
This is actually optional (or at least I will make sure it is) as VMs do not need to implement pyc support.

posix/nt/os2
IronPython has this. Jython plans to have this in 2.6.

sys
Everyone has this.

warnings
Jython does not have a native implementation, but importlib only needs warnings.warn().

There is a partial overlap, but not a complete one. Luckily this is for Python 3, so there is hope that some of the things I need can be made common between the VMs in terms of what the built-in modules provide. IronPython may already have everything, and Jython could probably add just what importlib needs without much trouble.
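The same kind of check can be run against whatever interpreter is at hand. The sketch below (`REQUIRED` and `missing_builtins()` are illustrative names) reports which of importlib's listed dependencies are not compiled in. On a current CPython, expect at least `io`, `imp` and `warnings` to show up as missing: their C parts now live in private modules such as `_io`, `_imp` and `_warnings`, and `imp` was removed entirely in 3.12.

```python
import sys

# importlib's dependencies as listed above; "posix/nt/os2" is satisfied by
# whichever one of the three the platform provides.
REQUIRED = ["errno", "io", "imp", "marshal", ("posix", "nt", "os2"), "sys", "warnings"]


def missing_builtins(required, builtin_names=None):
    """Return the required entries that are not built into the interpreter."""
    names = sys.builtin_module_names if builtin_names is None else builtin_names
    available = set(names)
    missing = []
    for entry in required:
        alternatives = entry if isinstance(entry, tuple) else (entry,)
        if available.isdisjoint(alternatives):
            missing.append("/".join(alternatives))
    return missing


print(missing_builtins(REQUIRED))
```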

Otherwise I am causing myself more pain than I need to and I should just not worry about the bootstrap and simply import code directly. Copying code from the 'os' module does get a little annoying after a while. =)

2011-06-28

My personal plans for Python 3.3

Now that my life has hit another purgatory moment (waiting to hear when my wife's permanent residency interview will be, and still in the ramp-up period at work), I have had some time to think about what my personal goals will be for Python 3.3. This list is in no particular order and I make no promises to actually do any of it, as I only have roughly 249 days until alpha1. =)

2007-10-13

Importlib update

Importlib has now been (roughly) bootstrapped into Py3K in my py3k-importlib branch. Some tests are still failing that I have not tried to fix yet. I also have not removed any C code yet, so there may still be some dependency I don't know about.

And there is still the issue of 'warnings' not being a built-in module. Neal Norwitz wrote an initial C version of the critical stuff, but it isn't complete. The biggest issue is that it doesn't pick up anything that is set on 'warnings' itself (e.g., 'filters', 'showwarning'). That's bad, as code frequently sets various attributes on 'warnings' from outside the module. My proposed solution is to check whether 'warnings' has been imported and, if so, use the attributes from there, otherwise falling back on the internal C stuff. That way the module is entirely independent of Python code and thus doesn't cause me any bootstrap issues.

Once 'warnings' is built-in I will be able to rip out C code to make sure this whole thing actually works. Then I can fix any bugs I have for importlib itself. That should be it at that point and I should hopefully be able to move my work into Py3K itself.

2008-04-14

I hate PEP 263

PEP 263 is the one that lets you specify the encoding of a Python source file. It is giving me such a headache as I try to bootstrap importlib. Because I must be able to open a source file with the proper encoding, I must also have the entire codecs system working properly. That is a problem when the codecs system relies on imports to get to decoders.

Now the UTF-8 decoder is set up by default. But everything else, including ASCII, must be reachable through an import. That can be a problem when it is import that needs that module.

It is also a pain that one must open a file in a generic fashion, read its first two lines, use a regex to try to find a declared encoding, and then reopen the file with the encoding found. That's a lot of stat calls and such, which can be expensive. To deal with this and other bootstrapping issues, I am going to have to expose some more C code in my special version of 3.0. Luckily, part of this has been done for me thanks to imp.find_module(), which does the proper file opening, albeit using the C import code. Once I pull that out to give me essentially a custom source-code open() function, I should be able to move on to the next failing test.
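The detection step itself is small. Here is a simplified sketch, with `find_source_encoding()` a hypothetical name, that applies the cookie regex from PEP 263 to the raw bytes of the first two lines. It skips details such as ignoring a cookie on line two when line one is code, or checking that a cookie agrees with a UTF-8 BOM; the standard library's complete version is `tokenize.detect_encoding()`.

```python
import re

# The coding-cookie pattern given in PEP 263, applied to raw bytes.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")
UTF8_BOM = b"\xef\xbb\xbf"


def find_source_encoding(source: bytes, default: str = "utf-8") -> str:
    """Return the encoding declared in the first two lines of `source`."""
    if source.startswith(UTF8_BOM):
        return "utf-8-sig"
    for line in source.splitlines()[:2]:
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default
```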

Currently there are seven failing tests (although I can't run importlib properly anyway because of the PEP 263 issues). I have until early June to get this all done. Here's hoping I can pull this off and properly delegate the stdlib reorg so that none of it impacts my personal life (which now includes looking for a new apartment for myself).

2007-06-14

My python-dev todo list is finally shrinking

It's nice to feel productive. =) I have been plugging away at my python-dev todo list when I can (usually while my girlfriend plays WoW, which makes her addiction handy on occasion). At this point I just need to clear out my bug/patch backlog and look at Barry Warsaw's PEP 364 implementation. The tracker transition cannot happen until SF fixes their data dump issues, so that is on hold (SF is working on it, though).

With those out of the way I will finally get to start on my personal project of bootstrapping importlib into Python. That should be a fun challenge.

I have removed (for now) finishing my pseudocode of import from the todo list. I just need to cover the Python bytecode/source dance, but I think most people have enough of a grasp of it that I doubt there is demand for me to finish it (but if I am wrong, let me know in the comments).

I also started a "When Pigs Fly" list of big project ideas for when the importlib bootstrap project is done. You can comment on the list if you want, but realize the title of the list has meaning; I have no clue when or if I will get to any of those projects. They are mostly there just to remind me of what I have contemplated doing in order to suck down more free time. =)

2009-05-08

Month one of my python-dev sabbatical

It now has been roughly a month since I disabled subscriptions for practically every mailing list I belong to. Initially I had to get used to the fact that checking Gmail constantly for new mail was a fruitless exercise. Otherwise it's been nice.

I definitely have more time on my hands. Python-dev has been my hobby for so long I had somewhat forgotten what it was like to just work on a coding project purely for fun and just for myself. Not having to answer to the Python community for the quality of my code or test coverage, not having to document everything, or have to argue for why I did something is rather freeing. I can see why Neal Norwitz is on perpetual sabbatical from python-dev.

But I do miss being plugged in. Getting my news about what is going on through Twitter still feels odd; this was especially true when PEP 383 was accepted and I didn't even know the PEP existed until people tweeted about its acceptance.

And I still think about my long-standing projects. I am thinking about how to handle the Mercurial transition and what workflow we should end up with. I still think about what I might need to implement in C for importlib in order to prevent people from lynching me when I bootstrap it in as the implementation of import. This means that once I am done defending my thesis proposal I will be coming back to python-dev.

But I suspect I won't rejoin every mailing list I disabled delivery for. And I also expect I will be much quicker to tell people to move over to python-ideas, along with very liberally using the muting feature in Gmail. I have had a taste of freedom from senseless arguments and whining and I don't plan to go back to it without a fight.

One month down, two more to go.