I am moving my blogging over to Medium. My personal account will have both personal and technical posts. I have a collection just for my technical posts appropriately titled Coder Who Says Py. I have contacted the Planet Python admins to update that feed as well.
Hopefully the switch will lead to more blogging. =)
Coder Who Says Py
A place for me to babble on about Python development, Python itself, and coding in general. The title is inspired by some knights who enjoy a good shrubbery.
2013-07-13
2013-06-21
Collection of posts on writing a zip file importer in pure Python
To test out Medium I wrote a collection of posts on implementing a zip file importer using importlib. Do note that the order in the collection is reversed from how you should read it. You want to read path hooks, the finder, and then the loader.
2013-05-26
Practicality beats purity when it comes to backwards-compatibility
This past week I was on a 5-hour train ride where I was doing some API work on importlib to make it easier to customize imports. The route I have been taking to accomplish that is to define a couple of key methods in the appropriate ABC so that subclasses can do as little custom work as possible and use sensible defaults as defined by the ABCs. That way gnarly details are taken care of in a consistent way by importlib and users can simply focus on what is different/special about their importers.
Part of that includes dealing with __path__. For those of you who might not be aware, the existence of __path__ on a module is the official definition of a package in Python. And if __path__ is defined it must be an iterable that only returns strings (if it returns anything at all; this all means [] is a legit value for __path__ to define that something is a package). Relatively straight-forward in the common case of some directory on your file system representing a package.
But what about situations like zipfiles? They are kind of like a file system, but not exactly; at least they have the innate concept of a path. Or how about a database that stores your code? In that situation a file path isn't really there unless you choose to make it happen; you could easily have a primary key of just module names and completely forgo the idea of paths. Since I'm trying to minimize the work anyone has to do for a custom importer, I wanted the only thing a user of importlib might have to do for __path__ to be providing its custom value, so I tried to come up with a way to allow that.
This all came up while I was designing a method to handle the setting of the various attributes on modules. As of right now that's all handled in importlib.abc.Loader.load_module(), which is not a good place to do it since you need to set the attributes on a module and all load_module() takes is a name as a string (i.e. there is no hook in the API to tweak attributes on a module before it is executed). Providing a method that does the right thing using the pre-existing APIs allows for a convenient location to tweak values, such as __path__. The method is going to be called init_module_attrs() and will set things like __loader__, __package__, __file__, __cached__, and hopefully __path__ (the trigger for this post).
As it stands now, the only API for packages in importlib is importlib.abc.InspectLoader.is_package() on loaders which I inherited from PEP 302. The problem with this method is that all is_package() does is return a boolean value to say whether something is a package or not. There was no provided API to return what __path__ should be. Basically there was no hook point in the API for overloading a function to provide __path__ for a module until after import which is not where you want it to happen.
My initial reaction was, "I'll define a new method!" From an API perspective that's the purest solution. I could define a new method called package_paths() that either returns what __path__ should be set to or even subsumes is_package() by returning None when a module isn't a package, with all other values signifying a package and what __path__ should be (this is to allow [] as a value for __path__). Nice and straightforward.
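As a rough sketch of what that idea might have looked like (the DatabaseLoader class and its in-memory table are made up for illustration, and package_paths() was only ever a proposal, not an importlib method), None means "not a package" while any other value, including [], is what __path__ would be set to:

```python
class DatabaseLoader:
    """Hypothetical loader whose code lives in a database keyed by module name."""

    def __init__(self, package_paths_by_name):
        # Maps a package's name to the value its __path__ should get.
        # Modules that are not packages simply have no entry.
        self._package_paths = package_paths_by_name

    def package_paths(self, fullname):
        """Return the value for __path__, or None if fullname is not a package."""
        return self._package_paths.get(fullname)

    def is_package(self, fullname):
        # Subsumed by package_paths(): [] is falsy but still means "package",
        # hence the explicit comparison against None.
        return self.package_paths(fullname) is not None


loader = DatabaseLoader({'spam': []})
print(loader.is_package('spam'))       # a package with an empty __path__
print(loader.is_package('spam.eggs'))  # a plain module
```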
But upon reflecting on the idea, I realized it wasn't practical. While databases and such might not need to rely on paths, in the massive majority of cases __path__ will end up with at least path-like entries (if for no other reason than most code that mucks with __path__ just assumes it can use os.path), so that suggests using what importlib.abc.ExecutionLoader.get_filename() returns is reasonable. There is also the point that users would need to define this new method just to get any help with setting __path__, which is always a problem since the code would only be usable in 3.4 unless users wrote their own shim for 3.3 and earlier. All of this led me to the realization that it would be better to just set __path__ in init_module_attrs() as best as I could (e.g. os.path.dirname(self.get_filename()) or []) and that if the default didn't work then people could override the method, call super().init_module_attrs(), and then reset __path__ to what they want. That still provides a way to make overriding simple while still giving a reasonable default for the majority of cases. Practicality beats purity, especially when you are trying to provide a solution that works in a backwards-compatible fashion.
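A minimal sketch of that default-plus-override pattern, assuming a stand-in base class rather than the real importlib ABCs (the class names, constructor, and file path below are invented for illustration, and init_module_attrs() is shown only as proposed in this post):

```python
import os.path
import types


class SketchLoader:
    """Stand-in for an importlib ABC; not the real API."""

    def __init__(self, filename, is_pkg):
        self._filename = filename
        self._is_pkg = is_pkg

    def get_filename(self, fullname):
        return self._filename

    def is_package(self, fullname):
        return self._is_pkg

    def init_module_attrs(self, module):
        name = module.__name__
        module.__loader__ = self
        module.__file__ = self.get_filename(name)
        if self.is_package(name):
            # Best-effort default: the directory holding the package's file.
            dirname = os.path.dirname(module.__file__)
            module.__path__ = [dirname] if dirname else []


class DatabaseLoader(SketchLoader):
    """Hypothetical loader for which a directory-based default makes no sense."""

    def init_module_attrs(self, module):
        # Let the base class apply its defaults first ...
        super().init_module_attrs(module)
        if self.is_package(module.__name__):
            # ... then reset __path__ to the value this loader wants.
            module.__path__ = []


on_disk = types.ModuleType('spam')
SketchLoader('/srv/code/spam/__init__.py', True).init_module_attrs(on_disk)

in_db = types.ModuleType('spam')
DatabaseLoader('spam', True).init_module_attrs(in_db)
```

The subclass only has to state what is different about it; everything else comes from the shared default.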
And just to really hit this point home, I had another "brilliant" idea that I decided was too much of me thinking about API purity and not what was practical. With the advent of __package__, an alternative way to tell if a module is a package is to see if __name__ == __package__. That's nice and simple and is a clearer definition since both of those attributes always exist (at least starting in Python 3.3) and it doesn't require having a dummy empty list entry on packages where __path__ must exist but lacks any other reasonable value (which is what I was trying to avoid since you don't see an empty list for __path__ very often and I didn't want to confuse users by doing that or that __path__ had to have a true value). Good idea I thought!
Except in a practical sense, there is too much code out there relying on __path__ being the way to tell if a module is a package. Now making older code work with this __name__ == __package__ idea wouldn't be hard since it's a check for that and if __package__ isn't set or is None then fall back on looking for __path__. I could even provide an importlib.util.is_package() whose code could be copied easily for code that needed to stay compatible with 3.3 and earlier. But what benefit would it get me? Not confusing users when they see an empty list for __path__? That's pretty minor since the vast majority of code should be doing a hasattr(module, '__path__') check to tell if something is a package instead of checking for the attribute and making sure the value is true. Plus how much code ever changes __path__, let alone even cares if it exists? My bet is hardly anyone cares. So rather than cleaning up the semantics so that telling whether a module is a package is more straightforward and easier to explain in all future versions of Python, it is better to just stick with what is seemingly working now for all pre-existing code, since the status quo isn't that bad.
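For illustration, the fallback check described above might look something like the following. This is only a sketch: is_package() here is a hypothetical helper, not an actual importlib.util function, and the "legacy" module is fabricated to exercise the fallback branch.

```python
import email
import email.message
import types


def is_package(module):
    """Hypothetical helper sketching the __name__ == __package__ check."""
    package = getattr(module, '__package__', None)
    if package is not None:
        return module.__name__ == package
    # __package__ is missing or None: fall back on the classic definition,
    # which only asks whether __path__ exists (an empty list still counts).
    return hasattr(module, '__path__')


legacy = types.ModuleType('legacy')
legacy.__package__ = None  # pretend an older loader never set it
legacy.__path__ = []

print(is_package(email))          # a real package
print(is_package(email.message))  # a module inside that package
print(is_package(legacy))         # decided by the __path__ fallback
```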
So practicality beats purity when dealing with backwards-compatibility. When the itch strikes you to expand an API to try to have a more pure solution to something, take a moment before you scratch that itch and see if there is a more backwards-compatible way that might not be as pure or simple, but still allows for the same outcome. This has happened to me more than once with importlib which is why I always mull over any API changes for at least a week before actually committing any code.
2013-04-18
A decade of commits
Today marks a decade since I made my first commit to CPython's repository on Sat, 19 Apr 2003 04:00:56 +0000 (python-checkins, hg.python.org). According to Ohloh, I currently sit as the 16th most prolific committer based on commit count, which I can hardly believe. Boy have times changed over the past decade!
Back in April 2003, we were still on CVS on SourceForge (I somewhat foolishly took on projects to change both of those). Guido gave me my commit privileges himself (now I hand them out which is a bit scary =). It was less than a month after the first PyCon (or at least the first Python conference officially called PyCon, and I've managed to attend every single one since and now have my wife asking if she can come) and me being elected to the Python Software Foundation (whose board of directors I joined for a time). This was before Python 2.3 was released (and now we are working on Python 3.4). Back then, Python was becoming popular and had an upward trend, heading towards its current position as the top dynamic language out there that isn't embedded in a browser (I would say it didn't become really obvious this was going to happen until about 2005, so I got my wagon hitched at just the right time =).
But this post is not about reminiscing. It's for thanking the people and community who have made contributing to Python so enjoyable that I have actually wanted to do it for a whole decade (and will continue to do so for the foreseeable future).
I want to first thank python-dev. I have always said I truly learned how to program from my fellow core developers. Getting to work on CPython's interpreter core and the stdlib showed me how to manage complexity in APIs, keep my code clean and readable, when to optimize and when to go with the easier to read solution, etc. Pretty much everything that you would want to know when programming in the wild I didn't learn from a class or a book but from my fellow open source programmers. You just can't buy that experience. This is the reason I have always done what I could to make the lives of people who wanted to contribute as easy as possible (sometimes at the expense of other core devs depending on how you fall down on the svn -> hg transition).
I also want to thank the Python community. When I first started contributing I was doing it to gain experience in programming in the real world by contributing to a top-notch codebase with world-class programmers. But as time went on the things I gained in terms of experience dwindled. But what I lost in terms of fulfillment from what I learned was more than made up for in terms of the interactions I had with the community. Meeting people who have benefited from my code and said "thanks" for volunteering my time truly does inspire me to keep contributing, especially when I don't want to backport a bug fix. =)
But through the community I have also been able to gain great friends from across the globe. While I may only get to see my "open source friends" about once a year for a week at PyCon (which is the key reason I look forward to the conference as soon as the last one finishes), they are truly friends. They are people I would let crash in my spare bedroom, give them a key, and say "welcome" without hesitation (if any of them were ever so inclined to visit Toronto, let alone Guelph). Those friendships are truly important to me and are what will keep me coming back for years to come, no matter how much or little I am able to contribute in my spare time.
So thanks to everyone reading this. By being a part of this great community of nice, caring individuals I continue to come back to contribute and participate however I can.
2013-04-12
Why I'm signing up for Gittip
While at PyCon I heard about plans to integrate Gittip into rubygems.org and so I decided to have another look at Gittip. For those that don't know, it's a website where you can give and/or receive money to/from others on a weekly basis. You can give as little as $0.25/week ($13/year) up to $24/week ($1,248/year). Being on a weekly schedule allows you to say "I appreciate the time and effort you put into open source; keep it up!", compared to bounties which are goal-specific and don't recognize people who make contributions that have no direct financial benefit or make contributions year-round.
As I was poking around the site I noticed that my friend Jesse Noller was the top recipient. I read his page on Gittip which listed his vast accomplishments that he has made in his spare time for no pay beyond gratitude from others and any feeling of accomplishment his hard work gives him. But the other thing his Gittip page mentions is what receiving tips means for him.
It basically boils down to a way for people to thank his family for letting him do his open source work. That sentiment really struck a chord with me. Like most open source contributors, I do it because I derive some enjoyment from it. It's a feeling of accomplishment, it's the camaraderie with my various friends that I have in the Python community, etc. In other words it's all very intangible but I do get something from doing my open source work.
But my family doesn't get any of the benefit that I get. Since I am not paid to do my open source work I need to take personal time to do it. That means I have to take time away from my wife to do this rather solitary work of contributing to open source. While my wife understands why I do what I do for Python, her benefit of getting to be proud of me is indirect and very diluted compared to what I get from it (although she is starting to increase her participation by attending PyCon).
But having people express gratitude through Gittip gives more direct benefit to one's family. When I asked on Twitter and Google+ for people to tip Jesse to thank him for all that he does through the year (and especially for PyCon in the past two years), he got a nice bump in his tips, and so he was able to take his daughter and family out bowling that night.
Tips then are a way for the community to thank someone's family for letting them share their loved one with open source. For instance, tips for me would be a way of thanking my wife for letting me spend the hours I do contributing to Python in my various ways by letting me treat my wife to a night out so neither of us has to cook. It also doesn't hurt that it acts like a small form of blackmail; "yes, Andrea, I do need to get this patch in and you should let me put the time in since the Python community treated you to a nice dinner last night" =) .
All of these reasons are why I'm joining Gittip. It has actually now reached the point of legitimacy to have Heroku as a company start leaving tips and Read the Docs is trying to pay for various expenses through Gittip; it's no longer just a bunch of individuals. So please consider signing up to both receive and send tips if you have the financial means to thank those in open source for their diligent and hard work.
2013-03-21
PyCon 2013 report
PyCon 2013 is now over and it was awesome (as usual)! As seems to happen every year, there were a few themes at the conference.
Packaging
For those of you who don't know, people are giving it another go to try and straighten out packaging in the Python world. The difference compared to the previous attempt is that Nick Coghlan, who is leading this endeavour, is working directly with pre-existing tools to gain consensus on things instead of trying to get the stdlib to handle it all. This means, for instance, he is working with the installer projects (e.g. pip) to agree on what should (and should not) happen in the evolution of packaging. This seems to have done a good job in energizing key people into supporting Nick's overall view (more on that later).
This means the stdlib is not going to try and solve all problems. The current thinking seems to be that the stdlib should house modules for which PEPs exist and then tools are to be built on top of that. This allows for all tools to act on metadata and such in a uniform way, letting them innovate on higher-level details (and keep the stdlib out of the installer game). Think PEPs 425, 426, and 427 details being handled by distlib.
He is also working from the top down on the stack. This means installer now, build-related stuff later. This has the nice benefit that the thing most people directly interact with the most should get fixed first, with the behind-the-scenes details left for later.
What does all of this mean? Eventually people will be able to get an installer, be able to securely install from the Cheeseshop (or any other package index of their choosing), and have it all bootstrap up on their system easily. A proposal (PEP 439) even went out this week to basically include a pip bootstrap script in Python which will install the real pip if that has not already happened and then continue on with the installation, making it all seamless. You can follow the discussion of this specific proposal on distutils-sig.
If all of this interests you I suggest you watch the packaging panel when the video goes up.
Python 3.3
I gave my Python 3.3 > Python 2.7 talk again (video here; PyCon Argentina video here although I think I like the US one more) where basically I pointed out all the wonderful features of Python 3.3 and that performance-wise you don't have to care which version you use (unless you have memory issues in which case you will want to use Python 3.3). I honestly was expecting some pushback since I have become a little jaded over the past 4+ years of Python 3's existence. But you know what? No pushback at all (but maybe it's because Armin wasn't there this year =). It was a really nice change of pace to not have to defend something I believe in and have worked hard to foster.
I heard numerous people tell me that they had finally been able to start using Python 3 and that they really enjoyed it. Jacob Kaplan-Moss of Django fame gave a talk on porting Django apps to Python 3 (no video yet) and told me that he not only liked Python 3, but that the no-argument version of super() made him "irrationally excited". David Beazley said that since he wrote the 3rd edition of the Python Cookbook for Python 3.3 he finds Python 2.7 a bit painful to use. It continues to be the case that almost everyone who gives Python 3 a fair shake ends up really liking it.
Diversity & Outreach
Watch Jesse Noller's opening statements. Then watch Eben Upton's keynote. Then realize that 20% of attendees were women. Then realize there was also a Raspberry Pi programming class for kids. Then really make sure you watch Jesse's opening statements if you ignored that initial link. Makes me want to be a better person and try to help people even more.
Everything else
I gave my "How Import Works" talk (US video here, Argentina video here and this time I prefer the latter thanks to having more time and thus feeling more relaxed).
The language summit happened. You can find numerous other summaries of what happened out there (Nick Coghlan, Kushal Das), so I won't rehash it here.
I wasn't able to stick around for the sprints this year (first time since the founding of the conference) past half of the first day. But hopefully next year I will be able to make it work out.
As I said, overall it was a great conference. Thanks to Jesse and everyone else who volunteered to help make it a great week.
2013-02-17
Resolving a TOOWTDI interface problem for attributes
TL;DR: choose one way to signify the lack of information in an API, either by making the attribute optional or by setting a default value (e.g. None), but not both.
When you read the docs for importlib.find_loader() you find that an exception is raised when __loader__ is set to None. But if you read the docs for importlib.abc.Loader.load_module() you will notice that __loader__ "should be set" (italics mine). So one part of the docs says having a value of None is fine while another says the attribute doesn't even have to exist. So the former is an LBYL version of the API while the latter is an EAFP version. While that's technically fine, I do like the concept of TOOWTDI in Python, so I would prefer choosing one of the approaches as the definitive way to signal that a module's loader is not known.
Does long-term (think in timescales of years) backwards-compatibility suggest a preference of one over the other? As it stands now, one must do:
That handles both the LBYL and EAFP approaches of either not setting the attribute or setting it to None. If this were to translate to LBYL it would become:
Not a huge difference, just easier to read. The EAFP approach would be:
try:
loader = module.__loader__
except AttributeError:
pass
else:
# Use loader
But since most code that cares whether __loader__ is set already uses the getattr() approach, the None value approach is the least disruptive to changing to the eventual idiom.
But the thing that tipped the scales for me is I don't want the attribute to be optional but be required in the long run (think Python 4 long run; side-effect of how long Python versions last), so I plan to change the default attributes on the module type to always have __loader__ and __package__ and set them to None by default in Python 3.4. That means the optional approach won't mean anything going forward, so that makes the LBYL approach the one I plan to go with even if I personally prefer the EAFP approach for optional API attributes; I don't want this part of the API being viewed as optional by loader authors.
If you care about any of this specific API cleanup, you can follow issue #17115 as I clean up importlib's mixed approach to __loader__.
When you read the docs for importlib.find_loader() you find that an exception is raised when __loader__ is set to None. But if you read the docs for importlib.abc.Loader.load_module() you will notice that __loader__ "should be set" (italics mine). So one part of the docs says having a value of None is fine while another says the attribute doesn't even have to exist. So the former is an LBYL version of the API while the latter is an EAFP version. While that's technically fine, I do like the concept of TOOWTDI in Python, so I would prefer choosing one of the approaches as the definitive way to signal that a module's loader is not known.
Does long-term (think in timescales of years) backwards-compatibility suggest a preference of one over the other? As it stands now, one must do:
if getattr(module, '__loader__', None) is not None:
    # Use loader
That handles both the LBYL and EAFP approaches of either not setting the attribute or setting it to None. If this were to translate to LBYL it would become:
if module.__loader__ is not None:
    # Use loader
Not a huge difference, just easier to read. The EAFP approach would be:
try:
    loader = module.__loader__
except AttributeError:
    pass
else:
    # Use loader
Longer, but still totally readable and psychologically makes more sense since the attribute is set more often than not (importlib actually sets the attribute along with __package__ after importing if they are not already set).
But since most code that cares whether __loader__ is set already uses the getattr() approach, the None value approach is the least disruptive way to change to the eventual idiom.
But the thing that tipped the scales for me is I don't want the attribute to be optional but be required in the long run (think Python 4 long run; side-effect of how long Python versions last), so I plan to change the default attributes on the module type to always have __loader__ and __package__ and set them to None by default in Python 3.4. That means the optional approach won't mean anything going forward, so that makes the LBYL approach the one I plan to go with even if I personally prefer the EAFP approach for optional API attributes; I don't want this part of the API being viewed as optional by loader authors.
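A minimal sketch of how the three idioms behave side by side. The function names and the SimpleNamespace stand-ins are hypothetical, used only so the example runs without touching real modules.

```python
import types


def loader_tolerant(module):
    """Handle both a missing __loader__ and one set to None."""
    return getattr(module, '__loader__', None)


def loader_lbyl(module):
    """LBYL: assumes the attribute always exists and None means unknown."""
    if module.__loader__ is not None:
        return module.__loader__
    return None


def loader_eafp(module):
    """EAFP: treats the attribute itself as optional."""
    try:
        loader = module.__loader__
    except AttributeError:
        return None
    else:
        return loader


# Hypothetical stand-ins for modules, used only for illustration.
no_attr = types.SimpleNamespace()
set_to_none = types.SimpleNamespace(__loader__=None)
has_loader = types.SimpleNamespace(__loader__='fake loader')
```

Only the LBYL version fails on no_attr (it raises AttributeError), which is why it only becomes the single idiom once every module is guaranteed to carry the attribute.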
If you care about any of this specific API cleanup, you can follow issue #17115 as I clean up importlib's mixed approach to __loader__.
2013-02-01
Remember that the "BC" in ABC means "Base Class"
[UPDATE: I had a talk with Thomas Wouters on IM which has caused me to rethink things. New thoughts up top, original post after the jump break]
I had misheard a comment Thomas Wouters made in a meeting about not raising NotImplementedError and using ABCs. That led me to think about the problem of having ABCs in your MRO which were not at the bottom and which defined methods that would override methods you wanted to access farther down the inheritance chain. I had thought that calling super() in your ABCs in some manner was the solution. But after discussing things with Thomas I believe I was in the wrong and I had badly misheard what he had said. =)
Because importlib has a bunch of overlapping ABCs which inherit from each other I thought that the situation might come up where you inherited from two different classes which had overlap in methods but for which you would want them to build off of each other. But as Thomas pointed out to me, ABCs are meant to be at the bottom of an MRO; the BC stands for "Base Class" for a reason. If you are trying to interleave methods between two different classes implementing the same interface then either the granularity of the ABC is wrong or you shouldn't be inheriting from the ABC to begin with. I tried to come up with counter-examples, but they were so convoluted, and required such bad API design to justify, that I gave up and admitted they were stupid.
What does all of this mean for you when you are writing an ABC? You should just treat your ABCs as the bottom of your MRO. That means you should have all of your methods, even the abstract ones, do something sensible in case they are somewhat blindly reached through a super() call. If you have a default return value, return that. If that does not exist then you should raise the exception which signifies failure as defined by the API. But raising NotImplementedError is not the right thing to do when it can be avoided with a sensible default reaction (which I have not been doing in importlib.abc). This also has a nice side benefit of making sure you clearly define what the default reaction is for a call to the method.
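A minimal sketch of that advice. The Finder ABC, its method names, and DictFinder are hypothetical and are not importlib's actual classes; the point is only that the abstract methods do something sensible when reached through super().

```python
import abc


class Finder(abc.ABC):
    """Hypothetical ABC; the names are illustrative, not importlib's."""

    @abc.abstractmethod
    def find_module(self, name):
        # Sensible default return value: None means "not found".
        return None

    @abc.abstractmethod
    def get_data(self, path):
        # No sensible default value, so raise the failure the API defines.
        raise OSError('no data available for {!r}'.format(path))


class DictFinder(Finder):
    """Concrete subclass that falls back on the ABC's defaults."""

    def __init__(self, modules):
        self._modules = modules

    def find_module(self, name):
        if name in self._modules:
            return self._modules[name]
        # Blindly delegating is safe because the ABC has a default.
        return super().find_module(name)

    def get_data(self, path):
        return super().get_data(path)


finder = DictFinder({'spam': 'spam loader'})
```

The methods stay abstract, so subclasses still have to override them, but a super() call never ends in NotImplementedError.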
2012-12-09
How much of Python can be written in Python?
Now I don't mean in the PyPy sense where you can bootstrap yourself with another Python installation. No, I'm talking about all you have is a checkout of the CPython repository and a C compiler. How far could you go in writing stuff for Python in Python and not C (from my perspective for maintainability; for others, perhaps ease of extensibility)? In Python 3.3 we now have import written in Python (technically the main import loop that is used is implemented in C to save 5% at startup, but that is entirely optional as equivalent pure Python code still exists) and it's actually faster than the C version from Python 3.2 thanks to directory content caching. So it is not entirely ridiculous to think about how far one could push the idea of replacing C code in CPython with Python code.
What restrictions do we have for this thought experiment? One is that CPython needs to continue to be performant. That means either that the feature is not executed constantly or can be made to work as close to C code as possible. The other requirement is that it can't really have dependencies on the stdlib beyond built-in modules. Since this concept works based on freezing Python bytecode into C-level char arrays you don't want to have to pull in half the stdlib just to make something work. But that's pretty much it.
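To make the freezing idea concrete, here is a rough sketch of turning Python source into bytes that could be pasted into a C array. The freeze() helper and the M_ naming are assumptions made for illustration; this is not CPython's actual freeze tooling.

```python
import marshal


def freeze(name, source):
    """Compile source and render its marshalled bytecode as a C array."""
    code = compile(source, '<frozen {}>'.format(name), 'exec')
    data = marshal.dumps(code)
    body = ', '.join(str(byte) for byte in data)
    c_array = 'static unsigned char M_{}[] = {{{}}};'.format(name, body)
    return data, c_array


data, c_array = freeze('hello', 'greeting = "hi"')

# Loading the bytes back gives a code object the interpreter can run
# without needing the parser or compiler at all.
namespace = {}
exec(marshal.loads(data), namespace)
```

Anything the frozen code imports would have to be frozen as well, which is why the dependency restriction above matters.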
The first possibility is the parser. If you either generated the parser like the one CPython uses (that has not really changed much since Guido wrote it way back when) or wrote a recursive descent one by hand, it could probably be written in Python. The real problem is how performance might be hit. Now if you are working off of bytecode files then this really is only a one-time cost per bytecode file creation. But if you are working primarily with modules that you specify on the command line then they get parsed every time you invoke the interpreter and that could be costly if you can't get performance to be good enough.
Going down the compiler chain, you could also go from CST (concrete syntax tree) to AST (abstract syntax tree) in pure Python. You can already get to the CST from the parser module, so the work to expose the CST at the Python level is done. And with the ast module already exposed it then becomes a matter of creating the AST nodes from the CST. But once again, it's a question of performance since this is invoked every time source code is compiled.
Next would be transforming the AST to bytecode. The AST is already exposed to Python code, so once again the initial work is done for access. But also once again there is the question of performance as this is also on the critical path if you are continually compiling Python source code because you are executing scripts instead of importing code which was previously stored as a bytecode file.
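As a small illustration of how much of this chain is already reachable from Python, the sketch below parses source into an AST with the ast module and hands that tree to the built-in compile(). The AST-to-bytecode step itself is still done by C code here; only the plumbing is in Python.

```python
import ast

source = 'answer = 6 * 7'

# Source text to AST, exposed as ordinary Python objects.
tree = ast.parse(source, filename='<demo>', mode='exec')
assign = tree.body[0]

# AST to a code object; compile() accepts an AST as well as a string.
code = compile(tree, '<demo>', 'exec')

namespace = {}
exec(code, namespace)
```

A pure Python compiler would replace the compile() call with its own walk over the tree.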
You can't do anything for the interpreter eval loop as that becomes a bootstrap issue. If you really wanted to push this you could do a basic eval loop to bootstrap a more complex one, but that seems like more work than it's worth.
I suspect most of Python's builtins could be re-implemented in pure Python without any trouble. Re-implementing something like any(), map(), etc. is not exactly difficult. In this instance, though, performance definitely becomes a key issue due to the extensive use of builtin functions. And in the case of exceptions you have to worry about the C API surrounding them on top of any possible performance issue from exception raising (although I'm willing to bet this can easily be alleviated by just caching at the interpreter level the builtin exception classes so that at the C level it's still just PyObject pointers instead of having to extract them dynamically every time from the builtin module).
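A quick sketch of what such re-implementations could look like. The py_ prefixed names are hypothetical, and these versions make no attempt to match the speed of the C originals.

```python
def py_any(iterable):
    """Return True as soon as one item is truthy, like any()."""
    for item in iterable:
        if item:
            return True
    return False


def py_map(func, *iterables):
    """Lazily apply func across the iterables, like Python 3's map()."""
    for args in zip(*iterables):
        yield func(*args)
```

Like the builtin, py_map() stops at the shortest iterable because zip() does.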
And as always every single module in the stdlib does not have to be implemented in C code if it doesn't wrap other C code. In that instance it is simply taking the time to either copy over and get working the pure Python versions of modules that other VMs have written or writing one from scratch. But thanks to PEP 399 this is only an issue for pre-existing modules (which is also why no one has bothered to backfill all of those modules as the other VMs have already done the work for themselves so no one really needs this to happen; I opened issue 16651 to find out exactly what modules don't have a pure Python version).
In other words, there are various possibilities for technically writing more of CPython in pure Python, but performance considerations will quite possibly not make it worth pursuing (but I would be quite happy if proved wrong =).
2012-05-13
My (very shallow) thoughts on Dart
Being the language nerd that I am, I actually find it fun to learn new programming languages. Now typically this is nothing more than me reading all of the official documentation and writing some toy examples that give me a very shallow, quick-and-dirty feel for a language. Since I have been involved in language design for nearly a decade (started participating on python-dev in June 2002) and have done toy examples now in 18 languages (17 actually still run; I have never bothered to get Forth to work again after a gforth change broke my code), this is actually usually enough for me to grasp the inspirations for a language and thus understand its essence.
At work I have been doing some JavaScript work for an internal Chrome extension and dashboard and so that led me to want to look into what Dart had to offer over JavaScript. I know the language is only at version 0.09 (and still changing weekly), but the fundamentals are there so I wanted to see what the general feel of the language is (and will continue to be).
I also know Dart is somewhat controversial for some people. Personally, I fall on the "competition is good" side of the argument, not the "OMG fragmentation" side. I want ECMAScript Harmony to still happen and give me a cleaner, tighter, more functional JavaScript, but that doesn't mean Dart doesn't have a place in the world as a cleaner OO language for the web. Besides, me thinking otherwise would make me a massive hypocrite as I began working on Python before it was cool (I feel like I need a hipster meme for that statement, but I digress) and I have worked hard to convert people to Python from other languages. Hell, I have tried to foster competition between the Python VMs to get them to push each other to perform better and be ever more interoperable. IOW I don't totally buy this fragmentation argument.
Going into learning Dart I knew who was involved with the language which is what will inherently define how a language feels. I knew Lars Bak of V8 helped design the language, which meant it would have some design restrictions put on it to make it have a damn fast VM. Josh Bloch has been helping to design Dart's library which meant some JDK feel to it. I also know Jim Hugunin is involved which should also help with the VM speed. So fast with an API designed like the JDK.
What did I find? A language with a damn fast VM and a standard library that felt like the JDK. =) Take OO as a Python programmer would expect (e.g. pure OO where everything is an object, not dogmatic OO like Java where everything has to be in a class definition), make types entirely optional for testing and tooling purposes but with enough support to use interfaces and generics, toss in abilities based on what JavaScript allows, and then you have a good idea of what Dart offers.
So, Dart has optional typing. In case you have not heard, Dart does not use type information at runtime for performance and does not throw any form of fit when a type doesn't match what is specified unless you run in checked mode. If you do that then you get warnings about possible type issues. But Dart's type system is unsound, so don't expect typing to catch every error that a stricter type system might, even when you run in checked mode. Dart views types as helpful documentation and a way to help tools assist with things, period. I actually find it rather refreshing to have a language that treats types as just documentation since that is really what they are for the programmer (VMs can use it for performance, but it isn't required for good performance and type safety only saves you from a minor set of bugs which every Python programmer probably realizes eventually =).
But that's even if you bother with types! You can write all of your code without types and everything will run without issue. Even generics are optional, so you can declare that a function accepts a List with or without a type argument; Dart doesn't care either way and it alleviates covariance/contravariance headaches by not caring if you don't care either. It's actually rather nice to have non-library code be written quickly using dynamic typing and only add in the type information for library code where you care about what interface is expected. IOW I think Dart strikes a nice balance with how it does typing and I actually feel fine using types when I know what I expect to accept in my own code that I don't expect anyone else to rely upon.
Dart is OO, not prototypical like JavaScript. It's single-inheritance, which I'm fine with. It does have interfaces as one would expect in a statically typed language, but it softens their expense by allowing one to define a default implementation of an interface. What this means is that the Map interface will also give you a HashMap instance if you call new Map(). I suspect they snagged the idea from Scala where you have the Map class which hides HashMap from the user if you simply don't care about what Map implementation you use.
It does have a modicum of privacy by using a leading underscore for signaling something is private, much like Python. But the privacy is enforced at the library-level or is public, period. Every field automatically has a getter and setter defined for it, so there is no way to force a private field (which I think is a good thing since I find private privacy bloody annoying). I also like that getters and setters are directly supported by the language with automatic generation so you don't ever have to see a setSomething()/getSomething() function call just to read/write a field, but you can do something like Python's properties very easily.
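For anyone who has not used them, here is a minimal sketch of the Python properties being used as the point of comparison. The Temperature class is hypothetical; it only shows that callers read and write a plain attribute while the class keeps control of the logic.

```python
class Temperature:
    """Hypothetical class whose field is guarded by a property."""

    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        # Reads look like plain attribute access to the caller.
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Writes also look like plain assignment, but can be validated.
        if value < -273.15:
            raise ValueError('below absolute zero')
        self._celsius = value


reading = Temperature(20)
reading.celsius = 25
```

No getCelsius() or setCelsius() call ever shows up at the call site.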
The standard libraries are fine and just feel like the JDK. Things are very much LBYL rather than EAFP. I am willing to bet (although I have not tested this) that exceptions are a little expensive in Dart (since exceptions are hard to optimize) and so they would rather go the LBYL way. But they still went a little overboard in my opinion on some things (e.g. the list interface has a last() method instead of supporting negative indexes). But there is nothing there that is making me run away screaming.
One place I do think Dart could use some improvement is simplifying their constructor rules. Upfront Dart has some nice syntactic sugar for a construction where you directly specify how a constructor's arguments map to instance fields, avoiding having to declare the constructor parameters and then also write an assignment. OK, I like that.
Dart also has initializer lists which let you initialize final fields. OK, that's cool and a nice idea taken from C++.
Constructors are not inherited. OK, that's fine since you probably want to be explicit about how you tweak stuff. But there is an exception about the default, no-argument constructor calling the superclass' no-argument constructor. So while not technically inherited, it might as well be in that single instance. And all defined constructors will automatically call the default constructor; if it isn't defined, you must explicitly call a constructor somehow (probably in the initializer list of your constructor). Um, OK...
And you have named constructors. This gets you around the lack of type-based method overloading for constructors. OK, I can go with that.
You also have constant constructors since fields can only be initialized to compile-constant values. Fine, that's for performance and determinism in instance creation, so I can grasp the desire for that.
And then you have factory constructors. OK, this is where I go "WTF people". This is so that you can have a constructor that actually doesn't create a new instance but instead can return something else other than a new instance (think of Python's __new__() or any of Java's static factory methods). But this lets you use the new keyword on a factory constructor instead of using a static method. And that to me seems unneeded.
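Since the comparison is to Python's __new__(), here is a minimal sketch of a class whose construction can hand back an existing instance instead of a new one. The Symbol class and its cache are hypothetical and only illustrate the idea; they say nothing about how Dart implements factory constructors.

```python
class Symbol:
    """Hypothetical class that returns a cached instance per name."""

    _cache = {}

    def __new__(cls, name):
        try:
            # Hand back an existing instance rather than creating one.
            return cls._cache[name]
        except KeyError:
            instance = super().__new__(cls)
            instance.name = name
            cls._cache[name] = instance
            return instance


first = Symbol('spam')
second = Symbol('spam')
other = Symbol('eggs')
```

Here first and second are the same object even though both came from what looks like an ordinary constructor call.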
So let's recap what constructor options we have. We have regular constructors, default and defined, which support initializer lists. You have named constructors. There are constant constructors. And you also have factory constructors. If you don't count the default constructor as special that means Dart has four types of constructors. WTF!?! I realize that Java's FactoryFactoryOfFactories crap has probably spooked the crap out of the Dart designers, all the while having Java influences making them think they need the new keyword for anything that would return an instance of a class, but this seems a bit much. Dart's function definitions are rich enough to allow for optional arguments, etc. which would suggest that the typical constructor can do the job of named constructors, with static methods picking up the slack where factory constructors are currently used and absolutely necessary. Maybe I'm missing something here, but I think they tried to design for everything that is bad about Java's constructor mess without stopping to think what their function definitions already buy them, all while making sure the new keyword was used.
Luckily that is the only bit of Dart that I found poorly designed. Everything else is reasonable and something any JavaScript programmer will be somewhat familiar with or quickly grasp.
Now as I said, I only did toy examples in Dart beyond reading the docs from beginning to end. If I had more time this weekend I may have done one more coding example that was more involved, but I ran out of time. But based on what I have read and what I learned, I am happy with Dart and would be content in using it for programming for the Internet. I would also be totally happy being asked to use it in a situation where others wanted to use types (e.g. I would be fine ditching Java for Dart if people really felt the need to hold on to their types).