2007-10-25

What to expect in Python 3.0a2

While Python 3.0a2 will most likely not be released by the end of tomorrow as Guido asked, that doesn't mean I can't tell you, dear reader, what the big changes are in the next alpha.

And really there is only one big change planned: the new bytes/buffer dichotomy. If you were to download Py3K right now, you would notice that the old 'str' type has been renamed str8. It was kept around for cases where transitioning to the new 'str' (which is the old 'unicode', mind you) was either too painful at the time or where it was not yet clear how to handle the transition.

But at some point between a1 and now, Guido thought about what it would be like if there were an immutable bytes type. Enough people liked the idea that it became a goal for a2.

And so str8 becomes bytes, bytes becomes buffer, and buffer becomes memoryview. Where does that leave us? Well, memoryview becomes the Python representation of the new buffer protocol as laid out in PEP 3118. The mutable bytes type from a1 stays the same but gets the name buffer, since it is basically a buffer of raw bytes.
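To make the three roles concrete, here is a small sketch written against the names that later shipped in released Python 3, not the a2 names. Assumption: the mutable type this post calls buffer corresponds to what final Python 3 calls bytearray, so the a2 name "buffer" does not appear in the code. It shows an immutable bytes object, a mutable byte buffer, and a memoryview writing into that buffer through the buffer protocol.

```python
# Sketch using released-Python-3 names.
# Assumption: bytearray plays the role this post calls "buffer" (the a2 name).

raw = b"spam"               # immutable bytes (the old str8 role)
mutable = bytearray(raw)    # mutable byte buffer (the "buffer" role)
view = memoryview(mutable)  # zero-copy view via the PEP 3118 buffer protocol

# Writing through the view changes the underlying buffer in place.
view[0] = ord("S")
print(mutable)              # bytearray(b'Spam')

# The original immutable bytes object is untouched.
print(raw)                  # b'spam'

# Attempting to mutate bytes raises TypeError.
try:
    raw[0] = ord("S")
except TypeError as exc:
    print("bytes is immutable:", exc)
```

The memoryview never copies the data, which is the point of exposing the buffer protocol at the Python level.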

But now the name 'bytes' refers to an immutable array of raw bytes. As discussed in PEP 3137, this means code that used the old 'str' type for raw data can now be translated over to the new bytes type. So string literals holding raw bytes get a 'b' tacked in front of them and become instances of the new bytes type. The common string methods (e.g., capitalize()) stick around, both for compatibility and because raw bytes are often just ASCII. This should help when converting 2.x code to Py3K, as the basic semantics are the same apart from a few missing methods. Most people will want to add some code that decodes the bytes into a str instance for printing and the like, but that is still optional.
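A minimal sketch of what this looks like in practice, run under released Python 3 rather than the a2 snapshot. The header value is a hypothetical example. It shows a b-prefixed literal, string-style methods that operate on and return bytes, and an explicit decode into a str for display.

```python
# Sketch: bytes keeps most of the old 2.x str methods.
# The header value below is a hypothetical example.
data = b"content-type: text/plain"

# Familiar string methods still work on bytes and return bytes.
print(data.capitalize())                     # b'Content-type: text/plain'
print(data.split(b": "))                     # [b'content-type', b'text/plain']
print(data.upper().startswith(b"CONTENT"))   # True

# Decoding turns raw bytes into a str (text) object, e.g. for printing.
text = data.decode("ascii")
print(type(text).__name__, text)             # str content-type: text/plain
```

Note that method arguments such as the separator for split() are themselves bytes literals, since the methods work on raw bytes rather than text.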

The other big perks come from having an immutable type. Because the bytes type is hashable, you can now use raw bytes as keys in dicts. You can compare them without worrying about them mutating from underneath you. It also means that a bytes literal you write will stay exactly as you wrote it.
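Here is a short sketch of the hashability perk, again using released-Python-3 names. The file-signature table is a hypothetical example, and bytearray stands in for the mutable type this post calls buffer (an assumption about naming, not the a2 API). Immutable bytes work as dict keys; the mutable buffer type does not.

```python
# Sketch: immutable bytes are hashable, so they can be dict keys.
# The signature table is a hypothetical example.
magic = {
    b"\x89PNG": "png",
    b"GIF8": "gif",
}

header = b"\x89PNG\r\n\x1a\n"
print(magic.get(header[:4]))   # png

# Assumption: bytearray stands in for the mutable "buffer" type;
# being mutable, it is unhashable and cannot be a dict key.
try:
    {bytearray(b"GIF8"): "gif"}
except TypeError as exc:
    print("mutable buffer is unhashable:", exc)

# Equal bytes values compare and hash the same, so lookups stay stable.
key = bytes(bytearray(b"GIF8"))
print(magic[key])              # gif
```

Slicing the header yields a new bytes object that compares equal to the stored key, which is why the lookup succeeds.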

But the transition has not entirely happened yet; str8 is slowly mutating into bytes. Probably the biggest remaining task is to rename the types and fix any failing tests. The remaining steps are outlined in this thread. As always, help is appreciated.