author | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000
committer | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000
commit | 458120dd40db6b4df55a4e96b650e16798ef06a0
tree | 8f82685be24fef97e715c6f5ca4c68d34d5074ee /sys/src/cmd/python/Doc/whatsnew
parent | 3a742c699f6806c1145aea5149bf15de15a0afd7
add hg and python
Diffstat (limited to 'sys/src/cmd/python/Doc/whatsnew')
-rw-r--r-- | sys/src/cmd/python/Doc/whatsnew/Makefile | 3
-rw-r--r-- | sys/src/cmd/python/Doc/whatsnew/whatsnew20.tex | 1337
-rw-r--r-- | sys/src/cmd/python/Doc/whatsnew/whatsnew21.tex | 868
-rw-r--r-- | sys/src/cmd/python/Doc/whatsnew/whatsnew22.tex | 1466
-rw-r--r-- | sys/src/cmd/python/Doc/whatsnew/whatsnew23.tex | 2380
-rw-r--r-- | sys/src/cmd/python/Doc/whatsnew/whatsnew24.tex | 1757
-rw-r--r-- | sys/src/cmd/python/Doc/whatsnew/whatsnew25.tex | 2530
7 files changed, 10341 insertions, 0 deletions
diff --git a/sys/src/cmd/python/Doc/whatsnew/Makefile b/sys/src/cmd/python/Doc/whatsnew/Makefile
new file mode 100644
index 000000000..d11f97bf7
--- /dev/null
+++ b/sys/src/cmd/python/Doc/whatsnew/Makefile
@@ -0,0 +1,3 @@

check:
	../../python.exe ../../Tools/scripts/texcheck.py whatsnew25.tex
diff --git a/sys/src/cmd/python/Doc/whatsnew/whatsnew20.tex b/sys/src/cmd/python/Doc/whatsnew/whatsnew20.tex
new file mode 100644
index 000000000..57e0a369a
--- /dev/null
+++ b/sys/src/cmd/python/Doc/whatsnew/whatsnew20.tex
@@ -0,0 +1,1337 @@
\documentclass{howto}

% $Id: whatsnew20.tex 50964 2006-07-30 03:03:43Z fred.drake $

\title{What's New in Python 2.0}
\release{1.02}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{
  \strong{Python Software Foundation}\\
  Email: \email{amk@amk.ca}, \email{moshez@twistedmatrix.com}
}
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

A new release of Python, version 2.0, was released on October 16, 2000. This
article covers the exciting new features in 2.0, highlights some other
useful changes, and points out a few incompatible changes that may require
rewriting code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 2.0; to list them all would be
impossible, but they're certainly significant. Consult the
publicly-available CVS logs if you want to see the full list. This
progress is due to the five developers working for PythonLabs, who
are now getting paid to spend their days fixing bugs, and also to the
improved communication resulting from moving to SourceForge.

% ======================================================================
\section{What About Python 1.6?}

Python 1.6 can be thought of as the Contractual Obligations Python
release.
After the core development team left CNRI in May 2000, CNRI
requested that a 1.6 release be created, containing all the work on
Python that had been performed at CNRI. Python 1.6 therefore
represents the state of the CVS tree as of May 2000, with the most
significant new feature being Unicode support. Development continued
after May, of course, so the 1.6 tree received a few fixes to ensure
that it's forward-compatible with Python 2.0. 1.6 is therefore part
of Python's evolution, and not a side branch.

So, should you take much interest in Python 1.6? Probably not. The
1.6final and 2.0beta1 releases were made on the same day (September 5,
2000), the plan being to finalize Python 2.0 within a month or so. If
you have applications to maintain, there seems little point in
breaking things by moving to 1.6, fixing them, and then having another
round of breakage within a month by moving to 2.0; you're better off
just going straight to 2.0. Most of the really interesting features
described in this document are only in 2.0, because a lot of work was
done between May and September.

% ======================================================================
\section{New Development Process}

The most important change in Python 2.0 may not be to the code at all,
but to how Python is developed: in May 2000 the Python developers
began using the tools made available by SourceForge for storing
source code, tracking bug reports, and managing the queue of patch
submissions. To report bugs or submit patches for Python 2.0, use the
bug tracking and patch manager tools available from Python's project
page, located at \url{http://sourceforge.net/projects/python/}.

The most important of the services now hosted at SourceForge is the
Python CVS tree, the version-controlled repository containing the
source code for Python.
Previously, there were roughly seven people
who had write access to the CVS tree, and all patches had to be
inspected and checked in by one of the people on this short list.
Obviously, this wasn't very scalable. By moving the CVS tree to
SourceForge, it became possible to grant write access to more people;
as of September 2000 there were 27 people able to check in changes, a
fourfold increase. This makes possible large-scale changes that
wouldn't be attempted if they had to be filtered through the small
group of core developers. For example, one day Peter Schneider-Kamp
took it into his head to drop K\&R C compatibility and convert the C
source for Python to ANSI C. After getting approval on the python-dev
mailing list, he launched into a flurry of checkins that lasted about
a week, other developers joined in to help, and the job was done. If
only a handful of people had had write access, the task would probably
have been viewed as ``nice, but not worth the time and effort needed''
and would never have gotten done.

The shift to using SourceForge's services has resulted in a remarkable
increase in the speed of development. Patches now get submitted,
commented on, revised by people other than the original submitter, and
bounced back and forth between people until the patch is deemed worth
checking in. Bugs are tracked in one central location and can be
assigned to a specific person for fixing, and we can count the number
of open bugs to measure progress. This didn't come without a cost:
developers now have more e-mail to deal with, more mailing lists to
follow, and special tools had to be written for the new environment.
For example, SourceForge sends default patch and bug notification
e-mail messages that are completely unhelpful, so Ka-Ping Yee wrote an
HTML screen-scraper that sends more useful messages.

The ease of adding code caused a few initial growing pains, such as
code being checked in before it was ready or without clear
agreement from the developer group. The approval process that has
emerged is somewhat similar to that used by the Apache group.
Developers can vote +1, +0, -0, or -1 on a patch; +1 and -1 denote
acceptance or rejection, while +0 and -0 mean the developer is mostly
indifferent to the change, though with a slight positive or negative
slant. The most significant change from the Apache model is that the
voting is essentially advisory, letting Guido van Rossum, who has
Benevolent Dictator For Life status, know what the general opinion is.
He can still ignore the result of a vote, and approve or
reject a change even if the community disagrees with him.

Producing an actual patch is the last step in adding a new feature,
and is usually easy compared to the earlier task of coming up with a
good design. Discussions of new features can often explode into
lengthy mailing list threads, making the discussion hard to follow,
and no one can read every posting to python-dev. Therefore, a
relatively formal process has been set up to write Python Enhancement
Proposals (PEPs), modelled on the Internet RFC process. PEPs are
draft documents that describe a proposed new feature, and are
continually revised until the community reaches a consensus, either
accepting or rejecting the proposal. Quoting from the introduction to
PEP 1, ``PEP Purpose and Guidelines'':

\begin{quotation}
  PEP stands for Python Enhancement Proposal. A PEP is a design
  document providing information to the Python community, or
  describing a new feature for Python. The PEP should provide a
  concise technical specification of the feature and a rationale for
  the feature.

  We intend PEPs to be the primary mechanisms for proposing new
  features, for collecting community input on an issue, and for
  documenting the design decisions that have gone into Python. The
  PEP author is responsible for building consensus within the
  community and documenting dissenting opinions.
\end{quotation}

Read the rest of PEP 1 for the details of the PEP editorial process,
style, and format. PEPs are kept in the Python CVS tree on
SourceForge, though they're not part of the Python 2.0 distribution,
and are also available in HTML form from
\url{http://www.python.org/peps/}. As of September 2000,
there were 25 PEPs, ranging from PEP 201, ``Lockstep Iteration'', to
PEP 225, ``Elementwise/Objectwise Operators''.

% ======================================================================
\section{Unicode}

The largest new feature in Python 2.0 is a new fundamental data type:
Unicode strings. Python's Unicode strings use 16-bit numbers to
represent characters, instead of the 8-bit values stored in regular
strings, meaning that 65,536 distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless often-stormy discussions on the python-dev mailing list, and
mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
type implementation by Fredrik Lundh. A detailed explanation of the
interface was written up as \pep{100}, ``Python Unicode Integration''.
This article will simply cover the most significant points about the
Unicode interfaces.

In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF. The existing
\code{\e x\var{HH}} escape sequence, which takes exactly two
hexadecimal digits, can also be used for characters up to U+00FF, and
octal escapes can be used for characters up to U+01FF, which is
represented by \code{\e 777}.
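As an illustration, the short sketch below writes the character U+00E9
with each of the escape forms just described and checks that they all
produce the same string. It assumes an interpreter that accepts
\code{u''} literals; the variable names are illustrative only.

\begin{verbatim}
# Sketch: three spellings of U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
via_u = u'\u00e9'      # \uHHHH escape: exactly four hex digits
via_x = u'\xe9'        # \xHH escape: exactly two hex digits
via_octal = u'\351'    # octal escape: 351 (octal) == E9 (hex) == 233

# All three literals denote the same one-character Unicode string.
assert via_u == via_x == via_octal
assert ord(via_u) == 0xE9
\end{verbatim}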

Unicode strings, just like regular strings, are an immutable sequence
type. They can be indexed and sliced, but not modified in place.
Unicode strings have an \method{encode(\optional{encoding})} method
that returns an 8-bit string in the desired encoding. Encodings are
named by strings, such as \code{'ascii'}, \code{'utf-8'},
\code{'iso-8859-1'}, and so on. A codec API is defined for
implementing and registering new encodings that are then available
throughout a Python program. If an encoding isn't specified, the
default encoding is usually 7-bit ASCII, though it can be changed for
your Python installation by calling the
\function{sys.setdefaultencoding(\var{encoding})} function in a
customised version of \file{site.py}.

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default encoding (normally ASCII); the result of
\code{'a' + u'bc'} is \code{u'abc'}.

New built-in functions have been added, and existing built-ins
modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string one character
long, containing the character whose code is the integer \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or
Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string} \optional{, \var{encoding}}
\optional{, \var{errors}})} creates a Unicode string from an 8-bit
string. \var{encoding} is a string naming the encoding to use.
The \var{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} substitutes U+FFFD, the official replacement
character, for each invalid character.

\item The \keyword{exec} statement, and various built-ins such as
\code{eval()}, \code{getattr()}, and \code{setattr()}, now
accept Unicode strings as well as regular strings. (It's possible
that the process of fixing this missed some built-ins; if you find a
built-in function that accepts strings but doesn't accept Unicode
strings at all, please report it as a bug.)

\end{itemize}

A new module, \module{unicodedata}, provides an interface to Unicode
character properties. For example, \code{unicodedata.category(u'A')}
returns the 2-character string \code{'Lu'}: the \samp{L} denotes a
letter, and the \samp{u} means that it's uppercase.
\code{unicodedata.bidirectional(u'\e u0660')} returns \code{'AN'},
meaning that U+0660 is an Arabic number.

The \module{codecs} module contains functions to look up existing encodings
and register new ones. Unless you want to implement a
new encoding, you'll most often use the
\function{codecs.lookup(\var{encoding})} function, which returns a
4-element tuple: \code{(\var{encode_func},
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.

\begin{itemize}
\item \var{encode_func} is a function that takes a Unicode string, and
returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
is an 8-bit string containing a portion (perhaps all) of the Unicode
string converted into the given encoding, and \var{length} tells you
how much of the Unicode string was converted.

\item \var{decode_func} is the opposite of \var{encode_func}, taking
an 8-bit string and returning a 2-tuple \code{(\var{ustring},
\var{length})}, consisting of the resulting Unicode string
\var{ustring} and the integer \var{length} telling how much of the
8-bit string was consumed.

\item \var{stream_reader} is a class that supports decoding input from
a stream. \code{\var{stream_reader}(\var{file_obj})} returns an object that
supports the \method{read()}, \method{readline()}, and
\method{readlines()} methods.
These methods will all translate from
the given encoding and return Unicode strings.

\item \var{stream_writer}, similarly, is a class that supports
encoding output to a stream. \code{\var{stream_writer}(\var{file_obj})}
returns an object that supports the \method{write()} and
\method{writelines()} methods. These methods expect Unicode strings,
translating them to the given encoding on output.
\end{itemize}

For example, the following code writes a Unicode string into a file,
encoding it as UTF-8:

\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}

The following code would then read UTF-8 input from the file:

\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}

Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

A \code{-U} command-line option was added which causes the Python
compiler to interpret all string literals as Unicode string literals.
This is intended to be used in testing and future-proofing your Python
code, since some future version of Python may drop support for 8-bit
strings and provide only Unicode strings.

% ======================================================================
\section{List Comprehensions}

Lists are a workhorse data type in Python, and many programs
manipulate a list at some point. Two common operations on lists are
to loop over them and either pick out the elements that meet a
certain criterion, or apply some function to each element.
For
example, given a list of strings, you might want to pull out all the
strings containing a given substring, or strip off trailing whitespace
from each line.

The existing \function{map()} and \function{filter()} functions can be
used for this purpose, but they require a function as one of their
arguments. This is fine if there's an existing built-in function that
can be passed directly, but if there isn't, you have to create a
little function to do the required work, and Python's scoping rules
make the result ugly if the little function needs additional
information. Take the first example in the previous paragraph,
finding all the strings in the list containing a given substring. You
could write the following to do it:

\begin{verbatim}
# Given the list L, make a list of all strings
# containing the substring S.
sublist = filter( lambda s, substring=S:
                     string.find(s, substring) != -1,
                  L)
\end{verbatim}

Because of Python's scoping rules, a default argument is used so that
the anonymous function created by the \keyword{lambda} expression knows
what substring is being searched for. List comprehensions make this
cleaner:

\begin{verbatim}
sublist = [ s for s in L if string.find(s, S) != -1 ]
\end{verbatim}

List comprehensions have the form:

\begin{verbatim}
[ expression for expr1 in sequence1
             for expr2 in sequence2 ...
             for exprN in sequenceN
             if condition ]
\end{verbatim}

The \keyword{for}...\keyword{in} clauses contain the sequences to be
iterated over. The sequences do not have to be the same length,
because they are \emph{not} iterated over in parallel, but as nested
loops from left to right; this is explained more clearly in the following
paragraphs. The elements of the generated list will be the successive
values of \var{expression}. The final \keyword{if} clause is
optional; if present, \var{expression} is only evaluated and added to
the result if \var{condition} is true.
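The second motivating example from the start of this section,
stripping trailing whitespace from each line, can be sketched the same
way. The list \code{lines} below is hypothetical sample data, and the
last statement shows the equivalent \function{map()}/\function{filter()}
formulation only for comparison.

\begin{verbatim}
# Hypothetical input: lines still carrying trailing whitespace.
lines = ['spam  \n', '\n', 'eggs\t\n', 'ham\n']

# Apply a function to every element: strip trailing whitespace.
stripped = [line.rstrip() for line in lines]

# Add the optional "if" clause: keep only the non-empty results.
nonblank = [s for s in stripped if s != '']

# The same filtering with map() and filter(); list() keeps this
# working where filter() returns an iterator instead of a list.
nonblank_old = list(filter(lambda s: s != '',
                           map(lambda line: line.rstrip(), lines)))
\end{verbatim}

Here \code{stripped} is \code{['spam', '', 'eggs', 'ham']}, and both
filtered versions are \code{['spam', 'eggs', 'ham']}.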

To make the semantics very clear, a list comprehension is equivalent
to the following Python code:

\begin{verbatim}
for expr1 in sequence1:
    for expr2 in sequence2:
        ...
            for exprN in sequenceN:
                if (condition):
                    # Append the value of
                    # the expression to the
                    # resulting list.
\end{verbatim}

This means that when there are multiple \keyword{for}...\keyword{in}
clauses and no \keyword{if} clause, the length of the resulting list
will be equal to the product of the lengths of all the sequences. If
you have two sequences of length 3, the output list is 9 elements long:

\begin{verbatim}
>>> seq1 = 'abc'
>>> seq2 = (1,2,3)
>>> [ (x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
('c', 2), ('c', 3)]
\end{verbatim}

To avoid introducing an ambiguity into Python's grammar, if
\var{expression} is creating a tuple, it must be surrounded with
parentheses. The first list comprehension below is a syntax error,
while the second one is correct:

\begin{verbatim}
# Syntax error
[ x,y for x in seq1 for y in seq2]
# Correct
[ (x,y) for x in seq1 for y in seq2]
\end{verbatim}

The idea of list comprehensions originally comes from the functional
programming language Haskell (\url{http://www.haskell.org}). Greg
Ewing argued most effectively for adding them to Python and wrote the
initial list comprehension patch, which was then discussed for a
seemingly endless time on the python-dev mailing list and kept
up-to-date by Skip Montanaro.

% ======================================================================
\section{Augmented Assignment}

Augmented assignment operators, another long-requested feature, have
been added to Python 2.0. Augmented assignment operators include
\code{+=}, \code{-=}, \code{*=}, and so forth. For example, the
statement \code{a += 2} increments the value of the variable
\code{a} by 2, equivalent to the slightly lengthier \code{a = a + 2}.

% The empty groups below prevent conversion to guillemets.
The full list of supported assignment operators is \code{+=},
\code{-=}, \code{*=}, \code{/=}, \code{\%=}, \code{**=}, \code{\&=},
\code{|=}, \verb|^=|, \code{>{}>=}, and \code{<{}<=}. Python classes can
override the augmented assignment operators by defining methods named
\method{__iadd__}, \method{__isub__}, etc. For example, the following
\class{Number} class stores a number and supports using \code{+=} to
create a new instance with an incremented value.

\begin{verbatim}
class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number( self.value + increment)

n = Number(5)
n += 3
print n.value
\end{verbatim}

The \method{__iadd__} special method is called with the value of the
increment and should return the result; here that is a new instance
with an appropriately modified value, though a method may instead
modify the object in place and return \code{self}. The return value is
bound as the new value of the variable on the left-hand side.

Augmented assignment operators were first introduced in the C
programming language, and most C-derived languages, such as
\program{awk}, \Cpp{}, Java, Perl, and PHP, also support them. The augmented
assignment patch was implemented by Thomas Wouters.

% ======================================================================
\section{String Methods}

Until now, string-manipulation functionality was in the \module{string}
module, which was usually a front-end for the \module{strop}
module written in C. The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings. For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible combinations of argument types,
and correspondingly complicated code.

Instead, Python 2.0 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, a noteworthy April Fools' joke
notwithstanding, is that Python strings are immutable. Thus, the
string methods return new strings, and do not modify the string on
which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-2.0 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to
\code{s[:len(t)] == t}, while \code{s.endswith(t)} is equivalent to
\code{s[-len(t):] == t} for a non-empty \var{t}.

One other method which deserves special mention is \method{join()}. The
\method{join()} method of a string receives one parameter, a sequence of
strings, and is equivalent to the \function{string.join} function from
the old \module{string} module, with the arguments reversed. In other
words, \code{s.join(seq)} is equivalent to the old
\code{string.join(seq, s)}.

% ======================================================================
\section{Garbage Collection of Cycles}

The C implementation of Python uses reference counting to implement
garbage collection. Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed. Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.

Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and interacts well with other libraries that
implement their own memory handling schemes.
The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak. This happens
when there are cycles of references.

Consider the simplest possible cycle,
a class instance which has a reference to itself:

\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}

After the above two lines of code have been executed, the reference
count of \code{instance} is 2; one reference is from the variable
named \samp{instance}, and the other is from the \samp{myself}
attribute of the instance.

If the next line of code is \code{del instance}, what happens? The
reference count of \code{instance} is decreased by 1, so it has a
reference count of 1; the reference in the \samp{myself} attribute
still exists. Yet the instance is no longer accessible through Python
code, and it could be deleted. Several objects can participate in a
cycle if they have references to each other, causing all of the
objects to be leaked.

Python 2.0 fixes this problem by periodically executing a cycle
detection algorithm which looks for inaccessible cycles and deletes
the objects involved. A new \module{gc} module provides functions to
perform a garbage collection, obtain debugging statistics, and tune
the collector's parameters.

Running the cycle detection algorithm takes some time, and therefore
will result in some additional overhead. It is hoped that after we've
gotten experience with the cycle collection from using 2.0, Python 2.1
will be able to minimize the overhead with careful tuning. It's not
yet obvious how much performance is lost, because benchmarking this is
tricky and depends crucially on how often the program creates and
destroys objects.
The detection of cycles can be disabled when Python
is compiled, if you can't afford even a tiny speed penalty or suspect
that the cycle collection is buggy, by specifying the
\longprogramopt{without-cycle-gc} switch when running the
\program{configure} script.

Several people tackled this problem and contributed to a solution. An
early implementation of the cycle detection approach was written by
Toby Kelsey. The current algorithm was suggested by Eric Tiedemann
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
wrote two different implementations, which were later integrated by
Neil. Lots of other people offered suggestions along the way; the
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.

% ======================================================================
\section{Other Core Changes}

Various minor changes have been made to Python's syntax and built-in
functions. None of the changes are very far-reaching, but they're
handy conveniences.

\subsection{Minor Language Changes}

A new syntax makes it more convenient to call a given function
with a tuple of arguments and/or a dictionary of keyword arguments.
In Python 1.5 and earlier, you'd use the \function{apply()}
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
function \function{f()} with the argument tuple \var{args} and the
keyword arguments in the dictionary \var{kw}. \function{apply()}
still works in 2.0, but thanks to a patch from
Greg Ewing, \code{f(*\var{args}, **\var{kw})} is a shorter
and clearer way to achieve the same effect. This syntax is
symmetrical with the syntax for defining functions:

\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}

The \keyword{print} statement can now have its output directed to a
file-like object by following the \keyword{print} with
\verb|>> file|, similar to the redirection operator in \UNIX{} shells.
Previously you'd either have to use the \method{write()} method of the
file-like object, which lacks the convenience and simplicity of
\keyword{print}, or you could assign a new value to
\code{sys.stdout} and then restore the old value. For sending output
to standard error, it's much easier to write this:

\begin{verbatim}
print >> sys.stderr, "Warning: action field not supplied"
\end{verbatim}

Modules can now be renamed on importing them, using the syntax
\code{import \var{module} as \var{name}} or \code{from \var{module}
import \var{name} as \var{othername}}. The patch was submitted by
Thomas Wouters.

A new format style is available when using the \code{\%} operator;
\code{\%r} will insert the \function{repr()} of its argument. This was
also added from symmetry considerations, this time for symmetry with
the existing \code{\%s} format style, which inserts the \function{str()} of
its argument. For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.

Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version. \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered. Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}. Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.

Earlier versions of Python used a recursive algorithm for deleting
objects.
Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem. On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead. For
example, after this code:

\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}

the comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic. See the thread ``trashcan
and PR\#7'' in the April 2000 archives of the python-dev mailing list
for the discussion leading up to this implementation, and some useful
relevant links.
% Starting URL:
% http://www.python.org/pipermail/python-dev/2000-April/004834.html

Note that comparisons can now also raise exceptions. In earlier
versions of Python, a comparison operation such as \code{cmp(a,b)}
would always produce an answer, even if a user-defined
\method{__cmp__} method encountered an error, since the resulting
exception would simply be silently swallowed.

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState. (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
that for ease of porting, MS Visual \Cpp{} treats code as 32-bit on Itanium.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://starship.python.net/crew/mhammond/ce/} for more
information.

Another new platform is Darwin/MacOS X; initial support for it is in
Python 2.0. Dynamic loading works if you run \program{configure} with
the \longprogramopt{with-dyld} and \longprogramopt{with-suffix=.x}
options. Consult the README in the Python source distribution for
more instructions.

An attempt has been made to alleviate one of Python's warts, the
often-confusing \exception{NameError} exception when code refers to a
local variable before the variable has been assigned a value.
For
example, the following code raises an exception on the \keyword{print}
statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
exception is raised, while 2.0 raises a new
\exception{UnboundLocalError} exception.
\exception{UnboundLocalError} is a subclass of \exception{NameError},
so any existing code that expects \exception{NameError} to be raised
should still work.

\begin{verbatim}
def f():
    print "i=",i
    i = i + 1
f()
\end{verbatim}

Two new exceptions, \exception{TabError} and
\exception{IndentationError}, have been introduced.
\exception{IndentationError} is a subclass of \exception{SyntaxError},
and \exception{TabError} is in turn a subclass of
\exception{IndentationError}; both are raised when Python code
is found to be improperly indented.

\subsection{Changes to Built-in Functions}

A new built-in, \function{zip(\var{seq1}, \var{seq2}, ...)}, has been
added. \function{zip()} returns a list of tuples where each tuple
contains the i-th element from each of the argument sequences. The
difference between \function{zip()} and \code{map(None, \var{seq1},
\var{seq2})} is that \function{map()} pads the sequences with
\code{None} if the sequences aren't all of the same length, while
\function{zip()} truncates the returned list to the length of the
shortest argument sequence.

The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291. \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.

A new variable holding more detailed version information has been
added to the \module{sys} module. \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}. For example, in a hypothetical 2.0.1beta1,
\code{sys.version_info} would be \code{(2, 0, 1, 'beta', 1)}.
+\var{level} is a string such as \code{"alpha"}, \code{"beta"}, or +\code{"final"} for a final release. + +Dictionaries have an odd new method, \method{setdefault(\var{key}, +\var{default})}, which behaves similarly to the existing +\method{get()} method. However, if the key is missing, +\method{setdefault()} both returns the value of \var{default} as +\method{get()} would do, and also inserts it into the dictionary as +the value for \var{key}. Thus, the following lines of code: + +\begin{verbatim} +if dict.has_key( key ): return dict[key] +else: + dict[key] = [] + return dict[key] +\end{verbatim} + +can be reduced to a single \code{return dict.setdefault(key, [])} statement. + +The interpreter sets a maximum recursion depth in order to catch +runaway recursion before filling the C stack and causing a core dump +or GPF. Previously this limit was fixed when you compiled Python, +but in 2.0 the maximum recursion depth can be read and modified using +\function{sys.getrecursionlimit()} and \function{sys.setrecursionlimit()}. +The default value is 1000, and a rough maximum value for a given +platform can be found by running a new script, +\file{Misc/find_recursionlimit.py}. + +% ====================================================================== +\section{Porting to 2.0} + +New Python releases try hard to be compatible with previous releases, +and the record has been pretty good. However, some changes are +considered useful enough, usually because they fix initial design decisions that +turned out to be actively mistaken, that breaking backward compatibility +can't always be avoided. This section lists the changes in Python 2.0 +that may cause old Python code to break. + +The change that will probably break the most code is tightening up +the arguments accepted by some methods. Some methods would take +multiple arguments and treat them as a tuple, particularly various +list methods such as \method{.append()} and \method{.insert()}.
+In earlier versions of Python, if \code{L} is a list, \code{L.append( +1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this +causes a \exception{TypeError} exception to be raised, with the +message: 'append requires exactly 1 argument; 2 given'. The fix is to +simply add an extra set of parentheses to pass both values as a tuple: +\code{L.append( (1,2) )}. + +The earlier versions of these methods were more forgiving because they +used an old function in Python's C interface to parse their arguments; +2.0 modernizes them to use \function{PyArg_ParseTuple}, the current +argument parsing function, which provides more helpful error messages +and treats multi-argument calls as errors. If you absolutely must use +2.0 but can't fix your code, you can edit \file{Objects/listobject.c} +and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to +preserve the old behaviour; this isn't recommended. + +Some of the functions in the \module{socket} module are still +forgiving in this way. For example, \function{socket.connect( +('hostname', 25) )} is the correct form, passing a tuple representing +an IP address, but \function{socket.connect( 'hostname', 25 )} also +works. \function{socket.connect_ex()} and \function{socket.bind()} are +similarly easy-going. 2.0alpha1 tightened these functions up, but +because the documentation actually used the erroneous multiple +argument form, many people wrote code which would break with the +stricter checking. GvR backed out the changes in the face of public +reaction, so for the \module{socket} module, the documentation was +fixed and the multiple argument form is simply marked as deprecated; +it \emph{will} be tightened up again in a future Python version. + +The \code{\e x} escape in string literals now takes exactly 2 hex +digits. Previously it would consume all the hex digits following the +'x' and take the lowest 8 bits of the result, so \code{\e x123456} was +equivalent to \code{\e x56}. 
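The two stricter behaviours described above, single-argument \method{append()} and two-digit \code{\e x} escapes, can be checked with a short sketch. It assumes any interpreter from 2.0 onwards (it also runs unchanged under Python 3), and the variable names are illustrative only.

```python
# Sketch of two porting changes in 2.0: stricter list.append() arguments
# and the two-hex-digit limit on \x escapes.
L = []

# Correct 2.0 form: pass a single tuple argument.
L.append((1, 2))

# Old-style call with two arguments: accepted by 1.5.2, rejected since 2.0.
try:
    L.append(1, 2)
    multi_arg_rejected = False
except TypeError:
    multi_arg_rejected = True

# \x now consumes exactly two hex digits, so this literal is the single
# character with code 0x12 followed by the ordinary text '3456'.
s = '\x123456'
```

Under 1.5.2 the same literal would have been a single character (the low 8 bits of 0x123456), which is why code relying on the old behaviour needs checking.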
+ +The \exception{AttributeError} and \exception{NameError} exceptions +now have friendlier error messages, whose text will be something like +\code{'Spam' instance has no attribute 'eggs'} or \code{name 'eggs' is +not defined}. Previously the error message was just the missing +attribute name \code{eggs}, and code written to take advantage of this +fact will break in 2.0. + +Some work has been done to make integers and long integers a bit more +interchangeable. In 1.5.2, large-file support was added for Solaris, +to allow reading files larger than 2~GiB; this made the \method{tell()} +method of file objects return a long integer instead of a regular +integer. Some code would subtract two file offsets and attempt to use +the result to multiply a sequence or slice a string, but this raised a +\exception{TypeError}. In 2.0, long integers can be used to multiply +or slice a sequence, and it'll behave as you'd intuitively expect it +to; \code{3L * 'abc'} produces \code{'abcabcabc'}, and \code{ +(0,1,2,3)[2L:4L]} produces \code{(2,3)}. Long integers can also be used in +various contexts where previously only integers were accepted, such +as in the \method{seek()} method of file objects, and in the formats +supported by the \verb|%| operator (\verb|%d|, \verb|%i|, \verb|%x|, +etc.). For example, \code{"\%d" \% 2L**64} will produce the string +\samp{18446744073709551616}. + +The subtlest long integer change of all is that the \function{str()} +of a long integer no longer has a trailing 'L' character, though +\function{repr()} still includes it. The 'L' annoyed many people who +wanted to print long integers that looked just like regular integers, +since they had to go out of their way to chop off the character. This +is no longer a problem in 2.0, but code that does \code{str(longval)[:-1]} and assumes the 'L' is there will now lose +the final digit. + +Taking the \function{repr()} of a float now uses a different +formatting precision than \function{str()}.
\function{repr()} uses the +\code{\%.17g} format string for C's \function{sprintf()}, while +\function{str()} uses \code{\%.12g} as before. The effect is that +\function{repr()} may occasionally show more decimal places than +\function{str()} for certain numbers. +For example, the number 8.1 can't be represented exactly in binary, so +\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is +\code{'8.1'}. + +The \code{-X} command-line option, which turned all standard +exceptions into strings instead of classes, has been removed; the +standard exceptions will now always be classes. The +\module{exceptions} module containing the standard exceptions was +translated from Python to a built-in C module, written by Barry Warsaw +and Fredrik Lundh. + +% Commented out for now -- I don't think anyone will care. +%The pattern and match objects provided by SRE are C types, not Python +%class instances as in 1.5. This means you can no longer inherit from +%\class{RegexObject} or \class{MatchObject}, but that shouldn't be much +%of a problem since no one should have been doing that in the first +%place. + +% ====================================================================== +\section{Extending/Embedding Changes} + +Some of the changes are under the covers, and will only be apparent to +people writing C extension modules or embedding a Python interpreter +in a larger application. If you aren't dealing with Python's C API, +you can safely skip this section. + +The version number of the Python C API was incremented, so C +extensions compiled for 1.5.2 must be recompiled in order to work with +2.0. On Windows, it's not possible for Python 2.0 to import a third-party +extension built for Python 1.5.x due to how Windows DLLs work, +so Python will raise an exception and the import will fail.
+ +Users of Jim Fulton's ExtensionClass module will be pleased to find +out that hooks have been added so that ExtensionClasses are now +supported by \function{isinstance()} and \function{issubclass()}. +This means you no longer have to remember to write code such as +\code{if type(obj) == myExtensionClass}, but can use the more natural +\code{if isinstance(obj, myExtensionClass)}. + +The \file{Python/importdl.c} file, which was a mass of \#ifdefs to +support dynamic loading on many different platforms, was cleaned up +and reorganised by Greg Stein. \file{importdl.c} is now quite small, +and platform-specific code has been moved into a bunch of +\file{Python/dynload_*.c} files. Another cleanup: there were also a +number of \file{my*.h} files in the Include/ directory that held +various portability hacks; they've been merged into a single file, +\file{Include/pyport.h}. + +Vladimir Marangozov's long-awaited malloc restructuring was completed, +to make it easy to have the Python interpreter use a custom allocator +instead of C's standard \function{malloc()}. For documentation, read +the comments in \file{Include/pymem.h} and +\file{Include/objimpl.h}. For the lengthy discussions during which +the interface was hammered out, see the Web archives of the 'patches' +and 'python-dev' lists at python.org. + +Recent versions of the GUSI development environment for MacOS support +POSIX threads. Therefore, Python's POSIX threading support now works +on the Macintosh. Threading support using the user-space GNU \texttt{pth} +library was also contributed. + +Threading support on Windows was enhanced, too. Windows supports +thread locks that use kernel objects only in case of contention; in +the common case when there's no contention, they use simpler functions +which are an order of magnitude faster. A threaded version of Python +1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0 +changes, the difference is only 10\%. 
These improvements were +contributed by Yakov Markovitch. + +Python 2.0's source now uses only ANSI C prototypes, so compiling Python now +requires an ANSI C compiler, and can no longer be done using a compiler that +only supports K\&R C. + +Previously the Python virtual machine used 16-bit numbers in its +bytecode, limiting the size of source files. In particular, this +affected the maximum size of literal lists and dictionaries in Python +source; occasionally people who generate Python code would run +into this limit. A patch by Charles G. Waldman raises the limit from +$2^{16}$ to $2^{32}$. + +Three new convenience functions intended for adding constants to a +module's dictionary at module initialization time were added: +\function{PyModule_AddObject()}, \function{PyModule_AddIntConstant()}, +and \function{PyModule_AddStringConstant()}. Each of these functions +takes a module object, a null-terminated C string containing the name +to be added, and a third argument for the value to be assigned to the +name. This third argument is, respectively, a Python object, a C +long, or a C string. + +A wrapper API was added for \UNIX-style signal handlers. +\function{PyOS_getsig()} gets a signal handler and +\function{PyOS_setsig()} sets a new handler. + +% ====================================================================== +\section{Distutils: Making Modules Easy to Install} + +Before Python 2.0, installing modules was a tedious affair -- there +was no way to figure out automatically where Python is installed, or +what compiler options to use for extension modules. Software authors +had to go through an arduous ritual of editing Makefiles and +configuration files, which only really worked on \UNIX{} and left Windows +and MacOS unsupported. Python users faced wildly differing +installation instructions that varied from one extension +package to the next, which made administering a Python installation something of +a chore.
+ +The SIG for distribution utilities, shepherded by Greg Ward, has +created the Distutils, a system to make package installation much +easier. They form the \module{distutils} package, a new part of +Python's standard library. In the best case, installing a Python +module from source will require the same steps: first you simply +unpack the tarball or zip archive, and then run ``\code{python setup.py +install}''. The platform will be automatically detected, the compiler +will be recognized, C extension modules will be compiled, and the +distribution installed into the proper directory. Optional +command-line arguments provide more control over the installation +process, and the distutils package offers many places to override defaults +-- separating the build from the install, building or installing in +non-default directories, and more. + +In order to use the Distutils, you need to write a \file{setup.py} +script. For the simple case, when the software contains only \file{.py} +files, a minimal \file{setup.py} can be just a few lines long: + +\begin{verbatim} +from distutils.core import setup +setup (name = "foo", version = "1.0", + py_modules = ["module1", "module2"]) +\end{verbatim} + +The \file{setup.py} file isn't much more complicated if the software +consists of a few packages: + +\begin{verbatim} +from distutils.core import setup +setup (name = "foo", version = "1.0", + packages = ["package", "package.subpackage"]) +\end{verbatim} + +A C extension can be the most complicated case; here's an example taken from +the PyXML package: + +\begin{verbatim} +from distutils.core import setup, Extension + +expat_extension = Extension('xml.parsers.pyexpat', + define_macros = [('XML_NS', None)], + include_dirs = [ 'extensions/expat/xmltok', + 'extensions/expat/xmlparse' ], + sources = [ 'extensions/pyexpat.c', + 'extensions/expat/xmltok/xmltok.c', + 'extensions/expat/xmltok/xmlrole.c', + ] + ) +setup (name = "PyXML", version = "0.5.4", + ext_modules =[ expat_extension ] )
+\end{verbatim} + +The Distutils can also take care of creating source and binary +distributions. The ``sdist'' command, run by ``\code{python setup.py +sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}. +Adding new commands isn't difficult; ``bdist_rpm'' and +``bdist_wininst'' commands have already been contributed to create an +RPM distribution and a Windows installer for the software, +respectively. Commands to create other distribution formats such as +Debian packages and Solaris \file{.pkg} files are in various stages of +development. + +All this is documented in a new manual, \textit{Distributing Python +Modules}, that joins the basic set of Python documentation. + +% ====================================================================== +\section{XML Modules} + +Python 1.5.2 included a simple XML parser in the form of the +\module{xmllib} module, contributed by Sjoerd Mullender. Since +1.5.2's release, two different interfaces for processing XML have +become common: SAX2 (version 2 of the Simple API for XML) provides an +event-driven interface with some similarities to \module{xmllib}, and +the DOM (Document Object Model) provides a tree-based interface, +transforming an XML document into a tree of nodes that can be +traversed and modified. Python 2.0 includes a SAX2 interface and a +stripped-down DOM interface as part of the \module{xml} package. +Here we will give a brief overview of these new interfaces; consult +the Python documentation or the source code for complete details. +The Python XML SIG is also working on improved documentation. + +\subsection{SAX2 Support} + +SAX defines an event-driven interface for parsing XML. To use SAX, +you must write a SAX handler class. Handler classes inherit from +various classes provided by SAX, and override various methods that +will then be called by the XML parser.
For example, the +\method{startElement()} and \method{endElement()} methods are called for +every start and end tag encountered by the parser, the +\method{characters()} method is called for every chunk of character +data, and so forth. + +The advantage of the event-driven approach is that the whole +document doesn't have to be resident in memory at any one time, which +matters if you are processing really huge documents. However, writing +the SAX handler class can get very complicated if you're trying to +modify the document structure in some elaborate way. + +For example, this little program defines a handler that prints +a message for every starting and ending tag, and then parses the file +\file{hamlet.xml} using it: + +\begin{verbatim} +from xml import sax + +class SimpleHandler(sax.ContentHandler): + def startElement(self, name, attrs): + print 'Start of element:', name, attrs.keys() + + def endElement(self, name): + print 'End of element:', name + +# Create a parser object +parser = sax.make_parser() + +# Tell it what handler to use +handler = SimpleHandler() +parser.setContentHandler( handler ) + +# Parse a file! +parser.parse( 'hamlet.xml' ) +\end{verbatim} + +For more information, consult the Python documentation, or the XML +HOWTO at \url{http://pyxml.sourceforge.net/topics/howto/xml-howto.html}. + +\subsection{DOM Support} + +The Document Object Model is a tree-based representation for an XML +document. A top-level \class{Document} instance is the root of the +tree, and has a single child which is the top-level \class{Element} +instance. This \class{Element} has child nodes representing +character data and any sub-elements, which may have further children +of their own, and so forth. Using the DOM you can traverse the +resulting tree any way you like, access element and attribute values, +insert and delete nodes, and convert the tree back into XML.
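As a small illustration of those four operations (traversal, attribute access, insertion and deletion, and conversion back to XML), here is a sketch using the \module{xml.dom.minidom} implementation included with Python 2.0. The inline document is a made-up stand-in for \file{hamlet.xml}, and the variable names are hypothetical.

```python
from xml.dom import minidom

# A tiny made-up document standing in for hamlet.xml.
source = ('<PLAY><TITLE>Hamlet</TITLE>'
          '<PERSONA role="king">CLAUDIUS</PERSONA>'
          '<PERSONA>HAMLET</PERSONA></PLAY>')

doc = minidom.parseString(source)   # the Document node, root of the tree
root = doc.documentElement           # its single top-level Element, <PLAY>

# Traverse: collect the character data inside every PERSONA element.
personas = root.getElementsByTagName('PERSONA')
names = [p.firstChild.data for p in personas]

# Access an attribute value on the first PERSONA element.
role = personas[0].getAttribute('role')

# Insert a new PERSONA element, with a text child, at the end of <PLAY>.
new_persona = doc.createElement('PERSONA')
new_persona.appendChild(doc.createTextNode('POLONIUS'))
root.appendChild(new_persona)

# Delete the TITLE element.
root.removeChild(root.getElementsByTagName('TITLE')[0])

# Convert the modified tree back into XML text.
xml_text = root.toxml()
```

Here \code{names} holds the two original character names, and \code{xml_text} contains three PERSONA elements and no TITLE.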
+ +The DOM is useful for modifying XML documents, because you can create +a DOM tree, modify it by adding new nodes or rearranging subtrees, and +then produce a new XML document as output. You can also construct a +DOM tree manually and convert it to XML, which can be a more flexible +way of producing XML output than simply writing +\code{<tag1>}...\code{</tag1>} to a file. + +The DOM implementation included with Python lives in the +\module{xml.dom.minidom} module. It's a lightweight implementation of +the Level 1 DOM with support for XML namespaces. The +\function{parse()} and \function{parseString()} convenience +functions are provided for generating a DOM tree: + +\begin{verbatim} +from xml.dom import minidom +doc = minidom.parse('hamlet.xml') +\end{verbatim} + +\code{doc} is a \class{Document} instance. \class{Document}, like all +the other DOM classes such as \class{Element} and \class{Text}, is a +subclass of the \class{Node} base class. All the nodes in a DOM tree +therefore support certain common methods, such as \method{toxml()} +which returns a string containing the XML representation of the node +and its children. Each class also has special methods of its own; for +example, \class{Element} and \class{Document} instances have a method +to find all child elements with a given tag name. Continuing from the +previous 2-line example: + +\begin{verbatim} +perslist = doc.getElementsByTagName( 'PERSONA' ) +print perslist[0].toxml() +print perslist[1].toxml() +\end{verbatim} + +For the \textit{Hamlet} XML file, the above few lines output: + +\begin{verbatim} +<PERSONA>CLAUDIUS, king of Denmark. 
</PERSONA> +<PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA> +\end{verbatim} + +The root element of the document is available as +\code{doc.documentElement}, and its children can be easily modified +by removing, adding, or moving nodes: + +\begin{verbatim} +root = doc.documentElement + +# Remove the first child +root.removeChild( root.childNodes[0] ) + +# Move the new first child to the end +root.appendChild( root.childNodes[0] ) + +# Insert the new first child (originally, +# the third child) before the child at index 20. +root.insertBefore( root.childNodes[0], root.childNodes[20] ) +\end{verbatim} + +Again, I will refer you to the Python documentation for a complete +listing of the different \class{Node} classes and their various methods. + +\subsection{Relationship to PyXML} + +The XML Special Interest Group has been working on XML-related Python +code for a while. Its code distribution, called PyXML, is available +from the SIG's Web pages at \url{http://www.python.org/sigs/xml-sig/}. +The PyXML distribution also used the package name \samp{xml}. If +you've written programs that used PyXML, you're probably wondering +about its compatibility with the 2.0 \module{xml} package. + +The answer is that Python 2.0's \module{xml} package isn't compatible +with PyXML, but can be made compatible by installing a recent version of +PyXML. Many applications can get by with the XML support that is +included with Python 2.0, but more complicated applications will +require the full PyXML package to be installed. When +installed, PyXML versions 0.6.0 or greater will replace the +\module{xml} package shipped with Python, and will be a strict +superset of the standard package, adding a bunch of additional +features. Some of the additional features in PyXML include: + +\begin{itemize} +\item 4DOM, a full DOM implementation +from FourThought, Inc. +\item The xmlproc validating parser, written by Lars Marius Garshol.
+\item The \module{sgmlop} parser accelerator module, written by Fredrik Lundh. +\end{itemize} + +% ====================================================================== +\section{Module changes} + +Lots of improvements and bugfixes were made to Python's extensive +standard library; some of the affected modules include +\module{readline}, \module{ConfigParser}, \module{cgi}, +\module{calendar}, \module{posix}, \module{xmllib}, +\module{aifc}, \module{chunk}, \module{wave}, \module{random}, \module{shelve}, +and \module{nntplib}. Consult the CVS logs for the exact +patch-by-patch details. + +Brian Gallew contributed OpenSSL support for the \module{socket} +module. OpenSSL is an implementation of the Secure Sockets Layer, +which encrypts the data being sent over a socket. When compiling +Python, you can edit \file{Modules/Setup} to include SSL support, +which adds a new function to the \module{socket} module: +\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})}, +which takes a socket object and returns an SSL socket. The +\module{httplib} and \module{urllib} modules were also changed to +support ``https://'' URLs, though no one has implemented FTP or SMTP +over SSL. + +The \module{httplib} module has been rewritten by Greg Stein to +support HTTP/1.1. Backward compatibility with the 1.5 version of +\module{httplib} is provided, though using HTTP/1.1 features such as +pipelining will require rewriting code to use a different set of +interfaces. + +The \module{Tkinter} module now supports Tcl/Tk versions 8.1, 8.2, and +8.3, and support for the older 7.x versions has been dropped. The +Tkinter module now supports displaying Unicode strings in Tk widgets. +Also, Fredrik Lundh contributed an optimization that makes operations +like \code{create_line} and \code{create_polygon} much faster, +especially when using lots of coordinates.
+ +The \module{curses} module has been greatly extended, starting from +Oliver Andrich's enhanced version, to provide many additional +functions from ncurses and SYSV curses, such as colour, alternative +character set support, pads, and mouse support. This means the module +is no longer compatible with operating systems that only have BSD +curses, but there don't seem to be any currently maintained OSes that +fall into this category. + +As mentioned in the earlier discussion of 2.0's Unicode support, the +underlying implementation of the regular expressions provided by the +\module{re} module has been changed. SRE, a new regular expression +engine written by Fredrik Lundh and partially funded by Hewlett +Packard, supports matching against both 8-bit strings and Unicode +strings. + +% ====================================================================== +\section{New modules} + +A number of new modules were added. We'll simply list them with brief +descriptions; consult the 2.0 documentation for the details of a +particular module. + +\begin{itemize} + +\item{\module{atexit}}: +For registering functions to be called before the Python interpreter exits. +Code that currently sets +\code{sys.exitfunc} directly should be changed to +use the \module{atexit} module instead, importing \module{atexit} +and calling \function{atexit.register()} with +the function to be called on exit. +(Contributed by Skip Montanaro.) + +\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support. + +\item{\module{filecmp}:} Supersedes the old \module{cmp}, \module{cmpcache} and +\module{dircmp} modules, which have now become deprecated. +(Contributed by Gordon MacMillan and Moshe Zadka.) + +\item{\module{gettext}:} This module provides internationalization +(I18N) and localization (L10N) support for Python programs by +providing an interface to the GNU gettext message catalog library. 
+(Integrated by Barry Warsaw, from separate contributions by Martin +von~L\"owis, Peter Funk, and James Henstridge.) + +\item{\module{linuxaudiodev}:} Support for the \file{/dev/audio} +device on Linux, a twin to the existing \module{sunaudiodev} module. +(Contributed by Peter Bosch, with fixes by Jeremy Hylton.) + +\item{\module{mmap}:} An interface to memory-mapped files on both +Windows and \UNIX. A file's contents can be mapped directly into +memory, at which point it behaves like a mutable string, so its +contents can be read and modified. They can even be passed to +functions that expect ordinary strings, such as the \module{re} +module. (Contributed by Sam Rushing, with some extensions by +A.M. Kuchling.) + +\item{\module{pyexpat}:} An interface to the Expat XML parser. +(Contributed by Paul Prescod.) + +\item{\module{robotparser}:} Parse a \file{robots.txt} file, which is +used for writing Web spiders that politely avoid certain areas of a +Web site. The parser accepts the contents of a \file{robots.txt} file, +builds a set of rules from it, and can then answer questions about +the fetchability of a given URL. (Contributed by Skip Montanaro.) + +\item{\module{tabnanny}:} A module/script to +check Python source code for ambiguous indentation. +(Contributed by Tim Peters.) + +\item{\module{UserString}:} A base class useful for deriving objects that behave like strings. + +\item{\module{webbrowser}:} A module that provides a platform-independent +way to launch a web browser on a specific URL. For each platform, various +browsers are tried in a specific order. The user can alter which browser +is launched by setting the \var{BROWSER} environment variable. +(Originally inspired by Eric S. Raymond's patch to \module{urllib} +which added similar functionality, but +the final module comes from code originally +implemented by Fred Drake as \file{Tools/idle/BrowserControl.py}, +and adapted for the standard library by Fred.)
+ +\item{\module{_winreg}:} An interface to the +Windows registry. \module{_winreg} is an adaptation of functions that +have been part of PythonWin since 1995, but has now been added to the core +distribution, and enhanced to support Unicode. +\module{_winreg} was written by Bill Tutt and Mark Hammond. + +\item{\module{zipfile}:} A module for reading and writing ZIP-format +archives. These are archives produced by \program{PKZIP} on +DOS/Windows or \program{zip} on \UNIX, not to be confused with +\program{gzip}-format files (which are supported by the \module{gzip} +module). +(Contributed by James C. Ahlstrom.) + +\item{\module{imputil}:} A module that provides a simpler way to +write customised import hooks, in comparison to the existing +\module{ihooks} module. (Implemented by Greg Stein, with much +discussion on python-dev along the way.) + +\end{itemize} + +% ====================================================================== +\section{IDLE Improvements} + +IDLE is the official Python cross-platform IDE, written using Tkinter. +Python 2.0 includes IDLE 0.6, which adds a number of new features and +improvements. A partial list: + +\begin{itemize} +\item UI improvements and optimizations, +especially in the area of syntax highlighting and auto-indentation. + +\item The class browser now shows more information, such as the top-level +functions in a module. + +\item Tab width is now a user-settable option. When opening an existing Python +file, IDLE automatically detects the indentation conventions, and adapts. + +\item There is now support for calling browsers on various platforms, +used to open the Python documentation in a browser. + +\item IDLE now has a command line, which is largely similar to +the vanilla Python interpreter. + +\item Call tips were added in many places. + +\item IDLE can now be installed as a package. + +\item In the editor window, there is now a line/column bar at the bottom.
+ +\item Three new keystroke commands: Check module (Alt-F5), Import +module (F5) and Run script (Ctrl-F5). + +\end{itemize} + +% ====================================================================== +\section{Deleted and Deprecated Modules} + +A few modules have been dropped because they're obsolete, or because +there are now better ways to do the same thing. The \module{stdwin} +module is gone; it was for a platform-independent windowing toolkit +that's no longer developed. + +A number of modules have been moved to the +\file{lib-old} subdirectory: +\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump}, +\module{find}, \module{grep}, \module{packmail}, +\module{poly}, \module{util}, \module{whatsound}, \module{zmod}. +If you have code which relies on a module that's been moved to +\file{lib-old}, you can simply add that directory to \code{sys.path} +to get them back, but you're encouraged to update any code that uses +these modules. + +\section{Acknowledgements} + +The authors would like to thank the following people for offering +suggestions on various drafts of this article: David Bolen, Mark +Hammond, Gregg Hauser, Jeremy Hylton, Fredrik Lundh, Detlef Lannert, +Aahz Maruch, Skip Montanaro, Vladimir Marangozov, Tobias Polzin, Guido +van Rossum, Neil Schemenauer, and Russ Schmidt. + +\end{document} diff --git a/sys/src/cmd/python/Doc/whatsnew/whatsnew21.tex b/sys/src/cmd/python/Doc/whatsnew/whatsnew21.tex new file mode 100644 index 000000000..53a179bfb --- /dev/null +++ b/sys/src/cmd/python/Doc/whatsnew/whatsnew21.tex @@ -0,0 +1,868 @@ +\documentclass{howto} + +\usepackage{distutils} + +% $Id: whatsnew21.tex 50964 2006-07-30 03:03:43Z fred.drake $ + +\title{What's New in Python 2.1} +\release{1.01} +\author{A.M. Kuchling} +\authoraddress{ + \strong{Python Software Foundation}\\ + Email: \email{amk@amk.ca} +} +\begin{document} +\maketitle\tableofcontents + +\section{Introduction} + +This article explains the new features in Python 2.1. 
While there aren't as +many changes in 2.1 as there were in Python 2.0, there are still some +pleasant surprises in store. 2.1 is the first release to be steered +through the use of Python Enhancement Proposals, or PEPs, so most of +the sizable changes have accompanying PEPs that provide more complete +documentation and a design rationale for the change. This article +doesn't attempt to document the new features completely, but simply +provides an overview of the new features for Python programmers. +Refer to the Python 2.1 documentation, or to the specific PEP, for +more details about any new feature that particularly interests you. + +One recent goal of the Python development team has been to accelerate +the pace of new releases, with a new release coming every 6 to 9 +months. 2.1 is the first release to come out at this faster pace, with +the first alpha appearing in January, 3 months after the final version +of 2.0 was released. + +The final release of Python 2.1 was made on April 17, 2001. + +%====================================================================== +\section{PEP 227: Nested Scopes} + +The largest change in Python 2.1 is to Python's scoping rules. In +Python 2.0, at any given time there are at most three namespaces used +to look up variable names: local, module-level, and the built-in +namespace. This often surprised people because it didn't match their +intuitive expectations. For example, a nested recursive function +definition doesn't work: + +\begin{verbatim} +def f(): + ... + def g(value): + ... + return g(value-1) + 1 + ... +\end{verbatim} + +The function \function{g()} will always raise a \exception{NameError} +exception, because the binding of the name \samp{g} isn't in either +its local namespace or in the module-level namespace. 
This isn't much +of a problem in practice (how often do you recursively define interior +functions like this?), but this also made using the \keyword{lambda} +expression clumsier, and this was a problem in practice. In code which +uses \keyword{lambda} you can often find local variables being copied +by passing them as the default values of arguments. + +\begin{verbatim} +def find(self, name): + "Return list of any entries equal to 'name'" + L = filter(lambda x, name=name: x == name, + self.list_attribute) + return L +\end{verbatim} + +The readability of Python code written in a strongly functional style +suffers greatly as a result. + +The most significant change to Python 2.1 is that static scoping has +been added to the language to fix this problem. As a first effect, +the \code{name=name} default argument is now unnecessary in the above +example. Put simply, when a given variable name is not assigned a +value within a function (by an assignment, or the \keyword{def}, +\keyword{class}, or \keyword{import} statements), references to the +variable will be looked up in the local namespace of the enclosing +scope. A more detailed explanation of the rules, and a dissection of +the implementation, can be found in the PEP. + +This change may cause some compatibility problems for code where the +same variable name is used both at the module level and as a local +variable within a function that contains further function definitions. +This seems rather unlikely, though, since such code would have been +pretty confusing to read in the first place. + +One side effect of the change is that the \code{from \var{module} +import *} and \keyword{exec} statements have been made illegal inside +a function scope under certain conditions. The Python reference +manual has said all along that \code{from \var{module} import *} is +only legal at the top level of a module, but the CPython interpreter +has never enforced this before.
As part of the implementation of +nested scopes, the compiler which turns Python source into bytecodes +has to generate different code to access variables in a containing +scope. \code{from \var{module} import *} and \keyword{exec} make it +impossible for the compiler to figure this out, because they add names +to the local namespace that are unknowable at compile time. +Therefore, if a function contains function definitions or +\keyword{lambda} expressions with free variables, the compiler will +flag this by raising a \exception{SyntaxError} exception. + +To make the preceding explanation a bit clearer, here's an example: + +\begin{verbatim} +x = 1 +def f(): + # The next line is a syntax error + exec 'x=2' + def g(): + return x +\end{verbatim} + +Line 4 containing the \keyword{exec} statement is a syntax error, +since \keyword{exec} would define a new local variable named \samp{x} +whose value should be accessed by \function{g()}. + +This shouldn't be much of a limitation, since \keyword{exec} is rarely +used in most Python code (and when it is used, it's often a sign of a +poor design anyway). + +Compatibility concerns have led to nested scopes being introduced +gradually; in Python 2.1, they aren't enabled by default, but can be +turned on within a module by using a future statement as described in +PEP 236. (See the following section for further discussion of PEP +236.) In Python 2.2, nested scopes will become the default and there +will be no way to turn them off, but users will have had all of 2.1's +lifetime to fix any breakage resulting from their introduction. 
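Here is a minimal sketch of the two cases described above: a recursive interior function, and a \keyword{lambda} that uses a local variable without the \code{name=name} trick. It is written to run on Python 2.2 and later, including Python 3, where nested scopes are always enabled. Under 2.1 the module would also need \code{from __future__ import nested_scopes} as its first statement. The names \function{nested_recursion()} and \class{Registry} are hypothetical and exist only for illustration.

```python
# Under Python 2.1, this module would need to begin with:
#     from __future__ import nested_scopes
# From Python 2.2 onward (including Python 3) nested scopes are always on.

def nested_recursion(n):
    # g is local to nested_recursion; the recursive call g(value - 1)
    # is resolved by looking up 'g' in the enclosing function's scope.
    def g(value):
        if value <= 0:
            return 0
        return g(value - 1) + 1
    return g(n)


class Registry:
    # Hypothetical container mirroring the find() example in the text.
    def __init__(self, items):
        self.list_attribute = list(items)

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # No name=name default argument is needed: the lambda's free
        # variable 'name' is looked up in find()'s local scope.
        return list(filter(lambda x: x == name, self.list_attribute))


print(nested_recursion(3))                               # 3
print(Registry(["spam", "eggs", "spam"]).find("spam"))   # ['spam', 'spam']
```

Before nested scopes, both \function{g()} and the \keyword{lambda} would raise \exception{NameError}, because neither name lives in the local or module-level namespace.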
+ +\begin{seealso} + +\seepep{227}{Statically Nested Scopes}{Written and implemented by +Jeremy Hylton.} + +\end{seealso} + + +%====================================================================== +\section{PEP 236: __future__ Directives} + +The reaction to nested scopes was widespread concern about the dangers +of breaking code with the 2.1 release, and it was strong enough to +make the Pythoneers take a more conservative approach. This approach +consists of introducing a convention for enabling optional +functionality in release N that will become compulsory in release N+1. + +The syntax is a \code{from...import} statement that uses the reserved +module name \module{__future__}. Nested scopes can be enabled by the +following statement: + +\begin{verbatim} +from __future__ import nested_scopes +\end{verbatim} + +While it looks like a normal \keyword{import} statement, it's not; +there are strict rules on where such a future statement can be put. +They can only be at the top of a module, and must precede any Python +code or regular \keyword{import} statements. This is because such +statements can affect how the Python bytecode compiler parses code and +generates bytecode, so they must precede any statement that will +result in bytecodes being produced. + +\begin{seealso} + +\seepep{236}{Back to the \module{__future__}}{Written by Tim Peters, +and primarily implemented by Jeremy Hylton.} + +\end{seealso} + +%====================================================================== +\section{PEP 207: Rich Comparisons} + +In earlier versions, Python's support for implementing comparisons on +user-defined classes and extension types was quite simple. Classes +could implement a \method{__cmp__} method that was given two instances +of a class, and could only return 0 if they were equal or +1 or -1 if +they weren't; the method couldn't raise an exception or return +anything other than an integer.
Users of Numeric Python often +found this model too weak and restrictive, because in the +number-crunching programs that Numeric Python is used for, it would be +more useful to be able to perform elementwise comparisons of two +matrices, returning a matrix containing the results of a given +comparison for each element. If the two matrices are of different +sizes, then the comparison has to be able to raise an exception to signal +the error. + +In Python 2.1, rich comparisons were added in order to support this +need. Python classes can now individually overload each of the +\code{<}, \code{<=}, \code{>}, \code{>=}, \code{==}, and \code{!=} +operations. The new magic method names are: + +\begin{tableii}{c|l}{code}{Operation}{Method name} + \lineii{<}{\method{__lt__}} \lineii{<=}{\method{__le__}} + \lineii{>}{\method{__gt__}} \lineii{>=}{\method{__ge__}} + \lineii{==}{\method{__eq__}} \lineii{!=}{\method{__ne__}} + \end{tableii} + +(The magic methods are named after the corresponding Fortran operators +\code{.LT.}, \code{.LE.}, \&c. Numeric programmers are almost +certainly quite familiar with these names and will find them easy to +remember.) + +Each of these magic methods is of the form \code{\var{method}(self, +other)}, where \code{self} will be the object on the left-hand side of +the operator, while \code{other} will be the object on the right-hand +side. For example, the expression \code{A < B} will cause +\code{A.__lt__(B)} to be called. + +Each of these magic methods can return anything at all: a Boolean, a +matrix, a list, or any other Python object. Alternatively they can +raise an exception if the comparison is impossible, inconsistent, or +otherwise meaningless. + +The built-in \function{cmp(A,B)} function can use the rich comparison +machinery, and now accepts an optional argument specifying which +comparison operation to use; this is given as one of the strings +\code{"<"}, \code{"<="}, \code{">"}, \code{">="}, \code{"=="}, or +\code{"!="}.
If called without the optional third argument, +\function{cmp()} will only return -1, 0, or +1 as in previous versions +of Python; otherwise it will call the appropriate method and can +return any Python object. + +There are also corresponding changes of interest to C programmers; +there's a new slot \code{tp_richcmp} in type objects and an API for +performing a given rich comparison. I won't cover the C API here, but +will refer you to PEP 207, or to 2.1's C API documentation, for the +full list of related functions. + +\begin{seealso} + +\seepep{207}{Rich Comparisons}{Written by Guido van Rossum, heavily +based on earlier work by David Ascher, and implemented by Guido van +Rossum.} + +\end{seealso} + +%====================================================================== +\section{PEP 230: Warning Framework} + +Over its 10 years of existence, Python has accumulated a certain +number of obsolete modules and features along the way. It's difficult +to know when a feature is safe to remove, since there's no way of +knowing how much code uses it --- perhaps no programs depend on the +feature, or perhaps many do. To enable removing old features in a +more structured way, a warning framework was added. When the Python +developers want to get rid of a feature, it will first trigger a +warning in the next version of Python. The following Python version +can then drop the feature, and users will have had a full release +cycle to remove uses of the old feature. + +Python 2.1 adds the warning framework to be used in this scheme. It +adds a \module{warnings} module that provides functions to issue +warnings, and to filter out warnings that you don't want to be +displayed. Third-party modules can also use this framework to +deprecate old features that they no longer wish to support.
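As a rough illustration of issuing and filtering warnings, here is a sketch written for a modern Python 3 interpreter rather than for 2.1. \function{warnings.warn()} and \function{warnings.filterwarnings()} are part of the framework described here, but the \function{warnings.catch_warnings()} context manager and the \keyword{with} statement used below arrived in later releases. The functions \function{old_api()}, \function{new_api()}, \function{call_quietly()}, and \function{call_strictly()} are hypothetical.

```python
import warnings


def old_api():
    # Hypothetical deprecated function: emit a warning, then keep working.
    # stacklevel=2 attributes the warning to the caller's line.
    warnings.warn("old_api() is deprecated; use new_api()",
                  DeprecationWarning, stacklevel=2)
    return 42


def call_quietly():
    # Install a temporary filter that ignores this specific message.
    # The message pattern is a regular expression matched against the
    # start of the warning text.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore",
                                message=r".*old_api\(\) is deprecated",
                                category=DeprecationWarning)
        return old_api()


def call_strictly():
    # Turn DeprecationWarning into an exception instead of a printed message.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            old_api()
        except DeprecationWarning as exc:
            return "caught: %s" % exc
    return "no warning raised"


print(call_quietly())   # 42, with no warning printed
print(call_strictly())  # caught: old_api() is deprecated; use new_api()
```

Scoping each filter with \function{catch_warnings()} restores the previous filter list on exit, so a suppression meant for one call does not silently affect the rest of the program.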
+ +For example, in Python 2.1 the \module{regex} module is deprecated, so +importing it causes a warning to be printed: + +\begin{verbatim} +>>> import regex +__main__:1: DeprecationWarning: the regex module + is deprecated; please use the re module +>>> +\end{verbatim} + +Warnings can be issued by calling the \function{warnings.warn} +function: + +\begin{verbatim} +warnings.warn("feature X no longer supported") +\end{verbatim} + +The first parameter is the warning message; additional optional +parameters can be used to specify a particular warning category. + +Filters can be added to disable certain warnings; a regular expression +pattern can be applied to the message or to the module name in order +to suppress a warning. For example, you may have a program that uses +the \module{regex} module and not want to spare the time to convert it +to use the \module{re} module right now. The warning can be +suppressed by calling: + +\begin{verbatim} +import warnings +warnings.filterwarnings(action='ignore', + message='.*regex module is deprecated', + category=DeprecationWarning, + module='__main__') +\end{verbatim} + +This adds a filter that applies only to warnings of the class +\class{DeprecationWarning} triggered in the \module{__main__} module, +uses a regular expression to match only the message about the +\module{regex} module being deprecated, and causes such warnings +to be ignored. Warnings can also be printed only once, printed every +time the offending code is executed, or turned into exceptions that +will cause the program to stop (unless the exceptions are caught in +the usual way, of course). + +Functions were also added to Python's C API for issuing warnings; +refer to PEP 230 or to Python's API documentation for the details. + +\begin{seealso} + +\seepep{5}{Guidelines for Language Evolution}{Written +by Paul Prescod, to specify procedures to be followed when removing +old features from Python. The policy described in this PEP hasn't
The policy described in this PEP hasn't +been officially adopted, but the eventual policy probably won't be too +different from Prescod's proposal.} + +\seepep{230}{Warning Framework}{Written and implemented by Guido van +Rossum.} + +\end{seealso} + +%====================================================================== +\section{PEP 229: New Build System} + +Previously, when compiling Python, the user had to go in and edit the +\file{Modules/Setup} file in order to enable various additional +modules; the default set is relatively small and limited to modules +that compile on most \UNIX{} platforms. This means that on \UNIX{} +platforms with many more features, most notably Linux, Python +installations often don't contain all the useful modules they could. + +Python 2.0 added the Distutils, a set of modules for distributing and +installing extensions. In Python 2.1, the Distutils are used to +compile much of the standard library of extension modules, +autodetecting which ones are supported on the current machine. It's +hoped that this will make Python installations easier and more +featureful. + +Instead of having to edit the \file{Modules/Setup} file in order to +enable modules, a \file{setup.py} script in the top directory of the +Python source distribution is run at build time, and attempts to +discover which modules can be enabled by examining the modules and +header files on the system. If a module is configured in +\file{Modules/Setup}, the \file{setup.py} script won't attempt to +compile that module and will defer to the \file{Modules/Setup} file's +contents. This provides a way to specify any unusual command-line +flags or libraries that are required for a specific platform.
+ +In another far-reaching change to the build mechanism, Neil +Schemenauer restructured things so Python now uses a single makefile +that isn't recursive, instead of makefiles in the top directory and in +each of the \file{Python/}, \file{Parser/}, \file{Objects/}, and +\file{Modules/} subdirectories. This makes building Python faster +and also makes hacking the Makefiles clearer and simpler. + +\begin{seealso} + +\seepep{229}{Using Distutils to Build Python}{Written +and implemented by A.M. Kuchling.} + +\end{seealso} + +%====================================================================== +\section{PEP 205: Weak References} + +Weak references, available through the \module{weakref} module, are a +minor but useful new data type in the Python programmer's toolbox. + +Storing a reference to an object (say, in a dictionary or a list) has +the side effect of keeping that object alive for as long as the reference +exists. There are a few +specific cases where this behaviour is undesirable, object caches +being the most common one, and another being circular references in +data structures such as trees. + +For example, consider a memoizing function that caches the results of +another function \function{f(\var{x})} by storing the function's +argument and its result in a dictionary: + +\begin{verbatim} +_cache = {} +def memoize(x): + if _cache.has_key(x): + return _cache[x] + + retval = f(x) + + # Cache the returned object + _cache[x] = retval + + return retval +\end{verbatim} + +This version works for simple things such as integers, but it has a +side effect: the \code{_cache} dictionary holds a reference to the +return values, so they'll never be deallocated until the Python +process exits and cleans up. This isn't very noticeable for integers, +but if \function{f()} returns a large object or a data structure that +takes up a lot of memory, this can be a problem. + +Weak references provide a way to implement a cache that won't keep +objects alive beyond their time.
If an object is only accessible +through weak references, the object will be deallocated and the weak +references will indicate that the object they referred to no longer +exists. A weak reference to an object \var{obj} is created by calling +\code{wr = weakref.ref(\var{obj})}. The object being referred to is +returned by calling the weak reference as if it were a function: +\code{wr()}. It will return the referenced object, or \code{None} if +the object no longer exists. + +This makes it possible to write a \function{memoize()} function whose +cache doesn't keep objects alive, by storing weak references in the +cache. + +\begin{verbatim} +_cache = {} +def memoize(x): + if _cache.has_key(x): + obj = _cache[x]() + # If the referenced object still exists, + # return it + if obj is not None: return obj + + retval = f(x) + + # Cache a weak reference + _cache[x] = weakref.ref(retval) + + return retval +\end{verbatim} + +The \module{weakref} module also allows creating proxy objects which +behave like weak references (an object referenced only by proxy +objects is deallocated), but instead of requiring an explicit call to +retrieve the object, the proxy transparently forwards all operations +to the object as long as the object still exists. If the object is +deallocated, attempting to use a proxy will cause a +\exception{weakref.ReferenceError} exception to be raised. + +\begin{verbatim} +proxy = weakref.proxy(obj) +proxy.attr # Equivalent to obj.attr +proxy.meth() # Equivalent to obj.meth() +del obj +proxy.attr # raises weakref.ReferenceError +\end{verbatim} + +\begin{seealso} + +\seepep{205}{Weak References}{Written and implemented by +Fred~L. Drake,~Jr.} + +\end{seealso} + +%====================================================================== +\section{PEP 232: Function Attributes} + +In Python 2.1, functions can now have arbitrary information attached +to them.
People were often using docstrings to hold information about +functions and methods, because the \code{__doc__} attribute was the +only way of attaching any information to a function. For example, in +the Zope Web application server, functions are marked as safe for +public access by having a docstring, and in John Aycock's SPARK +parsing framework, docstrings hold parts of the BNF grammar to be +parsed. This overloading is unfortunate, since docstrings are really +intended to hold a function's documentation; for example, it means you +can't properly document functions intended for private use in Zope. + +Arbitrary attributes can now be set and retrieved on functions using the +regular Python syntax: + +\begin{verbatim} +def f(): pass + +f.publish = 1 +f.secure = 1 +f.grammar = "A ::= B (C D)*" +\end{verbatim} + +The dictionary containing attributes can be accessed as the function's +\member{__dict__}. Unlike the \member{__dict__} attribute of class +instances, in functions you can actually assign a new dictionary to +\member{__dict__}, though the new value is restricted to a regular +Python dictionary; you \emph{can't} be tricky and set it to a +\class{UserDict} instance, or any other random object that behaves +like a mapping. + +\begin{seealso} + +\seepep{232}{Function Attributes}{Written and implemented by Barry +Warsaw.} + +\end{seealso} + + +%====================================================================== + +\section{PEP 235: Importing Modules on Case-Insensitive Platforms} + +Some operating systems have filesystems that are case-insensitive, +MacOS and Windows being the primary examples; on these systems, it's +impossible to distinguish the filenames \samp{FILE.PY} and +\samp{file.py}, even though they do store the file's name +in its original case (they're case-preserving, too). + +In Python 2.1, the \keyword{import} statement will work to simulate +case-sensitivity on case-insensitive platforms. 
Python will now +search for the first case-sensitive match by default, raising an +\exception{ImportError} if no such file is found, so \code{import file} +will not import a module named \samp{FILE.PY}. Case-insensitive +matching can be requested by setting the \envvar{PYTHONCASEOK} environment +variable before starting the Python interpreter. + +%====================================================================== +\section{PEP 217: Interactive Display Hook} + +When using the Python interpreter interactively, the output of +commands is displayed using the built-in \function{repr()} function. +In Python 2.1, the variable \function{sys.displayhook} can be set to a +callable object which will be called instead of \function{repr()}. +For example, you can set it to a special pretty-printing function: + +\begin{verbatim} +>>> # Create a recursive data structure +... L = [1,2,3] +>>> L.append(L) +>>> L # Show Python's default output +[1, 2, 3, [...]] +>>> # Use pprint.pprint() as the display function +... import sys, pprint +>>> sys.displayhook = pprint.pprint +>>> L +[1, 2, 3, <Recursion on list with id=135143996>] +>>> +\end{verbatim} + +\begin{seealso} + +\seepep{217}{Display Hook for Interactive Use}{Written and implemented +by Moshe Zadka.} + +\end{seealso} + +%====================================================================== +\section{PEP 208: New Coercion Model} + +How numeric coercion is done at the C level was significantly +modified. This will only affect the authors of C extensions to +Python, allowing them more flexibility in writing extension types that +support numeric operations. + +Extension types can now set the type flag \code{Py_TPFLAGS_CHECKTYPES} +in their \code{PyTypeObject} structure to indicate that they support +the new coercion model. 
In such extension types, the numeric slot +functions can no longer assume that they'll be passed two arguments of +the same type; instead they may be passed two arguments of differing +types, and can then perform their own internal coercion. If the slot +function is passed a type it can't handle, it can indicate the failure +by returning a reference to the \code{Py_NotImplemented} singleton +value. The numeric functions of the other type will then be tried, +and perhaps they can handle the operation; if the other type also +returns \code{Py_NotImplemented}, then a \exception{TypeError} will be +raised. Numeric methods written in Python can also return the same +singleton, available as \code{NotImplemented}, causing the interpreter to act as if the +method did not exist (perhaps raising a \exception{TypeError}, perhaps +trying another object's numeric methods). + +\begin{seealso} + +\seepep{208}{Reworking the Coercion Model}{Written and implemented by +Neil Schemenauer, heavily based upon earlier work by Marc-Andr\'e +Lemburg. Read this to understand the fine points of how numeric +operations will now be processed at the C level.} + +\end{seealso} + +%====================================================================== +\section{PEP 241: Metadata in Python Packages} + +A common complaint from Python users is that there's no single catalog +of all the Python modules in existence. T.~Middleton's Vaults of +Parnassus at \url{http://www.vex.net/parnassus/} are the largest +catalog of Python modules, but registering software at the Vaults is +optional, and many people don't bother. + +As a first small step toward fixing the problem, Python software +packaged using the Distutils \command{sdist} command will include a +file named \file{PKG-INFO} containing information about the package +such as its name, version, and author (metadata, in cataloguing +terminology). PEP 241 contains the full list of fields that can be +present in the \file{PKG-INFO} file.
As people begin to package their +software using Python 2.1, more and more packages will include +metadata, making it possible to build automated cataloguing systems +and experiment with them. With the resulting experience, perhaps it'll +be possible to design a really good catalog and then build support for +it into Python 2.2. For example, the Distutils \command{sdist} +and \command{bdist_*} commands could support an \option{upload} option +that would automatically upload your package to a catalog server. + +You can start creating packages containing \file{PKG-INFO} even if +you're not using Python 2.1, since a new release of the Distutils will +be made for users of earlier Python versions. Version 1.0.2 of the +Distutils includes the changes described in PEP 241, as well as +various bugfixes and enhancements. It will be available from +the Distutils SIG at \url{http://www.python.org/sigs/distutils-sig/}. + +\begin{seealso} + +\seepep{241}{Metadata for Python Software Packages}{Written and +implemented by A.M. Kuchling.} + +\seepep{243}{Module Repository Upload Mechanism}{Written by Sean +Reifschneider, this draft PEP describes a proposed mechanism for uploading +Python packages to a central server. +} + +\end{seealso} + +%====================================================================== +\section{New and Improved Modules} + +\begin{itemize} + +\item Ka-Ping Yee contributed two new modules: \module{inspect.py}, a +module for getting information about live Python code, and +\module{pydoc.py}, a module for interactively converting docstrings to +HTML or text. As a bonus, \file{Tools/scripts/pydoc}, which is now +automatically installed, uses \module{pydoc.py} to display +documentation given a Python module, package, or class name. For +example, \samp{pydoc xml.dom} displays the following: + +\begin{verbatim} +Python Library Documentation: package xml.dom in xml + +NAME + xml.dom - W3C Document Object Model implementation for Python.
+ +FILE + /usr/local/lib/python2.1/xml/dom/__init__.pyc + +DESCRIPTION + The Python mapping of the Document Object Model is documented in the + Python Library Reference in the section on the xml.dom package. + + This package contains the following modules: + ... +\end{verbatim} + +\file{pydoc} also includes a Tk-based interactive help browser. +\file{pydoc} quickly becomes addictive; try it out! + +\item Two different modules for unit testing were added to the +standard library. The \module{doctest} module, contributed by Tim +Peters, provides a testing framework based on running embedded +examples in docstrings and comparing the results against the expected +output. PyUnit, contributed by Steve Purcell, is a unit testing +framework inspired by JUnit, which was in turn an adaptation of Kent +Beck's Smalltalk testing framework. See +\url{http://pyunit.sourceforge.net/} for more information about +PyUnit. + +\item The \module{difflib} module contains a class, +\class{SequenceMatcher}, which compares two sequences and computes the +changes required to transform one sequence into the other. For +example, this module can be used to write a tool similar to the \UNIX{} +\program{diff} program, and in fact the sample program +\file{Tools/scripts/ndiff.py} demonstrates how to write such a script. + +\item \module{curses.panel}, a wrapper for the panel library, part of +ncurses and of SYSV curses, was contributed by Thomas Gellekum. The +panel library provides windows with the additional feature of depth. +Windows can be moved higher or lower in the depth ordering, and the +panel library figures out where panels overlap and which sections are +visible. + +\item The PyXML package has gone through a few releases since Python +2.0, and Python 2.1 includes an updated version of the \module{xml} +package. 
Some of the noteworthy changes include support for Expat 1.2 +and later versions, the ability for Expat parsers to handle files in +any encoding supported by Python, and various bugfixes for SAX, DOM, +and the \module{minidom} module. + +\item Ping also contributed another hook for handling uncaught +exceptions. \function{sys.excepthook} can be set to a callable +object. When an exception isn't caught by any +\keyword{try}...\keyword{except} blocks, the exception will be passed +to \function{sys.excepthook}, which can then do whatever it likes. At +the Ninth Python Conference, Ping demonstrated an application for this +hook: printing an extended traceback that not only lists the stack +frames, but also lists the function arguments and the local variables +for each frame. + +\item Various functions in the \module{time} module, such as +\function{asctime()} and \function{localtime()}, require a floating +point argument containing the time in seconds since the epoch. The +most common use of these functions is to work with the current time, +so the floating point argument has been made optional; when a value +isn't provided, the current time will be used. For example, log file +entries usually need a string containing the current time; in Python +2.1, \code{time.asctime()} can be used, instead of the lengthier +\code{time.asctime(time.localtime(time.time()))} that was previously +required. + +This change was proposed and implemented by Thomas Wouters. + +\item The \module{ftplib} module now defaults to retrieving files in +passive mode, because passive mode is more likely to work from behind +a firewall. This request came from the Debian bug tracking system, +since other Debian packages use \module{ftplib} to retrieve files and +then don't work from behind a firewall. 
It's deemed unlikely that +this will cause problems for anyone, because Netscape defaults to +passive mode and few people complain, but if passive mode is +unsuitable for your application or network setup, call +\method{set_pasv(0)} on FTP objects to disable passive mode. + +\item Support for raw socket access, contributed by Grant Edwards, has been +added to the \module{socket} module. + +\item The \module{pstats} module now contains a simple interactive +statistics browser for displaying timing profiles for Python programs, +invoked when the module is run as a script. Contributed by +Eric S.\ Raymond. + +\item A new implementation-dependent function, \function{sys._getframe(\optional{depth})}, +has been added to return a given frame object from the current call stack. +\function{sys._getframe()} returns the frame at the top of the call stack; +if the optional integer argument \var{depth} is supplied, the function returns the frame +that is \var{depth} calls below the top of the stack. For example, \code{sys._getframe(1)} +returns the caller's frame object. + +This function is only present in CPython, not in Jython or the .NET +implementation. Use it for debugging, and resist the temptation to +put it into production code. + + + +\end{itemize} + +%====================================================================== +\section{Other Changes and Fixes} + +There were relatively few smaller changes made in Python 2.1 due to +the shorter release cycle. A search through the CVS change logs turns +up 117 patches applied, and 136 bugs fixed; both figures are likely to +be underestimates. Some of the more notable changes are: + +\begin{itemize} + + +\item A specialized object allocator is now optionally available, which +should be faster than the system \function{malloc()} and have less +memory overhead. The allocator uses C's \function{malloc()} function +to get large pools of memory, and then fulfills smaller memory +requests from these pools.
It can be enabled by providing the +\longprogramopt{with-pymalloc} option to the \program{configure} script; see +\file{Objects/obmalloc.c} for the implementation details. + +Authors of C extension modules should test their code with the object +allocator enabled, because some incorrect code may break, causing core +dumps at runtime. There are a bunch of memory allocation functions in +Python's C API that have previously been just aliases for the C +library's \function{malloc()} and \function{free()}, meaning that if +you accidentally called mismatched functions, the error wouldn't be +noticeable. When the object allocator is enabled, these functions +aren't aliases of \function{malloc()} and \function{free()} any more, +and calling the wrong function to free memory will get you a core +dump. For example, if memory was allocated using +\function{PyMem_New()}, it has to be freed using +\function{PyMem_Del()}, not \function{free()}. A few modules included +with Python fell afoul of this and had to be fixed; doubtless there +are more third-party modules that will have the same problem. + +The object allocator was contributed by Vladimir Marangozov. + +\item The speed of line-oriented file I/O has been improved because +people often complain about its lack of speed, and because it's often +been used as a na\"ive benchmark. The \method{readline()} method of +file objects has therefore been rewritten to be much faster. The +exact amount of the speedup will vary from platform to platform +depending on how slow the C library's \function{getc()} was, but is +around 66\%, and potentially much faster on some particular operating +systems. Tim Peters did much of the benchmarking and coding for this +change, motivated by a discussion in comp.lang.python. + +A new module and method for file objects was also added, contributed +by Jeff Epler. The new method, \method{xreadlines()}, is similar to +the existing \function{xrange()} built-in. 
\function{xreadlines()} +returns an opaque sequence object that only supports being iterated +over, reading a line on every iteration but not reading the entire +file into memory as the existing \method{readlines()} method does. +You'd use it like this: + +\begin{verbatim} +for line in sys.stdin.xreadlines(): + # ... do something for each line ... + ... +\end{verbatim} + +For a fuller discussion of the line I/O changes, see the python-dev +summary for January 1-15, 2001 at +\url{http://www.python.org/dev/summary/2001-01-1.html}. + +\item A new method, \method{popitem()}, was added to dictionaries to +enable destructively iterating through the contents of a dictionary; +this can be faster for large dictionaries because there's no need to +construct a list containing all the keys or values. +\code{D.popitem()} removes a random \code{(\var{key}, \var{value})} +pair from the dictionary~\code{D} and returns it as a 2-tuple. This +was implemented mostly by Tim Peters and Guido van Rossum, after a +suggestion and preliminary patch by Moshe Zadka. + +\item Modules can now control which names are imported when \code{from +\var{module} import *} is used, by defining an \code{__all__} +attribute containing a list of names that will be imported. One +common complaint is that if the module imports other modules such as +\module{sys} or \module{string}, \code{from \var{module} import *} +will add them to the importing module's namespace. To fix this, +simply list the public names in \code{__all__}: + +\begin{verbatim} +# List public names +__all__ = ['Database', 'open'] +\end{verbatim} + +A stricter version of this patch was first suggested and implemented +by Ben Wolfson, but after some python-dev discussion, a weaker final +version was checked in. + +\item Applying \function{repr()} to strings previously used octal +escapes for non-printable characters; for example, a newline was +\code{'\e 012'}. 
This was a vestigial trace of Python's C ancestry, but +today octal is of very little practical use. Ka-Ping Yee suggested +using hex escapes instead of octal ones, and using the \code{\e n}, +\code{\e t}, \code{\e r} escapes for the appropriate characters, and +implemented this new formatting. + +\item Syntax errors detected at compile-time can now raise exceptions +containing the filename and line number of the error, a pleasant side +effect of the compiler reorganization done by Jeremy Hylton. + +\item C extensions which import other modules have been changed to use +\function{PyImport_ImportModule()}, which means that they will use any +import hooks that have been installed. This is also encouraged for +third-party extensions that need to import some other module from C +code. + +\item The size of the Unicode character database was shrunk by another +340K thanks to Fredrik Lundh. + +\item Some new ports were contributed: MacOS X (by Steven Majewski); +Cygwin (by Jason Tishler); RISCOS (by Dietmar Schwertberger); Unixware~7 +(by Billy G. Allie). + +\end{itemize} + +And there's the usual list of minor bugfixes, minor memory leaks, +docstring edits, and other tweaks, too lengthy to be worth itemizing; +see the CVS logs for the full details if you want them. + + +%====================================================================== +\section{Acknowledgements} + +The author would like to thank the following people for offering +suggestions on various drafts of this article: Graeme Cross, David +Goodger, Jay Graves, Michael Hudson, Marc-Andr\'e Lemburg, Fredrik +Lundh, Neil Schemenauer, Thomas Wouters.
+ +\end{document} diff --git a/sys/src/cmd/python/Doc/whatsnew/whatsnew22.tex b/sys/src/cmd/python/Doc/whatsnew/whatsnew22.tex new file mode 100644 index 000000000..25b347759 --- /dev/null +++ b/sys/src/cmd/python/Doc/whatsnew/whatsnew22.tex @@ -0,0 +1,1466 @@ +\documentclass{howto} + +% $Id: whatsnew22.tex 37315 2004-09-10 19:33:00Z akuchling $ + +\title{What's New in Python 2.2} +\release{1.02} +\author{A.M. Kuchling} +\authoraddress{ + \strong{Python Software Foundation}\\ + Email: \email{amk@amk.ca} +} +\begin{document} +\maketitle\tableofcontents + +\section{Introduction} + +This article explains the new features in Python 2.2.2, released on +October 14, 2002. Python 2.2.2 is a bugfix release of Python 2.2, +originally released on December 21, 2001. + +Python 2.2 can be thought of as the "cleanup release". There are some +features such as generators and iterators that are completely new, but +most of the changes, significant and far-reaching though they may be, +are aimed at cleaning up irregularities and dark corners of the +language design. + +This article doesn't attempt to provide a complete specification of +the new features, but instead provides a convenient overview. For +full details, you should refer to the documentation for Python 2.2, +such as the +\citetitle[http://www.python.org/doc/2.2/lib/lib.html]{Python +Library Reference} and the +\citetitle[http://www.python.org/doc/2.2/ref/ref.html]{Python +Reference Manual}. If you want to understand the complete +implementation and design rationale for a change, refer to the PEP for +a particular new feature. 
+ +\begin{seealso} + +\seeurl{http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm} +{``What's So Special About Python 2.2?'' is also about the new 2.2 +features, and was written by Cameron Laird and Kathryn Soraiz.} + +\end{seealso} + + +%====================================================================== +\section{PEPs 252 and 253: Type and Class Changes} + +The largest and most far-reaching changes in Python 2.2 are to +Python's model of objects and classes. The changes should be backward +compatible, so it's likely that your code will continue to run +unchanged, but the changes provide some amazing new capabilities. +Before beginning this, the longest and most complicated section of +this article, I'll provide an overview of the changes and offer some +comments. + +A long time ago I wrote a Web page +(\url{http://www.amk.ca/python/writing/warts.html}) listing flaws in +Python's design. One of the most significant flaws was that it's +impossible to subclass Python types implemented in C. In particular, +it's not possible to subclass built-in types, so you can't just +subclass, say, lists in order to add a single useful method to them. +The \module{UserList} module provides a class that supports all of the +methods of lists and that can be subclassed further, but there's lots +of C code that expects a regular Python list and won't accept a +\class{UserList} instance. + +Python 2.2 fixes this, and in the process adds some exciting new +capabilities. A brief summary: + +\begin{itemize} + +\item You can subclass built-in types such as lists and even integers, +and your subclasses should work in every place that requires the +original type. + +\item It's now possible to define static and class methods, in addition +to the instance methods available in previous versions of Python. + +\item It's also possible to automatically call methods on accessing or +setting an instance attribute by using a new mechanism called +\dfn{properties}. 
Many uses of \method{__getattr__} can be rewritten +to use properties instead, making the resulting code simpler and +faster. As a small side benefit, attributes can now have docstrings, +too. + +\item The list of legal attributes for an instance can be limited to a +particular set using \dfn{slots}, making it possible to safeguard +against typos and perhaps make more optimizations possible in future +versions of Python. + +\end{itemize} + +Some users have voiced concern about all these changes. Sure, they +say, the new features are neat and lend themselves to all sorts of +tricks that weren't possible in previous versions of Python, but +they also make the language more complicated. Some people have said +that they've always recommended Python for its simplicity, and feel +that its simplicity is being lost. + +Personally, I think there's no need to worry. Many of the new +features are quite esoteric, and you can write a lot of Python code +without ever needing to be aware of them. Writing a simple class is no +more difficult than it ever was, so you don't need to bother learning +or teaching them unless they're actually needed. Some very +complicated tasks that were previously only possible from C will now +be possible in pure Python, and to my mind that's all for the better. + +I'm not going to attempt to cover every single corner case and small +change required to make the new features work. Instead, this +section will paint only the broad strokes. See section~\ref{sect-rellinks}, +``Related Links'', for further sources of information about Python 2.2's new +object model. + + +\subsection{Old and New Classes} + +First, you should know that Python 2.2 really has two kinds of +classes: classic or old-style classes, and new-style classes. The +old-style class model is exactly the same as the class model in +earlier versions of Python. All the new features described in this +section apply only to new-style classes.
This divergence isn't +intended to last forever; eventually old-style classes will be +dropped, possibly in Python 3.0. + +So how do you define a new-style class? You do it by subclassing an +existing new-style class. Most of Python's built-in types, such as +integers, lists, dictionaries, and even files, are new-style classes +now. A new-style class named \class{object}, the base class for all +built-in types, has also been added, so if no built-in type is +suitable, you can just subclass \class{object}: + +\begin{verbatim} +class C(object): + def __init__ (self): + ... + ... +\end{verbatim} + +This means that \keyword{class} statements that don't have any base +classes are always classic classes in Python 2.2. (Actually you can +also change this by setting a module-level variable named +\member{__metaclass__} --- see \pep{253} for the details --- but it's +easier to just subclass \class{object}.) + +The type objects for the built-in types are available as built-ins, +named using a clever trick. Python has always had built-in functions +named \function{int()}, \function{float()}, and \function{str()}. In +2.2, they aren't functions any more, but type objects that behave as +factories when called. + +\begin{verbatim} +>>> int +<type 'int'> +>>> int('123') +123 +\end{verbatim} + +To make the set of types complete, new type objects such as +\function{dict} and \function{file} have been added.
Here's a +more interesting example, adding a \method{lock()} method to file +objects: + +\begin{verbatim} +class LockableFile(file): + def lock (self, operation, length=0, start=0, whence=0): + import fcntl + return fcntl.lockf(self.fileno(), operation, + length, start, whence) +\end{verbatim} + +The now-obsolete \module{posixfile} module contained a class that +emulated all of a file object's methods and also added a +\method{lock()} method, but this class couldn't be passed to internal +functions that expected a built-in file, something which is possible +with our new \class{LockableFile}. + + +\subsection{Descriptors} + +In previous versions of Python, there was no consistent way to +discover what attributes and methods were supported by an object. +There were some informal conventions, such as defining +\member{__members__} and \member{__methods__} attributes that were +lists of names, but often the author of an extension type or a class +wouldn't bother to define them. You could fall back on inspecting the +\member{__dict__} of an object, but when class inheritance or an +arbitrary \method{__getattr__} hook were in use this could still be +inaccurate. + +The one big idea underlying the new class model is that an API for +describing the attributes of an object using \dfn{descriptors} has +been formalized. Descriptors specify the value of an attribute, +stating whether it's a method or a field. With the descriptor API, +static methods and class methods become possible, as well as more +exotic constructs. + +Attribute descriptors are objects that live inside class objects, and +have a few attributes of their own: + +\begin{itemize} + +\item \member{__name__} is the attribute's name. + +\item \member{__doc__} is the attribute's docstring. + +\item \method{__get__(\var{object})} is a method that retrieves the +attribute value from \var{object}. + +\item \method{__set__(\var{object}, \var{value})} sets the attribute +on \var{object} to \var{value}. 
+ +\item \method{__delete__(\var{object})} deletes the +attribute from \var{object}. +\end{itemize} + +For example, when you write \code{obj.x}, the steps that Python +actually performs are: + +\begin{verbatim} +descriptor = obj.__class__.x +descriptor.__get__(obj) +\end{verbatim} + +For methods, \method{descriptor.__get__} returns a temporary object that's +callable, and wraps up the instance and the method to be called on it. +This is also why static methods and class methods are now possible; +they have descriptors that wrap up just the method, or the method and +the class. As a brief explanation of these new kinds of methods, +static methods aren't passed the instance, and therefore resemble +regular functions. Class methods are passed the class of the object, +but not the object itself. Static and class methods are defined like +this: + +\begin{verbatim} +class C(object): + def f(arg1, arg2): + ... + f = staticmethod(f) + + def g(cls, arg1, arg2): + ... + g = classmethod(g) +\end{verbatim} + +The \function{staticmethod()} function takes the function +\function{f}, and returns it wrapped up in a descriptor so it can be +stored in the class object. You might expect there to be special +syntax for creating such methods (\code{def static f()}, +\code{defstatic f()}, or something like that), but no such syntax has +been defined yet; that's been left for future versions of Python. + +More new features, such as slots and properties, are also implemented +as new kinds of descriptors, and it's not difficult to write a +descriptor class that does something novel. For example, it would be +possible to write a descriptor class that made it possible to write +Eiffel-style preconditions and postconditions for a method. A class +that used this feature might be defined like this: + +\begin{verbatim} +from eiffel import eiffelmethod + +class C(object): + def f(self, arg1, arg2): + # The actual function + ...
+ def pre_f(self): + # Check preconditions + ... + def post_f(self): + # Check postconditions + ... + + f = eiffelmethod(f, pre_f, post_f) +\end{verbatim} + +Note that a person using the new \function{eiffelmethod()} doesn't +have to understand anything about descriptors. This is why I think +the new features don't increase the basic complexity of the language. +There will be a few wizards who need to know about it in order to +write \function{eiffelmethod()} or the ZODB or whatever, but most +users will just write code on top of the resulting libraries and +ignore the implementation details. + + +\subsection{Multiple Inheritance: The Diamond Rule} + +Multiple inheritance has also been made more useful through changing +the rules under which names are resolved. Consider this set of classes +(diagram taken from \pep{253} by Guido van Rossum): + +\begin{verbatim} + class A: + ^ ^ def save(self): ... + / \ + / \ + / \ + / \ + class B class C: + ^ ^ def save(self): ... + \ / + \ / + \ / + \ / + class D +\end{verbatim} + +The lookup rule for classic classes is simple but not very smart; the +base classes are searched depth-first, going from left to right. A +reference to \method{D.save} will search the classes \class{D}, +\class{B}, and then \class{A}, where \method{save()} would be found +and returned. \method{C.save()} would never be found at all. This is +bad, because if \class{C}'s \method{save()} method is saving some +internal state specific to \class{C}, not calling it will result in +that state never getting saved. + +New-style classes follow a different algorithm that's a bit more +complicated to explain, but does the right thing in this situation. +(Note that Python 2.3 changes this algorithm to one that produces the +same results in most cases, but produces more useful results for +really complicated inheritance graphs.) 
+ +\begin{enumerate} + +\item List all the base classes, following the classic lookup rule and +include a class multiple times if it's visited repeatedly. In the +above example, the list of visited classes is [\class{D}, \class{B}, +\class{A}, \class{C}, \class{A}]. + +\item Scan the list for duplicated classes. If any are found, remove +all but one occurrence, leaving the \emph{last} one in the list. In +the above example, the list becomes [\class{D}, \class{B}, \class{C}, +\class{A}] after dropping duplicates. + +\end{enumerate} + +Following this rule, referring to \method{D.save()} will return +\method{C.save()}, which is the behaviour we're after. This lookup +rule is the same as the one followed by Common Lisp. A new built-in +function, \function{super()}, provides a way to get at a class's +superclasses without having to reimplement Python's algorithm. +The most commonly used form will be +\function{super(\var{class}, \var{obj})}, which returns +a bound superclass object (not the actual class object). This form +will be used in methods to call a method in the superclass; for +example, \class{D}'s \method{save()} method would look like this: + +\begin{verbatim} +class D (B,C): + def save (self): + # Call superclass .save() + super(D, self).save() + # Save D's private information here + ... +\end{verbatim} + +\function{super()} can also return unbound superclass objects +when called as \function{super(\var{class})} or +\function{super(\var{class1}, \var{class2})}, but this probably won't +often be useful. + + +\subsection{Attribute Access} + +A fair number of sophisticated Python classes define hooks for +attribute access using \method{__getattr__}; most commonly this is +done for convenience, to make code more readable by automatically +mapping an attribute access such as \code{obj.parent} into a method +call such as \code{obj.get_parent()}. Python 2.2 adds some new ways +of controlling attribute access. 
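The \code{obj.parent} to \code{obj.get_parent()} mapping mentioned above can be sketched with a \method{__getattr__} hook. This is a minimal illustration, not code from the Python distribution: the \class{Node} class, its \member{_parent} attribute and its \method{get_parent()} method are hypothetical names. The class derives from \class{object}, and the printed values in the comments assume a modern Python 3 interpreter.

```python
class Node(object):
    """Hypothetical example: a read of obj.<name> that normal lookup
    cannot satisfy is forwarded to a get_<name>() method, if one exists."""

    def __init__(self, parent=None):
        # Stored under a private name, so normal lookup finds it
        # and __getattr__ is never involved.
        self._parent = parent

    def get_parent(self):
        return self._parent

    def __getattr__(self, name):
        # Python calls this hook only after ordinary lookup (instance
        # dictionary, then the class) has failed to find `name`.
        # Search the class rather than the instance, so that a missing
        # getter cannot recurse back into __getattr__.
        getter = getattr(type(self), 'get_' + name, None)
        if getter is None:
            raise AttributeError(name)
        return getter(self)


root = Node()
child = Node(parent=root)
print(child.parent is root)   # True: child.parent calls child.get_parent()
print(root.parent)            # None
```

Because the hook runs only when ordinary lookup fails, reads of \member{_parent} and of the methods themselves never pass through it; an unknown name such as \code{child.grandparent} still raises \exception{AttributeError}.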
+ +First, \method{__getattr__(\var{attr_name})} is still supported by +new-style classes, and nothing about it has changed. As before, it +will be called when an attempt is made to access \code{obj.foo} and no +attribute named \samp{foo} is found in the instance's dictionary. + +New-style classes also support a new method, +\method{__getattribute__(\var{attr_name})}. The difference between +the two methods is that \method{__getattribute__} is \emph{always} +called whenever any attribute is accessed, while the old +\method{__getattr__} is only called if \samp{foo} isn't found in the +instance's dictionary. + +However, Python 2.2's support for \dfn{properties} will often be a +simpler way to trap attribute references. Writing a +\method{__getattr__} method is complicated because to avoid recursion +you can't use regular attribute accesses inside them, and instead have +to mess around with the contents of \member{__dict__}. +\method{__getattr__} methods also end up being called by Python when +it checks for other methods such as \method{__repr__} or +\method{__coerce__}, and so have to be written with this in mind. +Finally, calling a function on every attribute access results in a +sizable performance loss. + +\class{property} is a new built-in type that packages up three +functions that get, set, or delete an attribute, and a docstring. For +example, if you want to define a \member{size} attribute that's +computed, but also settable, you could write: + +\begin{verbatim} +class C(object): + def get_size (self): + result = ... computation ... + return result + def set_size (self, size): + ... compute something based on the size + and set internal state appropriately ... + + # Define a property. The 'delete this attribute' + # method is defined as None, so the attribute + # can't be deleted. 
+ size = property(get_size, set_size, + None, + "Storage size of this instance") +\end{verbatim} + +That is certainly clearer and easier to write than a pair of +\method{__getattr__}/\method{__setattr__} methods that check for the +\member{size} attribute and handle it specially while retrieving all +other attributes from the instance's \member{__dict__}. Accesses to +\member{size} are also the only ones which have to perform the work of +calling a function, so references to other attributes run at +their usual speed. + +Finally, it's possible to constrain the list of attributes that can be +referenced on an object using the new \member{__slots__} class attribute. +Python objects are usually very dynamic; at any time it's possible to +define a new attribute on an instance by just doing +\code{obj.new_attr=1}. A new-style class can define a class attribute named +\member{__slots__} to limit the legal attributes +to a particular set of names. An example will make this clear: + +\begin{verbatim} +>>> class C(object): +... __slots__ = ('template', 'name') +... +>>> obj = C() +>>> print obj.template +None +>>> obj.template = 'Test' +>>> print obj.template +Test +>>> obj.newattr = None +Traceback (most recent call last): + File "<stdin>", line 1, in ? +AttributeError: 'C' object has no attribute 'newattr' +\end{verbatim} + +Note how you get an \exception{AttributeError} on the attempt to +assign to an attribute not listed in \member{__slots__}. + + + +\subsection{Related Links} +\label{sect-rellinks} + +This section has just been a quick overview of the new features, +giving enough of an explanation to start you programming, but many +details have been simplified or ignored. Where should you go to get a +more complete picture? + +\url{http://www.python.org/2.2/descrintro.html} is a lengthy tutorial +introduction to the descriptor features, written by Guido van Rossum. 
+If my description has whetted your appetite, go read this tutorial +next, because it goes into much more detail about the new features +while still remaining quite easy to read. + +Next, there are two relevant PEPs, \pep{252} and \pep{253}. \pep{252} +is titled "Making Types Look More Like Classes", and covers the +descriptor API. \pep{253} is titled "Subtyping Built-in Types", and +describes the changes to type objects that make it possible to subtype +built-in objects. \pep{253} is the more complicated PEP of the two, +and at a few points the necessary explanations of types and meta-types +may cause your head to explode. Both PEPs were written and +implemented by Guido van Rossum, with substantial assistance from the +rest of the Zope Corp. team. + +Finally, there's the ultimate authority: the source code. Most of the +machinery for the type handling is in \file{Objects/typeobject.c}, but +you should only resort to it after all other avenues have been +exhausted, including posting a question to python-list or python-dev. + + +%====================================================================== +\section{PEP 234: Iterators} + +Another significant addition to 2.2 is an iteration interface at both +the C and Python levels. Objects can define how they can be looped +over by callers. + +In Python versions up to 2.1, the usual way to make \code{for item in +obj} work is to define a \method{__getitem__()} method that looks +something like this: + +\begin{verbatim} + def __getitem__(self, index): + return <next item> +\end{verbatim} + +\method{__getitem__()} is more properly used to define an indexing +operation on an object so that you can write \code{obj[5]} to retrieve +the sixth element. It's a bit misleading when you're using this only +to support \keyword{for} loops. 
Consider some file-like object that +wants to be looped over; the \var{index} parameter is essentially +meaningless, as the class probably assumes that a series of +\method{__getitem__()} calls will be made with \var{index} +incrementing by one each time. In other words, the presence of the +\method{__getitem__()} method doesn't mean that using \code{file[5]} +to randomly access the sixth element will work, though it really should. + +In Python 2.2, iteration can be implemented separately, and +\method{__getitem__()} methods can be limited to classes that really +do support random access. The basic idea of iterators is +simple. A new built-in function, \function{iter(obj)} or +\code{iter(\var{C}, \var{sentinel})}, is used to get an iterator. +\function{iter(obj)} returns an iterator for the object \var{obj}, +while \code{iter(\var{C}, \var{sentinel})} returns an iterator that +will invoke the callable object \var{C} until it returns +\var{sentinel} to signal that the iterator is done. + +Python classes can define an \method{__iter__()} method, which should +create and return a new iterator for the object; if the object is its +own iterator, this method can just return \code{self}. In particular, +iterators will usually be their own iterators. Extension types +implemented in C can implement a \member{tp_iter} function in order to +return an iterator, and extension types that want to behave as +iterators can define a \member{tp_iternext} function. + +So, after all this, what do iterators actually do? They have one +required method, \method{next()}, which takes no arguments and returns +the next value. When there are no more values to be returned, calling +\method{next()} should raise the \exception{StopIteration} exception. + +\begin{verbatim} +>>> L = [1,2,3] +>>> i = iter(L) +>>> print i +<iterator object at 0x8116870> +>>> i.next() +1 +>>> i.next() +2 +>>> i.next() +3 +>>> i.next() +Traceback (most recent call last): + File "<stdin>", line 1, in ? 
+StopIteration +>>> +\end{verbatim} + +In 2.2, Python's \keyword{for} statement no longer expects a sequence; +it expects something for which \function{iter()} will return an iterator. +For backward compatibility and convenience, an iterator is +automatically constructed for sequences that don't implement +\method{__iter__()} or a \member{tp_iter} slot, so \code{for i in +[1,2,3]} will still work. Wherever the Python interpreter loops over +a sequence, it's been changed to use the iterator protocol. This +means you can do things like this: + +\begin{verbatim} +>>> L = [1,2,3] +>>> i = iter(L) +>>> a,b,c = i +>>> a,b,c +(1, 2, 3) +\end{verbatim} + +Iterator support has been added to some of Python's basic types. +Calling \function{iter()} on a dictionary will return an iterator +which loops over its keys: + +\begin{verbatim} +>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, +... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12} +>>> for key in m: print key, m[key] +... +Mar 3 +Feb 2 +Aug 8 +Sep 9 +May 5 +Jun 6 +Jul 7 +Jan 1 +Apr 4 +Nov 11 +Dec 12 +Oct 10 +\end{verbatim} + +That's just the default behaviour. If you want to iterate over keys, +values, or key/value pairs, you can explicitly call the +\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()} +methods to get an appropriate iterator. In a minor related change, +the \keyword{in} operator now works on dictionaries, so +\code{\var{key} in dict} is now equivalent to +\code{dict.has_key(\var{key})}. + +Files also provide an iterator, which calls the \method{readline()} +method until there are no more lines in the file. This means you can +now read each line of a file using code like this: + +\begin{verbatim} +for line in file: + # do something for each line + ... +\end{verbatim} + +Note that you can only go forward in an iterator; there's no way to +get the previous element, reset the iterator, or make a copy of it. 
+An iterator object could provide such additional capabilities, but the +iterator protocol only requires a \method{next()} method. + +\begin{seealso} + +\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented +by the Python Labs crew, mostly by GvR and Tim Peters.} + +\end{seealso} + + +%====================================================================== +\section{PEP 255: Simple Generators} + +Generators are another new feature, one that interacts with the +introduction of iterators. + +You're doubtless familiar with how function calls work in Python or +C. When you call a function, it gets a private namespace where its local +variables are created. When the function reaches a \keyword{return} +statement, the local variables are destroyed and the resulting value +is returned to the caller. A later call to the same function will get +a fresh new set of local variables. But, what if the local variables +weren't thrown away on exiting a function? What if you could later +resume the function where it left off? This is what generators +provide; they can be thought of as resumable functions. + +Here's the simplest example of a generator function: + +\begin{verbatim} +def generate_ints(N): + for i in range(N): + yield i +\end{verbatim} + +A new keyword, \keyword{yield}, was introduced for generators. Any +function containing a \keyword{yield} statement is a generator +function; this is detected by Python's bytecode compiler which +compiles the function specially as a result. Because a new keyword was +introduced, generators must be explicitly enabled in a module by +including a \code{from __future__ import generators} statement near +the top of the module's source code. In Python 2.3 this statement +will become unnecessary. + +When you call a generator function, it doesn't return a single value; +instead it returns a generator object that supports the iterator +protocol. 
On executing the \keyword{yield} statement, the generator +outputs the value of \code{i}, similar to a \keyword{return} +statement. The big difference between \keyword{yield} and a +\keyword{return} statement is that on reaching a \keyword{yield} the +generator's state of execution is suspended and local variables are +preserved. On the next call to the generator's \code{next()} method, +the function will resume executing immediately after the +\keyword{yield} statement. (For complicated reasons, the +\keyword{yield} statement isn't allowed inside the \keyword{try} block +of a \keyword{try}...\keyword{finally} statement; read \pep{255} for a full +explanation of the interaction between \keyword{yield} and +exceptions.) + +Here's a sample usage of the \function{generate_ints} generator: + +\begin{verbatim} +>>> gen = generate_ints(3) +>>> gen +<generator object at 0x8117f90> +>>> gen.next() +0 +>>> gen.next() +1 +>>> gen.next() +2 +>>> gen.next() +Traceback (most recent call last): + File "<stdin>", line 1, in ? + File "<stdin>", line 2, in generate_ints +StopIteration +\end{verbatim} + +You could equally write \code{for i in generate_ints(5)}, or +\code{a,b,c = generate_ints(3)}. + +Inside a generator function, the \keyword{return} statement can only +be used without a value, and signals the end of the procession of +values; afterwards the generator cannot return any further values. +\keyword{return} with a value, such as \code{return 5}, is a syntax +error inside a generator function. The end of the generator's results +can also be indicated by raising \exception{StopIteration} manually, +or by just letting the flow of execution fall off the bottom of the +function. + +You could achieve the effect of generators manually by writing your +own class and storing all the local variables of the generator as +instance variables. 
For example, returning a list of integers could +be done by setting \code{self.count} to 0, and having the +\method{next()} method increment \code{self.count} and return it. +However, for a moderately complicated generator, writing a +corresponding class would be much messier. +\file{Lib/test/test_generators.py} contains a number of more +interesting examples. The simplest one implements an in-order +traversal of a tree using generators recursively. + +\begin{verbatim} +# A recursive generator that yields the labels of a tree's nodes in in-order. +def inorder(t): + if t: + for x in inorder(t.left): + yield x + yield t.label + for x in inorder(t.right): + yield x +\end{verbatim} + +Two other examples in \file{Lib/test/test_generators.py} produce +solutions for the N-Queens problem (placing $N$ queens on an $N \times N$ +chessboard so that no queen threatens another) and the Knight's Tour +(a route that takes a knight to every square of an $N \times N$ chessboard +without visiting any square twice). + +The idea of generators comes from other programming languages, +especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the +idea of generators is central. In Icon, every +expression and function call behaves like a generator. One example +from ``An Overview of the Icon Programming Language'' at +\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of +what this looks like: + +\begin{verbatim} +sentence := "Store it in the neighboring harbor" +if (i := find("or", sentence)) > 5 then write(i) +\end{verbatim} + +In Icon the \function{find()} function returns the indexes at which the +substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement, +\code{i} is first assigned a value of 3, but 3 is less than 5, so the +comparison fails, and Icon retries it with the second value of 23. 23 +is greater than 5, so the comparison now succeeds, and the code prints +the value 23 to the screen. + +Python doesn't go nearly as far as Icon in adopting generators as a +central concept.
Generators are considered a new part of the core +Python language, but learning or using them isn't compulsory; if they +don't solve any problems that you have, feel free to ignore them. +One novel feature of Python's interface as compared to +Icon's is that a generator's state is represented as a concrete object +(the iterator) that can be passed around to other functions or stored +in a data structure. + +\begin{seealso} + +\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim +Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer +and Tim Peters, with other fixes from the Python Labs crew.} + +\end{seealso} + + +%====================================================================== +\section{PEP 237: Unifying Long Integers and Integers} + +In recent versions, the distinction between regular integers, which +are 32-bit values on most machines, and long integers, which can be of +arbitrary size, was becoming an annoyance. For example, on platforms +that support files larger than \code{2**32} bytes, the +\method{tell()} method of file objects has to return a long integer. +However, there were various bits of Python that expected plain +integers and would raise an error if a long integer was provided +instead. For example, in Python 1.5, only regular integers +could be used as a slice index, and \code{'abc'[1L:]} would raise a +\exception{TypeError} exception with the message 'slice index must be +int'. + +Python 2.2 will shift values from short to long integers as required. +The 'L' suffix is no longer needed to indicate a long integer literal, +as now the compiler will choose the appropriate type. (Using the 'L' +suffix will be discouraged in future 2.x versions of Python, +triggering a warning in Python 2.4, and probably dropped in Python +3.0.) Many operations that used to raise an \exception{OverflowError} +will now return a long integer as their result. 
For example: + +\begin{verbatim} +>>> 1234567890123 +1234567890123L +>>> 2 ** 64 +18446744073709551616L +\end{verbatim} + +In most cases, integers and long integers will now be treated +identically. You can still distinguish them with the +\function{type()} built-in function, but that's rarely needed. + +\begin{seealso} + +\seepep{237}{Unifying Long Integers and Integers}{Written by +Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van +Rossum.} + +\end{seealso} + + +%====================================================================== +\section{PEP 238: Changing the Division Operator} + +The most controversial change in Python 2.2 heralds the start of an effort +to fix an old design flaw that's been in Python from the beginning. +Currently, Python's division operator, \code{/}, behaves much like C's +division operator when presented with two integer arguments: it +returns an integer result, rounded down toward negative infinity when there would be +a fractional part. For example, \code{3/2} is 1, not 1.5, and +\code{(-1)/2} is -1, not -0.5. This means that the results of division +can vary unexpectedly depending on the types of the two operands, and +because Python is dynamically typed, it can be difficult to determine +the possible types of the operands. + +(The controversy is over whether this is \emph{really} a design flaw, +and whether it's worth breaking existing code to fix this. It's +caused endless discussions on python-dev, and in July 2001 erupted into a +storm of acidly sarcastic postings on \newsgroup{comp.lang.python}. I +won't argue for either side here and will stick to describing what's +implemented in 2.2. Read \pep{238} for a summary of arguments and +counter-arguments.) + +Because this change might break code, it's being introduced very +gradually. Python 2.2 begins the transition, but the switch won't be +complete until Python 3.0. + +First, I'll borrow some terminology from \pep{238}.
``True division'' is the +division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 +is 0.25, and so forth. ``Floor division'' is what Python's \code{/} +operator currently does when given integer operands; the result is the +floor of the value returned by true division. ``Classic division'' is +the current mixed behaviour of \code{/}; it returns the result of +floor division when the operands are integers, and returns the result +of true division when one of the operands is a floating-point number. + +Here are the changes 2.2 introduces: + +\begin{itemize} + +\item A new operator, \code{//}, is the floor division operator. +(Yes, we know it looks like \Cpp's comment symbol.) \code{//} +\emph{always} performs floor division no matter what the types of +its operands are, so \code{1 // 2} is 0 and \code{1.0 // 2.0} is also +0.0. + +\code{//} is always available in Python 2.2; you don't need to enable +it using a \code{__future__} statement. + +\item Including a \code{from __future__ import division} statement in a +module changes the \code{/} operator to return the result of +true division, so \code{1/2} is 0.5. Without the \code{__future__} +statement, \code{/} still means classic division. The default meaning +of \code{/} will not change until Python 3.0. + +\item Classes can define methods called \method{__truediv__} and +\method{__floordiv__} to overload the two division operators. At the +C level, there are also slots in the \ctype{PyNumberMethods} structure +so extension types can define the two operators. + +\item Python 2.2 supports some command-line arguments for testing +whether code will work with the changed division semantics. Running +python with \programopt{-Q warn} will cause a warning to be issued +whenever division is applied to two integers. You can use this to +find code that's affected by the change and fix it.
By default, +Python 2.2 will simply perform classic division without a warning; the +warning will be turned on by default in Python 2.3. + +\end{itemize} + +\begin{seealso} + +\seepep{238}{Changing the Division Operator}{Written by Moshe Zadka and +Guido van Rossum. Implemented by Guido van Rossum.} + +\end{seealso} + + +%====================================================================== +\section{Unicode Changes} + +Python's Unicode support has been enhanced a bit in 2.2. Unicode +strings are usually stored internally as UCS-2, that is, as 16-bit unsigned integers. +Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned +integers, as its internal encoding by supplying +\longprogramopt{enable-unicode=ucs4} to the configure script. +(It's also possible to specify +\longprogramopt{disable-unicode} to completely disable Unicode +support.) + +When built to use UCS-4 (a ``wide Python''), the interpreter can +natively handle Unicode characters from U+000000 to U+10FFFF, so the +range of legal values for the \function{unichr()} function is expanded +accordingly. On an interpreter compiled to use UCS-2 (a ``narrow +Python''), values greater than 65535 will still cause +\function{unichr()} to raise a \exception{ValueError} exception. +This is all described in \pep{261}, ``Support for `wide' Unicode +characters''; consult it for further details. + +Another change is simpler to explain. Since their introduction, +Unicode strings have supported an \method{encode()} method to convert +the string to a selected encoding such as UTF-8 or Latin-1. A +symmetric \method{decode(\optional{\var{encoding}})} method has been +added to 8-bit strings (though not to Unicode strings) in 2.2. +\method{decode()} assumes that the string is in the specified encoding +and decodes it, returning whatever is returned by the codec. + +Using this new feature, codecs have been added for tasks not directly +related to Unicode.
For example, codecs have been added for +uu-encoding, MIME's base64 encoding, and compression with the +\module{zlib} module: + +\begin{verbatim} +>>> s = """Here is a lengthy piece of redundant, overly verbose, +... and repetitive text. +... """ +>>> data = s.encode('zlib') +>>> data +'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...' +>>> data.decode('zlib') +'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n' +>>> print s.encode('uu') +begin 666 <data> +M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@ +>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X* + +end +>>> "sheesh".encode('rot-13') +'furrfu' +\end{verbatim} + +To convert a class instance to Unicode, a \method{__unicode__} method +can be defined by a class, analogous to \method{__str__}. + +\method{encode()}, \method{decode()}, and \method{__unicode__} were +implemented by Marc-Andr\'e Lemburg. The changes to support using +UCS-4 internally were implemented by Fredrik Lundh and Martin von +L\"owis. + +\begin{seealso} + +\seepep{261}{Support for `wide' Unicode characters}{Written by +Paul Prescod.} + +\end{seealso} + + +%====================================================================== +\section{PEP 227: Nested Scopes} + +In Python 2.1, statically nested scopes were added as an optional +feature, to be enabled by a \code{from __future__ import +nested_scopes} directive. In 2.2 nested scopes no longer need to be +specially enabled, and are now always present. The rest of this section +is a copy of the description of nested scopes from my ``What's New in +Python 2.1'' document; if you read it when 2.1 came out, you can skip +the rest of this section. + +The largest change introduced in Python 2.1, and made complete in 2.2, +is to Python's scoping rules. In Python 2.0, at any given time there +are at most three namespaces used to look up variable names: local, +module-level, and the built-in namespace. 
This often surprised people +because it didn't match their intuitive expectations. For example, a +nested recursive function definition doesn't work: + +\begin{verbatim} +def f(): + ... + def g(value): + ... + return g(value-1) + 1 + ... +\end{verbatim} + +Under the old rules, calling \function{g()} would always raise a \exception{NameError} +exception, because the binding of the name \samp{g} is in neither +its local namespace nor the module-level namespace. This isn't much +of a problem in practice (how often do you recursively define interior +functions like this?), but this also made using the \keyword{lambda} +expression clumsier, and this was a problem in practice. In code which +uses \keyword{lambda} you can often find local variables being copied +by passing them as the default values of arguments. + +\begin{verbatim} +def find(self, name): + "Return list of any entries equal to 'name'" + L = filter(lambda x, name=name: x == name, + self.list_attribute) + return L +\end{verbatim} + +The readability of Python code written in a strongly functional style +suffers greatly as a result. + +The most significant change to Python 2.2 is that static scoping has +been added to the language to fix this problem. As a first effect, +the \code{name=name} default argument is now unnecessary in the above +example. Put simply, when a given variable name is not assigned a +value within a function (by an assignment, or the \keyword{def}, +\keyword{class}, or \keyword{import} statements), references to the +variable will be looked up in the local namespace of the enclosing +scope. A more detailed explanation of the rules, and a dissection of +the implementation, can be found in \pep{227}. + +This change may cause some compatibility problems for code where the +same variable name is used both at the module level and as a local +variable within a function that contains further function definitions.
+This seems rather unlikely, though, since such code would have been +pretty confusing to read in the first place. + +One side effect of the change is that the \code{from \var{module} +import *} and \keyword{exec} statements have been made illegal inside +a function scope under certain conditions. The Python reference +manual has said all along that \code{from \var{module} import *} is +only legal at the top level of a module, but the CPython interpreter +has never enforced this before. As part of the implementation of +nested scopes, the compiler that turns Python source into bytecodes +has to generate different code to access variables in a containing +scope. \code{from \var{module} import *} and \keyword{exec} make it +impossible for the compiler to figure this out, because they add names +to the local namespace that are unknowable at compile time. +Therefore, if a function that uses \code{from \var{module} import *} +or \keyword{exec} also contains function definitions or +\keyword{lambda} expressions with free variables, the compiler will +flag this by raising a \exception{SyntaxError} exception. + +To make the preceding explanation a bit clearer, here's an example: + +\begin{verbatim} +x = 1 +def f(): + # The next line is a syntax error + exec 'x=2' + def g(): + return x +\end{verbatim} + +Line 4, containing the \keyword{exec} statement, is a syntax error, +since \keyword{exec} would define a new local variable named \samp{x} +whose value should be accessed by \function{g()}. + +This shouldn't be much of a limitation, since \keyword{exec} is rarely +used in most Python code (and when it is used, it's often a sign of a +poor design anyway).
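As a sketch of code that depends on the new rules, the following functions work as written under 2.2, while Python 2.0 would fail with a \exception{NameError} as soon as an inner function looked up a name from its enclosing function (and 2.1 would need the \code{__future__} directive). The function names are invented for this illustration: \function{find_entries()} is the earlier \function{find()} example without the \code{name=name} default, \function{factorial()} uses a nested recursive function like \function{g()} above, and \function{make_adder()} returns a \keyword{lambda} that refers to an argument of its enclosing function.

\begin{verbatim}
def find_entries(items, name):
    "Return a list of the entries equal to 'name'."
    # 'name' is found in the enclosing scope; no name=name default needed.
    return list(filter(lambda x: x == name, items))

def factorial(n):
    def g(value):
        # 'g' itself is looked up in factorial()'s scope.
        if value <= 1:
            return 1
        return value * g(value - 1)
    return g(n)

def make_adder(n):
    # The lambda keeps a reference to make_adder()'s 'n'.
    return lambda x: x + n

print(find_entries(['a', 'b', 'a'], 'a'))
print(factorial(5))
print(make_adder(3)(4))
\end{verbatim}

This prints \code{['a', 'a']}, 120, and 7.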
+ +\begin{seealso} + +\seepep{227}{Statically Nested Scopes}{Written and implemented by +Jeremy Hylton.} + +\end{seealso} + + +%====================================================================== +\section{New and Improved Modules} + +\begin{itemize} + + \item The \module{xmlrpclib} module was contributed to the standard + library by Fredrik Lundh, providing support for writing XML-RPC + clients. XML-RPC is a simple remote procedure call protocol built on + top of HTTP and XML. For example, the following snippet retrieves a + list of RSS channels from the O'Reilly Network, and then + lists the recent headlines for one channel: + +\begin{verbatim} +import xmlrpclib +s = xmlrpclib.Server( + 'http://www.oreillynet.com/meerkat/xml-rpc/server.php') +channels = s.meerkat.getChannels() +# channels is a list of dictionaries, like this: +# [{'id': 4, 'title': 'Freshmeat Daily News'}, +# {'id': 190, 'title': '32Bits Online'}, +# {'id': 4549, 'title': '3DGamers'}, ... ] + +# Get the items for one channel +items = s.meerkat.getItems( {'channel': 4} ) + +# 'items' is another list of dictionaries, like this: +# [{'link': 'http://freshmeat.net/releases/52719/', +# 'description': 'A utility which converts HTML to XSL FO.', +# 'title': 'html2fo 0.3 (Default)'}, ... ] +\end{verbatim} + +The \module{SimpleXMLRPCServer} module makes it easy to create +straightforward XML-RPC servers. See \url{http://www.xmlrpc.com/} for +more information about XML-RPC. + + \item The new \module{hmac} module implements the HMAC + algorithm described by \rfc{2104}. + (Contributed by Gerhard H\"aring.) + + \item Several functions that originally returned lengthy tuples now + return pseudo-sequences that still behave like tuples but also have + mnemonic attributes such as \member{st_mtime} or \member{tm_year}.
+ The enhanced functions include \function{stat()}, + \function{fstat()}, \function{statvfs()}, and \function{fstatvfs()} + in the \module{os} module, and \function{localtime()}, + \function{gmtime()}, and \function{strptime()} in the \module{time} + module. + + For example, to obtain a file's size using the old tuples, you'd end + up writing something like \code{file_size = + os.stat(filename)[stat.ST_SIZE]}, but now this can be written more + clearly as \code{file_size = os.stat(filename).st_size}. + + The original patch for this feature was contributed by Nick Mathewson. + + \item The Python profiler has been extensively reworked and various + errors in its output have been corrected. (Contributed by + Fred~L. Drake, Jr. and Tim Peters.) + + \item The \module{socket} module can be compiled to support IPv6; + specify the \longprogramopt{enable-ipv6} option to Python's configure + script. (Contributed by Jun-ichiro ``itojun'' Hagino.) + + \item Two new format characters were added to the \module{struct} + module for 64-bit integers on platforms that support the C + \ctype{long long} type. \samp{q} is for a signed 64-bit integer, + and \samp{Q} is for an unsigned one. The value is returned in + Python's long integer type. (Contributed by Tim Peters.) + + \item In the interpreter's interactive mode, there's a new built-in + function \function{help()} that uses the \module{pydoc} module + introduced in Python 2.1 to provide interactive help. + \code{help(\var{object})} displays any available help text about + \var{object}. \function{help()} with no argument puts you in an online + help utility, where you can enter the names of functions, classes, + or modules to read their help text. + (Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.) + + \item Various bugfixes and performance improvements have been made + to the SRE engine underlying the \module{re} module. 
For example, + the \function{re.sub()} and \function{re.split()} functions have + been rewritten in C. Another contributed patch speeds up matching of certain + Unicode character ranges by a factor of two, and a new \method{finditer()} + method returns an iterator over all the non-overlapping matches in + a given string. + (SRE is maintained by + Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von + L\"owis.) + + \item The \module{smtplib} module now supports \rfc{2487}, ``Secure + SMTP over TLS'', so it's now possible to encrypt the SMTP traffic + between a Python program and the mail transport agent being handed a + message. \module{smtplib} also supports SMTP authentication. + (Contributed by Gerhard H\"aring.) + + \item The \module{imaplib} module, maintained by Piers Lauder, has + support for several new extensions: the NAMESPACE extension defined + in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony + Baxter and Michel Pelletier.) + + \item The \module{rfc822} module's parsing of email addresses is now + compliant with \rfc{2822}, an update to \rfc{822}. (The module's + name is \emph{not} going to be changed to \samp{rfc2822}.) A new + package, \module{email}, has also been added for parsing and + generating e-mail messages. (Contributed by Barry Warsaw, and + arising out of his work on Mailman.) + + \item The \module{difflib} module now contains a new \class{Differ} + class for producing human-readable lists of changes (a ``delta'') + between two sequences of lines of text. There are also two + generator functions, \function{ndiff()} and \function{restore()}, + which respectively return a delta from two sequences, or one of the + original sequences from a delta. (Grunt work contributed by David + Goodger, from ndiff.py code by Tim Peters who then did the + generatorization.) + + \item New constants \constant{ascii_letters}, + \constant{ascii_lowercase}, and \constant{ascii_uppercase} were + added to the \module{string} module.
There were several modules in + the standard library that used \constant{string.letters} to mean the + ranges A-Za-z, but that assumption is incorrect when locales are in + use, because \constant{string.letters} varies depending on the set + of legal characters defined by the current locale. The buggy + modules have all been fixed to use \constant{ascii_letters} instead. + (Reported by an unknown person; fixed by Fred~L. Drake, Jr.) + + \item The \module{mimetypes} module now makes it easier to use + alternative MIME-type databases by the addition of a + \class{MimeTypes} class, which takes a list of filenames to be + parsed. (Contributed by Fred~L. Drake, Jr.) + + \item A \class{Timer} class was added to the \module{threading} + module that allows scheduling an activity to happen at some future + time. (Contributed by Itamar Shtull-Trauring.) + +\end{itemize} + + +%====================================================================== +\section{Interpreter Changes and Fixes} + +Some of the changes only affect people who deal with the Python +interpreter at the C level because they're writing Python extension modules, +embedding the interpreter, or just hacking on the interpreter itself. +If you only write Python code, none of the changes described here will +affect you very much. + +\begin{itemize} + + \item Profiling and tracing functions can now be implemented in C, + which can operate at much higher speeds than Python-based functions + and should reduce the overhead of profiling and tracing. This + will be of interest to authors of development environments for + Python. Two new C functions were added to Python's API, + \cfunction{PyEval_SetProfile()} and \cfunction{PyEval_SetTrace()}. + The existing \function{sys.setprofile()} and + \function{sys.settrace()} functions still exist, and have simply + been changed to use the new C-level interface. (Contributed by Fred + L. Drake, Jr.) 
+ + \item Another low-level API, primarily of interest to implementors + of Python debuggers and development tools, was added. + \cfunction{PyInterpreterState_Head()} and + \cfunction{PyInterpreterState_Next()} let a caller walk through all + the existing interpreter objects; + \cfunction{PyInterpreterState_ThreadHead()} and + \cfunction{PyThreadState_Next()} allow looping over all the thread + states for a given interpreter. (Contributed by David Beazley.) + +\item The C-level interface to the garbage collector has been changed +to make it easier to write extension types that support garbage +collection and to debug misuses of the functions. +Various functions have slightly different semantics, so several +functions had to be renamed. Extensions that use the old API will +still compile but will \emph{not} participate in garbage collection, +so updating them for 2.2 should be considered fairly high priority. + +To upgrade an extension module to the new API, perform the following +steps: + +\begin{itemize} + +\item Rename \constant{Py_TPFLAGS_GC} to \constant{Py_TPFLAGS_HAVE_GC}. + +\item Use \cfunction{PyObject_GC_New} or \cfunction{PyObject_GC_NewVar} to +allocate objects, and \cfunction{PyObject_GC_Del} to deallocate them. + +\item Rename \cfunction{PyObject_GC_Init} to \cfunction{PyObject_GC_Track} and +\cfunction{PyObject_GC_Fini} to \cfunction{PyObject_GC_UnTrack}. + +\item Remove \cfunction{PyGC_HEAD_SIZE} from object size calculations. + +\item Remove calls to \cfunction{PyObject_AS_GC} and \cfunction{PyObject_FROM_GC}. + +\end{itemize} + + \item A new \samp{et} format sequence was added to + \cfunction{PyArg_ParseTuple}; \samp{et} takes both a parameter and + an encoding name, and converts the parameter to the given encoding + if the parameter turns out to be a Unicode string, or leaves it + alone if it's an 8-bit string, assuming it to already be in the + desired encoding.
This differs from the \samp{es} format character, + which assumes that 8-bit strings are in Python's default ASCII + encoding and converts them to the specified new encoding. + (Contributed by M.-A. Lemburg, and used for the MBCS support on + Windows described in the following section.) + + \item A different argument parsing function, + \cfunction{PyArg_UnpackTuple()}, has been added that's simpler and + presumably faster. Instead of specifying a format string, the + caller simply gives the minimum and maximum number of arguments + expected, and a set of pointers to \ctype{PyObject*} variables that + will be filled in with argument values. + + \item Two new flags \constant{METH_NOARGS} and \constant{METH_O} are + available in method definition tables to simplify implementation of + methods with no arguments or a single untyped argument. Calling + such methods is more efficient than calling a corresponding method + that uses \constant{METH_VARARGS}. + Also, the old \constant{METH_OLDARGS} style of writing C methods is + now officially deprecated. + +\item + Two new wrapper functions, \cfunction{PyOS_snprintf()} and + \cfunction{PyOS_vsnprintf()} were added to provide + cross-platform implementations for the relatively new + \cfunction{snprintf()} and \cfunction{vsnprintf()} C lib APIs. In + contrast to the standard \cfunction{sprintf()} and + \cfunction{vsprintf()} functions, the Python versions check the + bounds of the buffer used to protect against buffer overruns. + (Contributed by M.-A. Lemburg.) + + \item The \cfunction{_PyTuple_Resize()} function has lost an unused + parameter, so now it takes 2 parameters instead of 3. The third + argument was never used, and can simply be discarded when porting + code from earlier versions to Python 2.2. 
+ +\end{itemize} + + +%====================================================================== +\section{Other Changes and Fixes} + +As usual there were a bunch of other improvements and bugfixes +scattered throughout the source tree. A search through the CVS change +logs finds there were 527 patches applied and 683 bugs fixed between +Python 2.1 and 2.2; 2.2.1 applied 139 patches and fixed 143 bugs; +2.2.2 applied 106 patches and fixed 82 bugs. These figures are likely +to be underestimates. + +Some of the more notable changes are: + +\begin{itemize} + + \item The code for the MacOS port for Python, maintained by Jack + Jansen, is now kept in the main Python CVS tree, and many changes + have been made to support MacOS~X. + +The most significant change is the ability to build Python as a +framework, enabled by supplying the \longprogramopt{enable-framework} +option to the configure script when compiling Python. According to +Jack Jansen, ``This installs a self-contained Python installation plus +the OS~X framework "glue" into +\file{/Library/Frameworks/Python.framework} (or another location of +choice). For now there is little immediate added benefit to this +(actually, there is the disadvantage that you have to change your PATH +to be able to find Python), but it is the basis for creating a +full-blown Python application, porting the MacPython IDE, possibly +using Python as a standard OSA scripting language and much more.'' + +Most of the MacPython toolbox modules, which interface to MacOS APIs +such as windowing, QuickTime, scripting, etc. have been ported to OS~X, +but they've been left commented out in \file{setup.py}. People who want +to experiment with these modules can uncomment them manually. + +% Jack's original comments: +%The main change is the possibility to build Python as a +%framework. This installs a self-contained Python installation plus the +%OSX framework "glue" into /Library/Frameworks/Python.framework (or +%another location of choice). 
For now there is little immedeate added +%benefit to this (actually, there is the disadvantage that you have to +%change your PATH to be able to find Python), but it is the basis for +%creating a fullblown Python application, porting the MacPython IDE, +%possibly using Python as a standard OSA scripting language and much +%more. You enable this with "configure --enable-framework". + +%The other change is that most MacPython toolbox modules, which +%interface to all the MacOS APIs such as windowing, quicktime, +%scripting, etc. have been ported. Again, most of these are not of +%immedeate use, as they need a full application to be really useful, so +%they have been commented out in setup.py. People wanting to experiment +%can uncomment them. Gestalt and Internet Config modules are enabled by +%default. + + \item Keyword arguments passed to built-in functions that don't take them + now cause a \exception{TypeError} exception to be raised, with the + message ``\var{function} takes no keyword arguments''. + + \item Weak references, added in Python 2.1 as an extension module, + are now part of the core because they're used in the implementation + of new-style classes. The \exception{ReferenceError} exception has + therefore moved from the \module{weakref} module to become a + built-in exception. + + \item A new script, \file{Tools/scripts/cleanfuture.py} by Tim + Peters, automatically removes obsolete \code{__future__} statements + from Python source code. + + \item An additional \var{flags} argument has been added to the + built-in function \function{compile()}, so the behaviour of + \code{__future__} statements can now be correctly observed in + simulated shells, such as those presented by IDLE and other + development environments. This is described in \pep{264}. + (Contributed by Michael Hudson.) + + \item The new license introduced with Python 1.6 wasn't + GPL-compatible.
This is fixed by some minor textual changes to the + 2.2 license, so it's now legal to embed Python inside a GPLed + program again. Note that Python itself is not GPLed, but instead is + under a license that's essentially equivalent to the BSD license, + same as it always was. The license changes were also applied to the + Python 2.0.1 and 2.1.1 releases. + + \item When presented with a Unicode filename on Windows, Python will + now convert it to an MBCS encoded string, as used by the Microsoft + file APIs. As MBCS is explicitly used by the file APIs, Python's + choice of ASCII as the default encoding turns out to be an + annoyance. On \UNIX, the locale's character set is used if + \function{locale.nl_langinfo(CODESET)} is available. (Windows + support was contributed by Mark Hammond with assistance from + Marc-Andr\'e Lemburg. \UNIX{} support was added by Martin von L\"owis.) + + \item Large file support is now enabled on Windows. (Contributed by + Tim Peters.) + + \item The \file{Tools/scripts/ftpmirror.py} script + now parses a \file{.netrc} file, if you have one. + (Contributed by Mike Romberg.) + + \item Some features of the object returned by the + \function{xrange()} function are now deprecated, and trigger + warnings when they're accessed; they'll disappear in Python 2.3. + \class{xrange} objects tried to pretend they were full sequence + types by supporting slicing, sequence multiplication, and the + \keyword{in} operator, but these features were rarely used and + therefore buggy. The \method{tolist()} method and the + \member{start}, \member{stop}, and \member{step} attributes are also + being deprecated. At the C level, the fourth argument to the + \cfunction{PyRange_New()} function, \samp{repeat}, has also been + deprecated. 
+ + \item There were a bunch of patches to the dictionary + implementation, mostly to fix potential core dumps if a dictionary + contains objects that sneakily changed their hash value, or mutated + the dictionary they were contained in. For a while python-dev fell + into a gentle rhythm of Michael Hudson finding a case that dumped + core, Tim Peters fixing the bug, Michael finding another case, and round + and round it went. + + \item On Windows, Python can now be compiled with Borland C thanks + to a number of patches contributed by Stephen Hansen, though the + result isn't fully functional yet. (But this \emph{is} progress...) + + \item Another Windows enhancement: Wise Solutions generously offered + PythonLabs use of their InstallerMaster 8.1 system. Earlier + PythonLabs Windows installers used Wise 5.0a, which was beginning to + show its age. (Packaged up by Tim Peters.) + + \item Files ending in \samp{.pyw} can now be imported on Windows. + \samp{.pyw} is a Windows-only thing, used to indicate that a script + needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to + prevent a DOS console from popping up to display the output. This + patch makes it possible to import such scripts, in case they're also + usable as modules. (Implemented by David Bolen.) + + \item On platforms where Python uses the C \cfunction{dlopen()} function + to load extension modules, it's now possible to set the flags used + by \cfunction{dlopen()} using the \function{sys.getdlopenflags()} and + \function{sys.setdlopenflags()} functions. (Contributed by Bram Stolk.) + + \item The \function{pow()} built-in function no longer supports 3 + arguments when floating-point numbers are supplied. + \code{pow(\var{x}, \var{y}, \var{z})} returns \code{(x**y) \% z}, but + this is never useful for floating point numbers, and the final + result varies unpredictably depending on the platform. A call such + as \code{pow(2.0, 8.0, 7.0)} will now raise a \exception{TypeError} + exception. 
+ +\end{itemize} + + +%====================================================================== +\section{Acknowledgements} + +The author would like to thank the following people for offering +suggestions, corrections and assistance with various drafts of this +article: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred~L. Drake, Jr., +Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael +Hudson, Jack Jansen, Marc-Andr\'e Lemburg, Martin von L\"owis, Fredrik +Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer, +Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom Reinhardt, Neil +Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne. + +\end{document} diff --git a/sys/src/cmd/python/Doc/whatsnew/whatsnew23.tex b/sys/src/cmd/python/Doc/whatsnew/whatsnew23.tex new file mode 100644 index 000000000..8b0fb4ae8 --- /dev/null +++ b/sys/src/cmd/python/Doc/whatsnew/whatsnew23.tex @@ -0,0 +1,2380 @@ +\documentclass{howto} +\usepackage{distutils} +% $Id: whatsnew23.tex 50964 2006-07-30 03:03:43Z fred.drake $ + +\title{What's New in Python 2.3} +\release{1.01} +\author{A.M.\ Kuchling} +\authoraddress{ + \strong{Python Software Foundation}\\ + Email: \email{amk@amk.ca} +} + +\begin{document} +\maketitle +\tableofcontents + +This article explains the new features in Python 2.3. Python 2.3 was +released on July 29, 2003. + +The main themes for Python 2.3 are polishing some of the features +added in 2.2, adding various small but useful enhancements to the core +language, and expanding the standard library. The new object model +introduced in the previous version has benefited from 18 months of +bugfixes and from optimization efforts that have improved the +performance of new-style classes. A few new built-in functions have +been added such as \function{sum()} and \function{enumerate()}. The +\keyword{in} operator can now be used for substring searches (e.g. +\code{"ab" in "abc"} returns \constant{True}). 
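A quick sketch of three of the additions just mentioned, each relying on behaviour new in 2.3. The variable names are only for illustration. In Python 2.2, \code{"ab" in "abc"} raised a \exception{TypeError}, because the left operand of \keyword{in} had to be a single character when searching a string.

\begin{verbatim}
# sum() adds up the numbers produced by any iterable
total = sum([1, 2, 3, 4])

# enumerate() pairs each item with its index, starting from 0
pairs = list(enumerate(['a', 'b', 'c']))

# 'in' now accepts substrings, not just single characters
found = "ab" in "abc"

print(total)
print(pairs)
print(found)
\end{verbatim}

This prints 10, then \code{[(0, 'a'), (1, 'b'), (2, 'c')]}, then \constant{True}.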
+
+Some of the many new library features include Boolean, set, heap, and
+date/time data types, the ability to import modules from ZIP-format
+archives, metadata support for the long-awaited Python catalog, an
+updated version of IDLE, and modules for logging messages, wrapping
+text, parsing CSV files, processing command-line options, using BerkeleyDB
+databases... the list of new and enhanced modules is lengthy.
+
+This article doesn't attempt to provide a complete specification of
+the new features, but instead provides a convenient overview. For
+full details, you should refer to the documentation for Python 2.3,
+such as the \citetitle[../lib/lib.html]{Python Library Reference} and
+the \citetitle[../ref/ref.html]{Python Reference Manual}. If you want
+to understand the complete implementation and design rationale,
+refer to the PEP for a particular new feature.
+
+
+%======================================================================
+\section{PEP 218: A Standard Set Datatype}
+
+The new \module{sets} module contains an implementation of a set
+datatype. The \class{Set} class is for mutable sets, sets that can
+have members added and removed. The \class{ImmutableSet} class is for
+sets that can't be modified, and instances of \class{ImmutableSet} can
+therefore be used as dictionary keys. Sets are built on top of
+dictionaries, so the elements within a set must be hashable.
+
+Here's a simple example:
+
+\begin{verbatim}
+>>> import sets
+>>> S = sets.Set([1,2,3])
+>>> S
+Set([1, 2, 3])
+>>> 1 in S
+True
+>>> 0 in S
+False
+>>> S.add(5)
+>>> S.remove(3)
+>>> S
+Set([1, 2, 5])
+>>>
+\end{verbatim}
+
+The union and intersection of sets can be computed with the
+\method{union()} and \method{intersection()} methods; an alternative
+notation uses the bitwise operators \code{|} and \code{\&},
+respectively. Mutable sets also have in-place versions of these
+methods, \method{union_update()} and \method{intersection_update()}.
+ +\begin{verbatim} +>>> S1 = sets.Set([1,2,3]) +>>> S2 = sets.Set([4,5,6]) +>>> S1.union(S2) +Set([1, 2, 3, 4, 5, 6]) +>>> S1 | S2 # Alternative notation +Set([1, 2, 3, 4, 5, 6]) +>>> S1.intersection(S2) +Set([]) +>>> S1 & S2 # Alternative notation +Set([]) +>>> S1.union_update(S2) +>>> S1 +Set([1, 2, 3, 4, 5, 6]) +>>> +\end{verbatim} + +It's also possible to take the symmetric difference of two sets. This +is the set of all elements in the union that aren't in the +intersection. Another way of putting it is that the symmetric +difference contains all elements that are in exactly one +set. Again, there's an alternative notation (\code{\^}), and an +in-place version with the ungainly name +\method{symmetric_difference_update()}. + +\begin{verbatim} +>>> S1 = sets.Set([1,2,3,4]) +>>> S2 = sets.Set([3,4,5,6]) +>>> S1.symmetric_difference(S2) +Set([1, 2, 5, 6]) +>>> S1 ^ S2 +Set([1, 2, 5, 6]) +>>> +\end{verbatim} + +There are also \method{issubset()} and \method{issuperset()} methods +for checking whether one set is a subset or superset of another: + +\begin{verbatim} +>>> S1 = sets.Set([1,2,3]) +>>> S2 = sets.Set([2,3]) +>>> S2.issubset(S1) +True +>>> S1.issubset(S2) +False +>>> S1.issuperset(S2) +True +>>> +\end{verbatim} + + +\begin{seealso} + +\seepep{218}{Adding a Built-In Set Object Type}{PEP written by Greg V. Wilson. +Implemented by Greg V. Wilson, Alex Martelli, and GvR.} + +\end{seealso} + + + +%====================================================================== +\section{PEP 255: Simple Generators\label{section-generators}} + +In Python 2.2, generators were added as an optional feature, to be +enabled by a \code{from __future__ import generators} directive. In +2.3 generators no longer need to be specially enabled, and are now +always present; this means that \keyword{yield} is now always a +keyword. 
The rest of this section is a copy of the description of +generators from the ``What's New in Python 2.2'' document; if you read +it back when Python 2.2 came out, you can skip the rest of this section. + +You're doubtless familiar with how function calls work in Python or C. +When you call a function, it gets a private namespace where its local +variables are created. When the function reaches a \keyword{return} +statement, the local variables are destroyed and the resulting value +is returned to the caller. A later call to the same function will get +a fresh new set of local variables. But, what if the local variables +weren't thrown away on exiting a function? What if you could later +resume the function where it left off? This is what generators +provide; they can be thought of as resumable functions. + +Here's the simplest example of a generator function: + +\begin{verbatim} +def generate_ints(N): + for i in range(N): + yield i +\end{verbatim} + +A new keyword, \keyword{yield}, was introduced for generators. Any +function containing a \keyword{yield} statement is a generator +function; this is detected by Python's bytecode compiler which +compiles the function specially as a result. + +When you call a generator function, it doesn't return a single value; +instead it returns a generator object that supports the iterator +protocol. On executing the \keyword{yield} statement, the generator +outputs the value of \code{i}, similar to a \keyword{return} +statement. The big difference between \keyword{yield} and a +\keyword{return} statement is that on reaching a \keyword{yield} the +generator's state of execution is suspended and local variables are +preserved. On the next call to the generator's \code{.next()} method, +the function will resume executing immediately after the +\keyword{yield} statement. 
(For complicated reasons, the +\keyword{yield} statement isn't allowed inside the \keyword{try} block +of a \keyword{try}...\keyword{finally} statement; read \pep{255} for a full +explanation of the interaction between \keyword{yield} and +exceptions.) + +Here's a sample usage of the \function{generate_ints()} generator: + +\begin{verbatim} +>>> gen = generate_ints(3) +>>> gen +<generator object at 0x8117f90> +>>> gen.next() +0 +>>> gen.next() +1 +>>> gen.next() +2 +>>> gen.next() +Traceback (most recent call last): + File "stdin", line 1, in ? + File "stdin", line 2, in generate_ints +StopIteration +\end{verbatim} + +You could equally write \code{for i in generate_ints(5)}, or +\code{a,b,c = generate_ints(3)}. + +Inside a generator function, the \keyword{return} statement can only +be used without a value, and signals the end of the procession of +values; afterwards the generator cannot return any further values. +\keyword{return} with a value, such as \code{return 5}, is a syntax +error inside a generator function. The end of the generator's results +can also be indicated by raising \exception{StopIteration} manually, +or by just letting the flow of execution fall off the bottom of the +function. + +You could achieve the effect of generators manually by writing your +own class and storing all the local variables of the generator as +instance variables. For example, returning a list of integers could +be done by setting \code{self.count} to 0, and having the +\method{next()} method increment \code{self.count} and return it. +However, for a moderately complicated generator, writing a +corresponding class would be much messier. +\file{Lib/test/test_generators.py} contains a number of more +interesting examples. The simplest one implements an in-order +traversal of a tree using generators recursively. + +\begin{verbatim} +# A recursive generator that generates Tree leaves in in-order. 
+def inorder(t):
+    if t:
+        for x in inorder(t.left):
+            yield x
+        yield t.label
+        for x in inorder(t.right):
+            yield x
+\end{verbatim}
+
+Two other examples in \file{Lib/test/test_generators.py} produce
+solutions for the N-Queens problem (placing $N$ queens on an $N \times N$
+chess board so that no queen threatens another) and the Knight's Tour
+(a route that takes a knight to every square of an $N \times N$ chessboard
+without visiting any square twice).
+
+The idea of generators comes from other programming languages,
+especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
+idea of generators is central. In Icon, every
+expression and function call behaves like a generator. One example
+from ``An Overview of the Icon Programming Language'' at
+\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
+what this looks like:
+
+\begin{verbatim}
+sentence := "Store it in the neighboring harbor"
+if (i := find("or", sentence)) > 5 then write(i)
+\end{verbatim}
+
+In Icon the \function{find()} function returns the indexes at which the
+substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
+\code{i} is first assigned a value of 3, but 3 is less than 5, so the
+comparison fails, and Icon retries it with the second value of 23. 23
+is greater than 5, so the comparison now succeeds, and the code prints
+the value 23 to the screen.
+
+Python doesn't go nearly as far as Icon in adopting generators as a
+central concept. Generators are considered part of the core
+Python language, but learning or using them isn't compulsory; if they
+don't solve any problems that you have, feel free to ignore them.
+One novel feature of Python's interface as compared to
+Icon's is that a generator's state is represented as a concrete object
+(the iterator) that can be passed around to other functions or stored
+in a data structure.
+
+\begin{seealso}
+
+\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
+Peters, Magnus Lie Hetland.
 Implemented mostly by Neil Schemenauer
+and Tim Peters, with other fixes from the Python Labs crew.}
+
+\end{seealso}
+
+
+%======================================================================
+\section{PEP 263: Source Code Encodings \label{section-encodings}}
+
+Python source files can now be declared as being in different
+character set encodings. Encodings are declared by including a
+specially formatted comment in the first or second line of the source
+file. For example, a UTF-8 file can be declared with:
+
+\begin{verbatim}
+#!/usr/bin/env python
+# -*- coding: UTF-8 -*-
+\end{verbatim}
+
+Without such an encoding declaration, the default encoding used is
+7-bit ASCII. Executing or importing modules that contain string
+literals with 8-bit characters and have no encoding declaration will result
+in a \exception{DeprecationWarning} being signalled by Python 2.3. This
+was planned to become a syntax error in 2.4, but 2.4 still only warns;
+the syntax error arrived in Python 2.5.
+
+The encoding declaration only affects Unicode string literals, which
+will be converted to Unicode using the specified encoding. Note that
+Python identifiers are still restricted to ASCII characters, so you
+can't have variable names that use characters outside of the usual
+alphanumerics.
+
+\begin{seealso}
+
+\seepep{263}{Defining Python Source Code Encodings}{Written by
+Marc-Andr\'e Lemburg and Martin von~L\"owis; implemented by Suzuki
+Hisao and Martin von~L\"owis.}
+
+\end{seealso}
+
+
+%======================================================================
+\section{PEP 273: Importing Modules from ZIP Archives}
+
+The new \module{zipimport} module adds support for importing
+modules from a ZIP-format archive. You don't need to import the
+module explicitly; it will be automatically imported if a ZIP
+archive's filename is added to \code{sys.path}.
For example: + +\begin{verbatim} +amk@nyman:~/src/python$ unzip -l /tmp/example.zip +Archive: /tmp/example.zip + Length Date Time Name + -------- ---- ---- ---- + 8467 11-26-02 22:30 jwzthreading.py + -------- ------- + 8467 1 file +amk@nyman:~/src/python$ ./python +Python 2.3 (#1, Aug 1 2003, 19:54:32) +>>> import sys +>>> sys.path.insert(0, '/tmp/example.zip') # Add .zip file to front of path +>>> import jwzthreading +>>> jwzthreading.__file__ +'/tmp/example.zip/jwzthreading.py' +>>> +\end{verbatim} + +An entry in \code{sys.path} can now be the filename of a ZIP archive. +The ZIP archive can contain any kind of files, but only files named +\file{*.py}, \file{*.pyc}, or \file{*.pyo} can be imported. If an +archive only contains \file{*.py} files, Python will not attempt to +modify the archive by adding the corresponding \file{*.pyc} file, meaning +that if a ZIP archive doesn't contain \file{*.pyc} files, importing may be +rather slow. + +A path within the archive can also be specified to only import from a +subdirectory; for example, the path \file{/tmp/example.zip/lib/} +would only import from the \file{lib/} subdirectory within the +archive. + +\begin{seealso} + +\seepep{273}{Import Modules from Zip Archives}{Written by James C. Ahlstrom, +who also provided an implementation. +Python 2.3 follows the specification in \pep{273}, +but uses an implementation written by Just van~Rossum +that uses the import hooks described in \pep{302}. +See section~\ref{section-pep302} for a description of the new import hooks. +} + +\end{seealso} + +%====================================================================== +\section{PEP 277: Unicode file name support for Windows NT} + +On Windows NT, 2000, and XP, the system stores file names as Unicode +strings. Traditionally, Python has represented file names as byte +strings, which is inadequate because it renders some file names +inaccessible. 
+ +Python now allows using arbitrary Unicode strings (within the +limitations of the file system) for all functions that expect file +names, most notably the \function{open()} built-in function. If a Unicode +string is passed to \function{os.listdir()}, Python now returns a list +of Unicode strings. A new function, \function{os.getcwdu()}, returns +the current directory as a Unicode string. + +Byte strings still work as file names, and on Windows Python will +transparently convert them to Unicode using the \code{mbcs} encoding. + +Other systems also allow Unicode strings as file names but convert +them to byte strings before passing them to the system, which can +cause a \exception{UnicodeError} to be raised. Applications can test +whether arbitrary Unicode strings are supported as file names by +checking \member{os.path.supports_unicode_filenames}, a Boolean value. + +Under MacOS, \function{os.listdir()} may now return Unicode filenames. + +\begin{seealso} + +\seepep{277}{Unicode file name support for Windows NT}{Written by Neil +Hodgson; implemented by Neil Hodgson, Martin von~L\"owis, and Mark +Hammond.} + +\end{seealso} + + +%====================================================================== +\section{PEP 278: Universal Newline Support} + +The three major operating systems used today are Microsoft Windows, +Apple's Macintosh OS, and the various \UNIX\ derivatives. A minor +irritation of cross-platform work +is that these three platforms all use different characters +to mark the ends of lines in text files. \UNIX\ uses the linefeed +(ASCII character 10), MacOS uses the carriage return (ASCII +character 13), and Windows uses a two-character sequence of a +carriage return plus a newline. + +Python's file objects can now support end of line conventions other +than the one followed by the platform on which Python is running. +Opening a file with the mode \code{'U'} or \code{'rU'} will open a file +for reading in universal newline mode. 
 All three line ending
+conventions will be translated to a \character{\e n} in the strings
+returned by the various file methods such as \method{read()} and
+\method{readline()}.
+
+Universal newline support is also used when importing modules and when
+executing a file with the \function{execfile()} function. This means
+that Python modules can be shared between all three operating systems
+without needing to convert the line-endings.
+
+This feature can be disabled when compiling Python by specifying
+the \longprogramopt{without-universal-newlines} switch when running Python's
+\program{configure} script.
+
+\begin{seealso}
+
+\seepep{278}{Universal Newline Support}{Written
+and implemented by Jack Jansen.}
+
+\end{seealso}
+
+
+%======================================================================
+\section{PEP 279: enumerate()\label{section-enumerate}}
+
+A new built-in function, \function{enumerate()}, will make
+certain loops a bit clearer. \code{enumerate(thing)}, where
+\var{thing} is either an iterator or a sequence, returns an iterator
+that yields \code{(0, \var{thing}[0])}, \code{(1,
+\var{thing}[1])}, \code{(2, \var{thing}[2])}, and so forth.
+
+A common idiom to change every element of a list looks like this:
+
+\begin{verbatim}
+for i in range(len(L)):
+    item = L[i]
+    # ... compute some result based on item ...
+    L[i] = result
+\end{verbatim}
+
+This can be rewritten using \function{enumerate()} as:
+
+\begin{verbatim}
+for i, item in enumerate(L):
+    # ... compute some result based on item ...
+    L[i] = result
+\end{verbatim}
+
+
+\begin{seealso}
+
+\seepep{279}{The enumerate() built-in function}{Written
+and implemented by Raymond D. Hettinger.}
+
+\end{seealso}
+
+
+%======================================================================
+\section{PEP 282: The logging Package}
+
+A standard package for writing logs, \module{logging}, has been added
+to Python 2.3.
It provides a powerful and flexible mechanism for +generating logging output which can then be filtered and processed in +various ways. A configuration file written in a standard format can +be used to control the logging behavior of a program. Python +includes handlers that will write log records to +standard error or to a file or socket, send them to the system log, or +even e-mail them to a particular address; of course, it's also +possible to write your own handler classes. + +The \class{Logger} class is the primary class. +Most application code will deal with one or more \class{Logger} +objects, each one used by a particular subsystem of the application. +Each \class{Logger} is identified by a name, and names are organized +into a hierarchy using \samp{.} as the component separator. For +example, you might have \class{Logger} instances named \samp{server}, +\samp{server.auth} and \samp{server.network}. The latter two +instances are below \samp{server} in the hierarchy. This means that +if you turn up the verbosity for \samp{server} or direct \samp{server} +messages to a different handler, the changes will also apply to +records logged to \samp{server.auth} and \samp{server.network}. +There's also a root \class{Logger} that's the parent of all other +loggers. 
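+
+The following short sketch shows the hierarchy at work. It is written
+in present-day Python 3 syntax rather than 2.3 syntax. Only the logger
+names come from the text above; the in-memory \class{StringIO}
+handler, the format string, and the messages are illustrative
+assumptions. A handler and a level are attached to \samp{server}
+alone, yet both apply to records logged through \samp{server.auth}
+and \samp{server.network}:
+
+\begin{verbatim}
+import io
+import logging
+
+# Collect output in memory so it can be inspected afterwards.
+stream = io.StringIO()
+handler = logging.StreamHandler(stream)
+handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
+
+# Configure only the 'server' logger.
+server = logging.getLogger('server')
+server.addHandler(handler)
+server.setLevel(logging.INFO)
+server.propagate = False   # keep this demo's records away from the root logger
+
+# The children have no handlers or level of their own.
+auth = logging.getLogger('server.auth')
+network = logging.getLogger('server.network')
+
+auth.info('user %s logged in', 'amk')
+network.debug('packet dump')          # below the inherited INFO level: dropped
+network.warning('connection reset')
+
+print(stream.getvalue(), end='')
+\end{verbatim}
+
+Only two lines are printed, \samp{INFO:server.auth:user amk logged in}
+and \samp{WARNING:server.network:connection reset}. Each line is tagged
+with the name of the child logger that created it, even though the
+handler and the level were set on the parent.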
+ +For simple uses, the \module{logging} package contains some +convenience functions that always use the root log: + +\begin{verbatim} +import logging + +logging.debug('Debugging information') +logging.info('Informational message') +logging.warning('Warning:config file %s not found', 'server.conf') +logging.error('Error occurred') +logging.critical('Critical error -- shutting down') +\end{verbatim} + +This produces the following output: + +\begin{verbatim} +WARNING:root:Warning:config file server.conf not found +ERROR:root:Error occurred +CRITICAL:root:Critical error -- shutting down +\end{verbatim} + +In the default configuration, informational and debugging messages are +suppressed and the output is sent to standard error. You can enable +the display of informational and debugging messages by calling the +\method{setLevel()} method on the root logger. + +Notice the \function{warning()} call's use of string formatting +operators; all of the functions for logging messages take the +arguments \code{(\var{msg}, \var{arg1}, \var{arg2}, ...)} and log the +string resulting from \code{\var{msg} \% (\var{arg1}, \var{arg2}, +...)}. + +There's also an \function{exception()} function that records the most +recent traceback. Any of the other functions will also record the +traceback if you specify a true value for the keyword argument +\var{exc_info}. + +\begin{verbatim} +def f(): + try: 1/0 + except: logging.exception('Problem recorded') + +f() +\end{verbatim} + +This produces the following output: + +\begin{verbatim} +ERROR:root:Problem recorded +Traceback (most recent call last): + File "t.py", line 6, in f + 1/0 +ZeroDivisionError: integer division or modulo by zero +\end{verbatim} + +Slightly more advanced programs will use a logger other than the root +logger. The \function{getLogger(\var{name})} function is used to get +a particular log, creating it if it doesn't exist yet. +\function{getLogger(None)} returns the root logger. 
+ + +\begin{verbatim} +log = logging.getLogger('server') + ... +log.info('Listening on port %i', port) + ... +log.critical('Disk full') + ... +\end{verbatim} + +Log records are usually propagated up the hierarchy, so a message +logged to \samp{server.auth} is also seen by \samp{server} and +\samp{root}, but a \class{Logger} can prevent this by setting its +\member{propagate} attribute to \constant{False}. + +There are more classes provided by the \module{logging} package that +can be customized. When a \class{Logger} instance is told to log a +message, it creates a \class{LogRecord} instance that is sent to any +number of different \class{Handler} instances. Loggers and handlers +can also have an attached list of filters, and each filter can cause +the \class{LogRecord} to be ignored or can modify the record before +passing it along. When they're finally output, \class{LogRecord} +instances are converted to text by a \class{Formatter} class. All of +these classes can be replaced by your own specially-written classes. + +With all of these features the \module{logging} package should provide +enough flexibility for even the most complicated applications. This +is only an incomplete overview of its features, so please see the +\ulink{package's reference documentation}{../lib/module-logging.html} +for all of the details. Reading \pep{282} will also be helpful. + + +\begin{seealso} + +\seepep{282}{A Logging System}{Written by Vinay Sajip and Trent Mick; +implemented by Vinay Sajip.} + +\end{seealso} + + +%====================================================================== +\section{PEP 285: A Boolean Type\label{section-bool}} + +A Boolean type was added to Python 2.3. Two new constants were added +to the \module{__builtin__} module, \constant{True} and +\constant{False}. (\constant{True} and +\constant{False} constants were added to the built-ins +in Python 2.2.1, but the 2.2.1 versions are simply set to integer values of +1 and 0 and aren't a different type.) 
+
+The type object for this new type is named
+\class{bool}; the constructor for it takes any Python value and
+converts it to \constant{True} or \constant{False}.
+
+\begin{verbatim}
+>>> bool(1)
+True
+>>> bool(0)
+False
+>>> bool([])
+False
+>>> bool( (1,) )
+True
+\end{verbatim}
+
+Most of the standard library modules and built-in functions have been
+changed to return Booleans.
+
+\begin{verbatim}
+>>> obj = []
+>>> hasattr(obj, 'append')
+True
+>>> isinstance(obj, list)
+True
+>>> isinstance(obj, tuple)
+False
+\end{verbatim}
+
+Python's Booleans were added with the primary goal of making code
+clearer. For example, if you're reading a function and encounter the
+statement \code{return 1}, you might wonder whether the \code{1}
+represents a Boolean truth value, an index, or a
+coefficient that multiplies some other quantity. If the statement is
+\code{return True}, however, the meaning of the return value is quite
+clear.
+
+Python's Booleans were \emph{not} added for the sake of strict
+type-checking. A very strict language such as Pascal would also
+prevent you from performing arithmetic with Booleans, and would require
+that the expression in an \keyword{if} statement always evaluate to a
+Boolean result. Python is not this strict and never will be, as
+\pep{285} explicitly says. This means you can still use any
+expression in an \keyword{if} statement, even ones that evaluate to a
+list or tuple or some random object. The Boolean type is a
+subclass of the \class{int} class so that arithmetic using a Boolean
+still works.
+
+\begin{verbatim}
+>>> True + 1
+2
+>>> False + 1
+1
+>>> False * 75
+0
+>>> True * 75
+75
+\end{verbatim}
+
+To sum up \constant{True} and \constant{False} in a sentence: they're
+alternative ways to spell the integer values 1 and 0, with the single
+difference that \function{str()} and \function{repr()} return the
+strings \code{'True'} and \code{'False'} instead of \code{'1'} and
+\code{'0'}.
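+
+The ``alternative spelling'' view can be checked directly. The
+following sketch uses present-day Python syntax. The helper
+\function{describe()} is hypothetical and exists only for this
+illustration:
+
+\begin{verbatim}
+def describe(value):
+    """Return how a Boolean behaves as a number and as text."""
+    return (value + 0, str(value), repr(value), isinstance(value, int))
+
+print(describe(True))    # (1, 'True', 'True', True)
+print(describe(False))   # (0, 'False', 'False', True)
+
+# True compares and hashes like 1, so it finds the same dictionary entry.
+print({1: 'one'}[True])  # one
+\end{verbatim}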
+ +\begin{seealso} + +\seepep{285}{Adding a bool type}{Written and implemented by GvR.} + +\end{seealso} + + +%====================================================================== +\section{PEP 293: Codec Error Handling Callbacks} + +When encoding a Unicode string into a byte string, unencodable +characters may be encountered. So far, Python has allowed specifying +the error processing as either ``strict'' (raising +\exception{UnicodeError}), ``ignore'' (skipping the character), or +``replace'' (using a question mark in the output string), with +``strict'' being the default behavior. It may be desirable to specify +alternative processing of such errors, such as inserting an XML +character reference or HTML entity reference into the converted +string. + +Python now has a flexible framework to add different processing +strategies. New error handlers can be added with +\function{codecs.register_error}, and codecs then can access the error +handler with \function{codecs.lookup_error}. An equivalent C API has +been added for codecs written in C. The error handler gets the +necessary state information such as the string being converted, the +position in the string where the error was detected, and the target +encoding. The handler can then either raise an exception or return a +replacement string. + +Two additional error handlers have been implemented using this +framework: ``backslashreplace'' uses Python backslash quoting to +represent unencodable characters and ``xmlcharrefreplace'' emits +XML character references. + +\begin{seealso} + +\seepep{293}{Codec Error Handling Callbacks}{Written and implemented by +Walter D\"orwald.} + +\end{seealso} + + +%====================================================================== +\section{PEP 301: Package Index and Metadata for +Distutils\label{section-pep301}} + +Support for the long-requested Python catalog makes its first +appearance in 2.3. + +The heart of the catalog is the new Distutils \command{register} command. 
+Running \code{python setup.py register} will collect the metadata
+describing a package, such as its name, version, maintainer,
+description, \&c., and send it to a central catalog server. The
+resulting catalog is available from \url{http://www.python.org/pypi}.
+
+To make the catalog a bit more useful, a new optional
+\var{classifiers} keyword argument has been added to the Distutils
+\function{setup()} function. A list of
+\ulink{Trove}{http://catb.org/\textasciitilde esr/trove/}-style
+strings can be supplied to help classify the software.
+
+Here's an example \file{setup.py} with classifiers, written to be compatible
+with older versions of the Distutils:
+
+\begin{verbatim}
+from distutils import core
+kw = {'name': "Quixote",
+      'version': "0.5.1",
+      'description': "A highly Pythonic Web application framework",
+      # ...
+      }
+
+# Only pass classifiers to Distutils versions that understand them.
+if (hasattr(core, 'setup_keywords') and
+    'classifiers' in core.setup_keywords):
+    # Trove classifiers are passed as a list of strings.
+    kw['classifiers'] = [
+        'Topic :: Internet :: WWW/HTTP :: Dynamic Content',
+        'Environment :: No Input/Output (Daemon)',
+        'Intended Audience :: Developers',
+    ]
+
+core.setup(**kw)
+\end{verbatim}
+
+The full list of classifiers can be obtained by running
+\verb|python setup.py register --list-classifiers|.
+
+\begin{seealso}
+
+\seepep{301}{Package Index and Metadata for Distutils}{Written and
+implemented by Richard Jones.}
+
+\end{seealso}
+
+
+%======================================================================
+\section{PEP 302: New Import Hooks \label{section-pep302}}
+
+While it's been possible to write custom import hooks ever since the
+\module{ihooks} module was introduced in Python 1.3, no one has ever
+been really happy with it because writing new import hooks is
+difficult and messy. There have been various proposed alternatives
+such as the \module{imputil} and \module{iu} modules, but none of them
+has ever gained much acceptance, and none of them was easily usable
+from \C{} code.
+
+\pep{302} borrows ideas from its predecessors, especially from
+Gordon McMillan's \module{iu} module. Three new items
+are added to the \module{sys} module:
+
+\begin{itemize}
+  \item \code{sys.path_hooks} is a list of callable objects; most
+  often they'll be classes. Each callable takes a string containing a
+  path and either returns an importer object that will handle imports
+  from this path or raises an \exception{ImportError} exception if it
+  can't handle this path.
+
+  \item \code{sys.path_importer_cache} caches importer objects for
+  each path, so \code{sys.path_hooks} will only need to be traversed
+  once for each path.
+
+  \item \code{sys.meta_path} is a list of importer objects that will
+  be traversed before \code{sys.path} is checked. This list is
+  initially empty, but user code can add objects to it. Additional
+  built-in and frozen modules can be imported by an object added to
+  this list.
+
+\end{itemize}
+
+Importer objects must have a single method,
+\method{find_module(\var{fullname}, \var{path}=None)}. \var{fullname}
+will be a module or package name, e.g. \samp{string} or
+\samp{distutils.core}. \method{find_module()} must return either
+\code{None}, if the importer can't find the module, or a loader object
+that has a single method, \method{load_module(\var{fullname})}, that
+creates and returns the corresponding module object.
+
+Pseudo-code for Python's new import logic, therefore, looks something
+like this (simplified a bit; see \pep{302} for the full details). Read
+it as the body of a function that returns the imported module:
+
+\begin{verbatim}
+for mp in sys.meta_path:
+    loader = mp.find_module(fullname)
+    if loader is not None:
+        <module> = loader.load_module(fullname)
+        return <module>
+
+for path in sys.path:
+    for hook in sys.path_hooks:
+        try:
+            importer = hook(path)
+        except ImportError:
+            # This hook can't handle the path; try the next hook
+            continue
+        # The first hook that accepts the path handles it
+        loader = importer.find_module(fullname)
+        if loader is not None:
+            <module> = loader.load_module(fullname)
+            return <module>
+        break   # not found here; move on to the next path entry
+
+# Not found!
+raise ImportError
+\end{verbatim}
+
+\begin{seealso}
+
+\seepep{302}{New Import Hooks}{Written by Just van~Rossum and Paul Moore.
+Implemented by Just van~Rossum.
+}
+
+\end{seealso}
+
+
+%======================================================================
+\section{PEP 305: Comma-separated Files \label{section-pep305}}
+
+Comma-separated files are a format frequently used for exporting data
+from databases and spreadsheets. Python 2.3 adds a parser for
+comma-separated files.
+
+Comma-separated format is deceptively simple at first glance:
+
+\begin{verbatim}
+Costs,150,200,3.95
+\end{verbatim}
+
+Read a line and call \code{line.split(',')}: what could be simpler?
+But toss in string data that can contain commas, and things get more
+complicated:
+
+\begin{verbatim}
+"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
+\end{verbatim}
+
+A big ugly regular expression can parse this, but using the new
+\module{csv} module is much simpler:
+
+\begin{verbatim}
+import csv
+
+input_file = open('datafile', 'rb')
+reader = csv.reader(input_file)
+for line in reader:
+    print line
+\end{verbatim}
+
+The \function{reader()} function takes a number of different options.
+The field separator isn't limited to the comma and can be changed to
+any character, and so can the quoting and line-ending characters.
+
+Different dialects of comma-separated files can be defined and
+registered; currently there are two dialects, both used by Microsoft Excel.
+A separate \class{csv.writer} class will generate comma-separated files
+from a succession of tuples or lists, quoting strings that contain the
+delimiter.
+
+\begin{seealso}
+
+\seepep{305}{CSV File API}{Written and implemented
+by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.
+}
+
+\end{seealso}
+
+%======================================================================
+\section{PEP 307: Pickle Enhancements \label{section-pep307}}
+
+The \module{pickle} and \module{cPickle} modules received some
+attention during the 2.3 development cycle.
In 2.2, new-style classes +could be pickled without difficulty, but they weren't pickled very +compactly; \pep{307} quotes a trivial example where a new-style class +results in a pickled string three times longer than that for a classic +class. + +The solution was to invent a new pickle protocol. The +\function{pickle.dumps()} function has supported a text-or-binary flag +for a long time. In 2.3, this flag is redefined from a Boolean to an +integer: 0 is the old text-mode pickle format, 1 is the old binary +format, and now 2 is a new 2.3-specific format. A new constant, +\constant{pickle.HIGHEST_PROTOCOL}, can be used to select the fanciest +protocol available. + +Unpickling is no longer considered a safe operation. 2.2's +\module{pickle} provided hooks for trying to prevent unsafe classes +from being unpickled (specifically, a +\member{__safe_for_unpickling__} attribute), but none of this code +was ever audited and therefore it's all been ripped out in 2.3. You +should not unpickle untrusted data in any version of Python. + +To reduce the pickling overhead for new-style classes, a new interface +for customizing pickling was added using three special methods: +\method{__getstate__}, \method{__setstate__}, and +\method{__getnewargs__}. Consult \pep{307} for the full semantics +of these methods. + +As a way to compress pickles yet further, it's now possible to use +integer codes instead of long strings to identify pickled classes. +The Python Software Foundation will maintain a list of standardized +codes; there's also a range of codes for private use. Currently no +codes have been specified. 
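

A minimal sketch of the new protocol in use follows; the \class{Point}
class is a hypothetical example, not something taken from \pep{307}.
It pickles the same new-style instance with protocols 0, 1, and 2,
records the resulting sizes, and round-trips the object through
\constant{pickle.HIGHEST_PROTOCOL}. Exact byte counts depend on the
interpreter version, so no particular numbers are shown.

\begin{verbatim}
import pickle

class Point(object):
    """Hypothetical new-style class used only for this comparison."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)

# Record the pickle size under each protocol: 0 is the old text
# format, 1 the old binary format, 2 the format added in Python 2.3.
sizes = {}
for proto in (0, 1, 2):
    sizes[proto] = len(pickle.dumps(p, proto))

# HIGHEST_PROTOCOL selects the newest format this interpreter supports.
data = pickle.dumps(p, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)
assert (restored.x, restored.y) == (3, 4)
\end{verbatim}

For a new-style instance like this one, protocol 2 should produce the
smallest of the three pickles. Later Python releases added further
protocols, so \constant{pickle.HIGHEST_PROTOCOL} may be greater than 2
on a newer interpreter.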
+ +\begin{seealso} + +\seepep{307}{Extensions to the pickle protocol}{Written and implemented +by Guido van Rossum and Tim Peters.} + +\end{seealso} + +%====================================================================== +\section{Extended Slices\label{section-slices}} + +Ever since Python 1.4, the slicing syntax has supported an optional +third ``step'' or ``stride'' argument. For example, these are all +legal Python syntax: \code{L[1:10:2]}, \code{L[:-1:1]}, +\code{L[::-1]}. This was added to Python at the request of +the developers of Numerical Python, which uses the third argument +extensively. However, Python's built-in list, tuple, and string +sequence types have never supported this feature, raising a +\exception{TypeError} if you tried it. Michael Hudson contributed a +patch to fix this shortcoming. + +For example, you can now easily extract the elements of a list that +have even indexes: + +\begin{verbatim} +>>> L = range(10) +>>> L[::2] +[0, 2, 4, 6, 8] +\end{verbatim} + +Negative values also work to make a copy of the same list in reverse +order: + +\begin{verbatim} +>>> L[::-1] +[9, 8, 7, 6, 5, 4, 3, 2, 1, 0] +\end{verbatim} + +This also works for tuples, arrays, and strings: + +\begin{verbatim} +>>> s='abcd' +>>> s[::2] +'ac' +>>> s[::-1] +'dcba' +\end{verbatim} + +If you have a mutable sequence such as a list or an array you can +assign to or delete an extended slice, but there are some differences +between assignment to extended and regular slices. Assignment to a +regular slice can be used to change the length of the sequence: + +\begin{verbatim} +>>> a = range(3) +>>> a +[0, 1, 2] +>>> a[1:3] = [4, 5, 6] +>>> a +[0, 4, 5, 6] +\end{verbatim} + +Extended slices aren't this flexible. 
When assigning to an extended
slice, the list on the right-hand side of the statement must contain
the same number of items as the slice it is replacing:

\begin{verbatim}
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = [0, -1]
>>> a
[0, 1, -1, 3]
>>> a[::2] = [0,1,2]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
\end{verbatim}

Deletion is more straightforward:

\begin{verbatim}
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> del a[::2]
>>> a
[1, 3]
\end{verbatim}

You can also now pass slice objects to the
\method{__getitem__} methods of the built-in sequences:

\begin{verbatim}
>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]
\end{verbatim}

Or use slice objects directly in subscripts:

\begin{verbatim}
>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]
\end{verbatim}

To simplify implementing sequences that support extended slicing,
slice objects now have a method \method{indices(\var{length})} which,
given the length of a sequence, returns a \code{(\var{start},
\var{stop}, \var{step})} tuple that can be passed directly to
\function{range()}.
\method{indices()} handles omitted and out-of-bounds indices in a
manner consistent with regular slices (and this innocuous phrase hides
a welter of confusing details!). The method is intended to be used
like this:

\begin{verbatim}
class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            # Normalize the slice into (start, stop, step) for this length.
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            # A plain integer index.
            return self.calc_item(item)
\end{verbatim}

From this example you can also see that the built-in \class{slice}
object is now the type object for the slice type, and is no longer a
function. This is consistent with Python 2.2, where \class{int},
\class{str}, etc., underwent the same change.
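
To see \method{indices()} at work in a complete class, here is a small
sketch. \class{Squares} is a hypothetical read-only sequence invented
for this example; its \method{__getitem__} hands slice objects to
\method{indices()} and does its own bounds checking only for plain
integer indexes.

\begin{verbatim}
class Squares(object):
    """Hypothetical read-only sequence of the squares 0, 1, 4, ..."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() fills in omitted values and clamps
            # out-of-range ones for a sequence of this length.
            start, stop, step = item.indices(len(self))
            return [i * i for i in range(start, stop, step)]
        # Plain integer index: support negative values, check bounds.
        if item < 0:
            item += self.n
        if not 0 <= item < self.n:
            raise IndexError(item)
        return item * item

sq = Squares(10)
every_third = sq[::3]     # [0, 9, 36, 81]
tail = sq[-3:]            # [49, 64, 81]
backwards = sq[::-4]      # [81, 25, 1]
empty = sq[100:200]       # [] because the bounds are clamped to 10
\end{verbatim}

Because \method{indices()} already supplies defaults and clamps the
bounds, the slice branch never has to special-case negative steps or
out-of-range values.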


%======================================================================
\section{Other Language Changes}

Here are all of the changes that Python 2.3 makes to the core Python
language.

\begin{itemize}
\item The \keyword{yield} statement is now always a keyword, as
described in section~\ref{section-generators} of this document.

\item A new built-in function \function{enumerate()}
was added, as described in section~\ref{section-enumerate} of this
document.

\item Two new constants, \constant{True} and \constant{False}, were
added along with the built-in \class{bool} type, as described in
section~\ref{section-bool} of this document.

\item The \function{int()} type constructor will now return a long
integer instead of raising an \exception{OverflowError} when a string
or floating-point number is too large to fit into an integer. This
can lead to the paradoxical result that
\code{isinstance(int(\var{expression}), int)} is false, but that seems
unlikely to cause problems in practice.

\item Built-in types now support the extended slicing syntax,
as described in section~\ref{section-slices} of this document.

\item A new built-in function, \function{sum(\var{iterable}, \var{start}=0)},
adds up the numeric items in the iterable object and returns their sum.
\function{sum()} only accepts numbers, meaning that you can't use it
to concatenate a bunch of strings. (Contributed by Alex
Martelli.)

\item \code{list.insert(\var{pos}, \var{value})} used to
insert \var{value} at the front of the list when \var{pos} was
negative. The behaviour has now been changed to be consistent with
slice indexing, so when \var{pos} is -1 the value will be inserted
before the last element, and so forth.

\item \code{list.index(\var{value})}, which searches for \var{value}
within the list and returns its index, now takes optional
\var{start} and \var{stop} arguments to limit the search to
only part of the list.
+ +\item Dictionaries have a new method, \method{pop(\var{key}\optional{, +\var{default}})}, that returns the value corresponding to \var{key} +and removes that key/value pair from the dictionary. If the requested +key isn't present in the dictionary, \var{default} is returned if it's +specified and \exception{KeyError} raised if it isn't. + +\begin{verbatim} +>>> d = {1:2} +>>> d +{1: 2} +>>> d.pop(4) +Traceback (most recent call last): + File "stdin", line 1, in ? +KeyError: 4 +>>> d.pop(1) +2 +>>> d.pop(1) +Traceback (most recent call last): + File "stdin", line 1, in ? +KeyError: 'pop(): dictionary is empty' +>>> d +{} +>>> +\end{verbatim} + +There's also a new class method, +\method{dict.fromkeys(\var{iterable}, \var{value})}, that +creates a dictionary with keys taken from the supplied iterator +\var{iterable} and all values set to \var{value}, defaulting to +\code{None}. + +(Patches contributed by Raymond Hettinger.) + +Also, the \function{dict()} constructor now accepts keyword arguments to +simplify creating small dictionaries: + +\begin{verbatim} +>>> dict(red=1, blue=2, green=3, black=4) +{'blue': 2, 'black': 4, 'green': 3, 'red': 1} +\end{verbatim} + +(Contributed by Just van~Rossum.) + +\item The \keyword{assert} statement no longer checks the \code{__debug__} +flag, so you can no longer disable assertions by assigning to \code{__debug__}. +Running Python with the \programopt{-O} switch will still generate +code that doesn't execute any assertions. + +\item Most type objects are now callable, so you can use them +to create new objects such as functions, classes, and modules. (This +means that the \module{new} module can be deprecated in a future +Python version, because you can now use the type objects available in +the \module{types} module.) +% XXX should new.py use PendingDeprecationWarning? 
For example, you can create a new module object with the following code:

\begin{verbatim}
>>> import types
>>> m = types.ModuleType('abc','docstring')
>>> m
<module 'abc' (built-in)>
>>> m.__doc__
'docstring'
\end{verbatim}

\item
A new warning, \exception{PendingDeprecationWarning}, was added to
indicate features that are in the process of being
deprecated. The warning will \emph{not} be printed by default. To
check for use of features that will be deprecated in the future,
supply \programopt{-Walways::PendingDeprecationWarning::} on the
command line or use \function{warnings.filterwarnings()}.

\item The process of deprecating string-based exceptions, as
in \code{raise "Error occurred"}, has begun. Raising a string will
now trigger \exception{PendingDeprecationWarning}.

\item Using \code{None} as a variable name will now result in a
\exception{SyntaxWarning}. In a future version of Python,
\code{None} may finally become a keyword.

\item The \method{xreadlines()} method of file objects, introduced in
Python 2.1, is no longer necessary because files now behave as their
own iterator. \method{xreadlines()} was originally introduced as a
faster way to loop over all the lines in a file, but now you can
simply write \code{for line in file_obj}. File objects also have a
new read-only \member{encoding} attribute that gives the encoding used
by the file; Unicode strings written to the file will be automatically
converted to bytes using the given encoding.

\item The method resolution order used by new-style classes has
changed, though you'll only notice the difference if you have a really
complicated inheritance hierarchy. Classic classes are unaffected by
this change. Python 2.2 originally used a topological sort of a
class's ancestors, but 2.3 now uses the C3 algorithm as described in
the paper \ulink{``A Monotonic Superclass Linearization for
Dylan''}{http://www.webcom.com/haahr/dylan/linearization-oopsla96.html}.
+To understand the motivation for this change, +read Michele Simionato's article +\ulink{``Python 2.3 Method Resolution Order''} + {http://www.python.org/2.3/mro.html}, or +read the thread on python-dev starting with the message at +\url{http://mail.python.org/pipermail/python-dev/2002-October/029035.html}. +Samuele Pedroni first pointed out the problem and also implemented the +fix by coding the C3 algorithm. + +\item Python runs multithreaded programs by switching between threads +after executing N bytecodes. The default value for N has been +increased from 10 to 100 bytecodes, speeding up single-threaded +applications by reducing the switching overhead. Some multithreaded +applications may suffer slower response time, but that's easily fixed +by setting the limit back to a lower number using +\function{sys.setcheckinterval(\var{N})}. +The limit can be retrieved with the new +\function{sys.getcheckinterval()} function. + +\item One minor but far-reaching change is that the names of extension +types defined by the modules included with Python now contain the +module and a \character{.} in front of the type name. For example, in +Python 2.2, if you created a socket and printed its +\member{__class__}, you'd get this output: + +\begin{verbatim} +>>> s = socket.socket() +>>> s.__class__ +<type 'socket'> +\end{verbatim} + +In 2.3, you get this: +\begin{verbatim} +>>> s.__class__ +<type '_socket.socket'> +\end{verbatim} + +\item One of the noted incompatibilities between old- and new-style + classes has been removed: you can now assign to the + \member{__name__} and \member{__bases__} attributes of new-style + classes. There are some restrictions on what can be assigned to + \member{__bases__} along the lines of those relating to assigning to + an instance's \member{__class__} attribute. 
+ +\end{itemize} + + +%====================================================================== +\subsection{String Changes} + +\begin{itemize} + +\item The \keyword{in} operator now works differently for strings. +Previously, when evaluating \code{\var{X} in \var{Y}} where \var{X} +and \var{Y} are strings, \var{X} could only be a single character. +That's now changed; \var{X} can be a string of any length, and +\code{\var{X} in \var{Y}} will return \constant{True} if \var{X} is a +substring of \var{Y}. If \var{X} is the empty string, the result is +always \constant{True}. + +\begin{verbatim} +>>> 'ab' in 'abcd' +True +>>> 'ad' in 'abcd' +False +>>> '' in 'abcd' +True +\end{verbatim} + +Note that this doesn't tell you where the substring starts; if you +need that information, use the \method{find()} string method. + +\item The \method{strip()}, \method{lstrip()}, and \method{rstrip()} +string methods now have an optional argument for specifying the +characters to strip. The default is still to remove all whitespace +characters: + +\begin{verbatim} +>>> ' abc '.strip() +'abc' +>>> '><><abc<><><>'.strip('<>') +'abc' +>>> '><><abc<><><>\n'.strip('<>') +'abc<><><>\n' +>>> u'\u4000\u4001abc\u4000'.strip(u'\u4000') +u'\u4001abc' +>>> +\end{verbatim} + +(Suggested by Simon Brunning and implemented by Walter D\"orwald.) + +\item The \method{startswith()} and \method{endswith()} +string methods now accept negative numbers for the \var{start} and \var{end} +parameters. + +\item Another new string method is \method{zfill()}, originally a +function in the \module{string} module. \method{zfill()} pads a +numeric string with zeros on the left until it's the specified width. +Note that the \code{\%} operator is still more flexible and powerful +than \method{zfill()}. + +\begin{verbatim} +>>> '45'.zfill(4) +'0045' +>>> '12345'.zfill(4) +'12345' +>>> 'goofy'.zfill(6) +'0goofy' +\end{verbatim} + +(Contributed by Walter D\"orwald.) 
+ +\item A new type object, \class{basestring}, has been added. + Both 8-bit strings and Unicode strings inherit from this type, so + \code{isinstance(obj, basestring)} will return \constant{True} for + either kind of string. It's a completely abstract type, so you + can't create \class{basestring} instances. + +\item Interned strings are no longer immortal and will now be +garbage-collected in the usual way when the only reference to them is +from the internal dictionary of interned strings. (Implemented by +Oren Tirosh.) + +\end{itemize} + + +%====================================================================== +\subsection{Optimizations} + +\begin{itemize} + +\item The creation of new-style class instances has been made much +faster; they're now faster than classic classes! + +\item The \method{sort()} method of list objects has been extensively +rewritten by Tim Peters, and the implementation is significantly +faster. + +\item Multiplication of large long integers is now much faster thanks +to an implementation of Karatsuba multiplication, an algorithm that +scales better than the O(n*n) required for the grade-school +multiplication algorithm. (Original patch by Christopher A. Craig, +and significantly reworked by Tim Peters.) + +\item The \code{SET_LINENO} opcode is now gone. This may provide a +small speed increase, depending on your compiler's idiosyncrasies. +See section~\ref{section-other} for a longer explanation. +(Removed by Michael Hudson.) + +\item \function{xrange()} objects now have their own iterator, making +\code{for i in xrange(n)} slightly faster than +\code{for i in range(n)}. (Patch by Raymond Hettinger.) + +\item A number of small rearrangements have been made in various +hotspots to improve performance, such as inlining a function or removing +some code. (Implemented mostly by GvR, but lots of people have +contributed single changes.) 
+ +\end{itemize} + +The net result of the 2.3 optimizations is that Python 2.3 runs the +pystone benchmark around 25\% faster than Python 2.2. + + +%====================================================================== +\section{New, Improved, and Deprecated Modules} + +As usual, Python's standard library received a number of enhancements and +bug fixes. Here's a partial list of the most notable changes, sorted +alphabetically by module name. Consult the +\file{Misc/NEWS} file in the source tree for a more +complete list of changes, or look through the CVS logs for all the +details. + +\begin{itemize} + +\item The \module{array} module now supports arrays of Unicode +characters using the \character{u} format character. Arrays also now +support using the \code{+=} assignment operator to add another array's +contents, and the \code{*=} assignment operator to repeat an array. +(Contributed by Jason Orendorff.) + +\item The \module{bsddb} module has been replaced by version 4.1.6 +of the \ulink{PyBSDDB}{http://pybsddb.sourceforge.net} package, +providing a more complete interface to the transactional features of +the BerkeleyDB library. + +The old version of the module has been renamed to +\module{bsddb185} and is no longer built automatically; you'll +have to edit \file{Modules/Setup} to enable it. Note that the new +\module{bsddb} package is intended to be compatible with the +old module, so be sure to file bugs if you discover any +incompatibilities. When upgrading to Python 2.3, if the new interpreter is compiled +with a new version of +the underlying BerkeleyDB library, you will almost certainly have to +convert your database files to the new version. You can do this +fairly easily with the new scripts \file{db2pickle.py} and +\file{pickle2db.py} which you will find in the distribution's +\file{Tools/scripts} directory. 
If you've already been using the PyBSDDB
package and importing it as \module{bsddb3}, you will have to change your
\code{import} statements to import it as \module{bsddb}.

\item The new \module{bz2} module is an interface to the bz2 data
compression library. bz2-compressed data is usually smaller than
corresponding \module{zlib}-compressed data. (Contributed by Gustavo Niemeyer.)

\item A set of standard date/time types has been added in the new \module{datetime}
module. See the following section for more details.

\item The Distutils \class{Extension} class now supports
an extra constructor argument named \var{depends} for listing
additional source files that an extension depends on. This lets
Distutils recompile the module if any of the dependency files are
modified. For example, if \file{sampmodule.c} includes the header
file \file{sample.h}, you would create the \class{Extension} object like
this:

\begin{verbatim}
from distutils.core import Extension

ext = Extension("samp",
                sources=["sampmodule.c"],
                depends=["sample.h"])
\end{verbatim}

Modifying \file{sample.h} would then cause the module to be recompiled.
(Contributed by Jeremy Hylton.)

\item Other minor changes to Distutils:
it now checks for the \envvar{CC}, \envvar{CFLAGS}, \envvar{CPP},
\envvar{LDFLAGS}, and \envvar{CPPFLAGS} environment variables, using
them to override the settings in Python's configuration (contributed
by Robert Weber).

\item Previously the \module{doctest} module would only search the
docstrings of public methods and functions for test cases, but it now
examines private ones as well. The \function{DocTestSuite()}
function creates a \class{unittest.TestSuite} object from a set of
\module{doctest} tests.

\item The new \function{gc.get_referents(\var{object})} function returns a
list of all the objects referenced by \var{object}.
+ +\item The \module{getopt} module gained a new function, +\function{gnu_getopt()}, that supports the same arguments as the existing +\function{getopt()} function but uses GNU-style scanning mode. +The existing \function{getopt()} stops processing options as soon as a +non-option argument is encountered, but in GNU-style mode processing +continues, meaning that options and arguments can be mixed. For +example: + +\begin{verbatim} +>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v') +([('-f', 'filename')], ['output', '-v']) +>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v') +([('-f', 'filename'), ('-v', '')], ['output']) +\end{verbatim} + +(Contributed by Peter \AA{strand}.) + +\item The \module{grp}, \module{pwd}, and \module{resource} modules +now return enhanced tuples: + +\begin{verbatim} +>>> import grp +>>> g = grp.getgrnam('amk') +>>> g.gr_name, g.gr_gid +('amk', 500) +\end{verbatim} + +\item The \module{gzip} module can now handle files exceeding 2~GiB. + +\item The new \module{heapq} module contains an implementation of a +heap queue algorithm. A heap is an array-like data structure that +keeps items in a partially sorted order such that, for every index +\var{k}, \code{heap[\var{k}] <= heap[2*\var{k}+1]} and +\code{heap[\var{k}] <= heap[2*\var{k}+2]}. This makes it quick to +remove the smallest item, and inserting a new item while maintaining +the heap property is O(lg~n). (See +\url{http://www.nist.gov/dads/HTML/priorityque.html} for more +information about the priority queue data structure.) + +The \module{heapq} module provides \function{heappush()} and +\function{heappop()} functions for adding and removing items while +maintaining the heap property on top of some other mutable Python +sequence type. Here's an example that uses a Python list: + +\begin{verbatim} +>>> import heapq +>>> heap = [] +>>> for item in [3, 7, 5, 11, 1]: +... heapq.heappush(heap, item) +... 
>>> heap
[1, 3, 5, 11, 7]
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
3
>>> heap
[5, 7, 11]
\end{verbatim}

(Contributed by Kevin O'Connor.)

\item The IDLE integrated development environment has been updated
using the code from the IDLEfork project
(\url{http://idlefork.sf.net}). The most notable feature is that the
code being developed is now executed in a subprocess, meaning that
there's no longer any need for manual \code{reload()} operations.
IDLE's core code has been incorporated into the standard library as the
\module{idlelib} package.

\item The \module{imaplib} module now supports IMAP over SSL.
(Contributed by Piers Lauder and Tino Lange.)

\item The \module{itertools} module contains a number of useful functions for
use with iterators, inspired by various functions provided by the ML
and Haskell languages. For example,
\code{itertools.ifilter(predicate, iterator)} returns an iterator over
the elements of \var{iterator} for which the function \function{predicate()} returns
\constant{True}, and \code{itertools.repeat(obj, \var{N})} produces
\code{obj} \var{N} times. There are a number of other functions in
the module; see the \ulink{module's reference
documentation}{../lib/module-itertools.html} for details.
(Contributed by Raymond Hettinger.)

\item Two new functions in the \module{math} module,
\function{degrees(\var{rads})} and \function{radians(\var{degs})},
convert between radians and degrees. Other functions in the
\module{math} module such as \function{math.sin()} and
\function{math.cos()} have always required input values measured in
radians. Also, an optional \var{base} argument was added to
\function{math.log()} to make it easier to compute logarithms for
bases other than \code{e} and \code{10}. (Contributed by Raymond
Hettinger.)
+ +\item Several new POSIX functions (\function{getpgid()}, \function{killpg()}, +\function{lchown()}, \function{loadavg()}, \function{major()}, \function{makedev()}, +\function{minor()}, and \function{mknod()}) were added to the +\module{posix} module that underlies the \module{os} module. +(Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.) + +\item In the \module{os} module, the \function{*stat()} family of +functions can now report fractions of a second in a timestamp. Such +time stamps are represented as floats, similar to +the value returned by \function{time.time()}. + +During testing, it was found that some applications will break if time +stamps are floats. For compatibility, when using the tuple interface +of the \class{stat_result} time stamps will be represented as integers. +When using named fields (a feature first introduced in Python 2.2), +time stamps are still represented as integers, unless +\function{os.stat_float_times()} is invoked to enable float return +values: + +\begin{verbatim} +>>> os.stat("/tmp").st_mtime +1034791200 +>>> os.stat_float_times(True) +>>> os.stat("/tmp").st_mtime +1034791200.6335014 +\end{verbatim} + +In Python 2.4, the default will change to always returning floats. + +Application developers should enable this feature only if all their +libraries work properly when confronted with floating point time +stamps, or if they use the tuple API. If used, the feature should be +activated on an application level instead of trying to enable it on a +per-use basis. + +\item The \module{optparse} module contains a new parser for command-line arguments +that can convert option values to a particular Python type +and will automatically generate a usage message. See the following section for +more details. + +\item The old and never-documented \module{linuxaudiodev} module has +been deprecated, and a new version named \module{ossaudiodev} has been +added. 
The module was renamed because the OSS sound drivers can be
used on platforms other than Linux, and the interface has also been
tidied and brought up to date in various ways. (Contributed by Greg
Ward and Nicholas FitzRoy-Dale.)

\item The new \module{platform} module contains a number of functions
that try to determine various properties of the platform you're
running on. There are functions for getting the architecture, CPU
type, the Windows OS version, and even the Linux distribution version.
(Contributed by Marc-Andr\'e Lemburg.)

\item The parser objects provided by the \module{pyexpat} module
can now optionally buffer character data, resulting in fewer calls to
your character data handler and therefore faster performance. Setting
the parser object's \member{buffer_text} attribute to \constant{True}
will enable buffering.

\item The \function{sample(\var{population}, \var{k})} function was
added to the \module{random} module. \var{population} is a sequence or
\class{xrange} object containing the elements of a population, and
\function{sample()} chooses \var{k} elements from the population without
replacement. \var{k} can be any value up to
\code{len(\var{population})}. For example:

\begin{verbatim}
>>> import random
>>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
>>> random.sample(days, 3)      # Choose 3 elements
['St', 'Sn', 'Th']
>>> random.sample(days, 7)      # Choose 7 elements
['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
>>> random.sample(days, 7)      # Choose 7 again
['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
>>> random.sample(days, 8)      # Can't choose eight
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "random.py", line 414, in sample
      raise ValueError, "sample larger than population"
ValueError: sample larger than population
>>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
[3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
\end{verbatim}

The \module{random} module now uses a new algorithm, the Mersenne
Twister, implemented in C. It's faster and more extensively studied
than the previous algorithm.

(All changes contributed by Raymond Hettinger.)

\item The \module{readline} module also gained a number of new
functions: \function{get_history_item()},
\function{get_current_history_length()}, and \function{redisplay()}.

\item The \module{rexec} and \module{Bastion} modules have been
declared dead, and attempts to import them will fail with a
\exception{RuntimeError}. New-style classes provide new ways to break
out of the restricted execution environment provided by
\module{rexec}, and no one has the interest or the time to fix
them. If you have applications using \module{rexec}, rewrite them to
use something else.

(Sticking with Python 2.2 or 2.1 will not make your applications any
safer because there are known bugs in the \module{rexec} module in
those versions. To repeat: if you're using \module{rexec}, stop using
it immediately.)

\item The \module{rotor} module has been deprecated because the
  algorithm it uses for encryption is not believed to be secure. If
  you need encryption, use one of the several AES Python modules
  that are available separately.

\item The \module{shutil} module gained a \function{move(\var{src},
\var{dest})} function that recursively moves a file or directory to a new
location.

\item Support for more advanced POSIX signal handling was added
to the \module{signal} module but then removed again, as it proved impossible
to make it work reliably across platforms.

\item The \module{socket} module now supports timeouts. You
can call the \method{settimeout(\var{t})} method on a socket object to
set a timeout of \var{t} seconds.
Subsequent socket operations that
take longer than \var{t} seconds to complete will abort and raise a
\exception{socket.timeout} exception.

The original timeout implementation was by Tim O'Malley. Michael
Gilfix integrated it into the Python \module{socket} module and
shepherded it through a lengthy review. After the code was checked
in, Guido van~Rossum rewrote parts of it. (This is a good example of
a collaborative development process in action.)

\item On Windows, the \module{socket} module now ships with Secure
Sockets Layer (SSL) support.

\item The value of the C \constant{PYTHON_API_VERSION} macro is now
exposed at the Python level as \code{sys.api_version}. The current
exception can be cleared by calling the new \function{sys.exc_clear()}
function.

\item The new \module{tarfile} module
allows reading from and writing to \program{tar}-format archive files.
(Contributed by Lars Gust\"abel.)

\item The new \module{textwrap} module contains functions for wrapping
strings containing paragraphs of text. The \function{wrap(\var{text},
\var{width})} function takes a string and returns a list containing
the text split into lines of no more than the chosen width. The
\function{fill(\var{text}, \var{width})} function returns a single
string, reformatted to fit into lines no longer than the chosen width.
(As you can guess, \function{fill()} is built on top of
\function{wrap()}.) For example:

\begin{verbatim}
>>> import textwrap
>>> paragraph = "Not a whit, we defy augury: ... more text ..."
>>> textwrap.wrap(paragraph, 60)
["Not a whit, we defy augury: there's a special providence in",
 "the fall of a sparrow. If it be now, 'tis not to come; if it",
 ...]
>>> print textwrap.fill(paragraph, 35)
Not a whit, we defy augury: there's
a special providence in the fall of
a sparrow. If it be now, 'tis not
to come; if it be not to come, it
will be now; if it be not now, yet
it will come: the readiness is all.
+>>> +\end{verbatim} + +The module also contains a \class{TextWrapper} class that actually +implements the text wrapping strategy. Both the +\class{TextWrapper} class and the \function{wrap()} and +\function{fill()} functions support a number of additional keyword +arguments for fine-tuning the formatting; consult the \ulink{module's +documentation}{../lib/module-textwrap.html} for details. +(Contributed by Greg Ward.) + +\item The \module{thread} and \module{threading} modules now have +companion modules, \module{dummy_thread} and \module{dummy_threading}, +that provide a do-nothing implementation of the \module{thread} +module's interface for platforms where threads are not supported. The +intention is to simplify thread-aware modules (ones that \emph{don't} +rely on threads to run) by putting the following code at the top: + +\begin{verbatim} +try: + import threading as _threading +except ImportError: + import dummy_threading as _threading +\end{verbatim} + +In this example, \module{_threading} is used as the module name to make +it clear that the module being used is not necessarily the actual +\module{threading} module. Code can call functions and use classes in +\module{_threading} whether or not threads are supported, avoiding an +\keyword{if} statement and making the code slightly clearer. This +module will not magically make multithreaded code run without threads; +code that waits for another thread to return or to do something will +simply hang forever. + +\item The \module{time} module's \function{strptime()} function has +long been an annoyance because it uses the platform C library's +\function{strptime()} implementation, and different platforms +sometimes have odd bugs. Brett Cannon contributed a portable +implementation that's written in pure Python and should behave +identically on all platforms. + +\item The new \module{timeit} module helps measure how long snippets +of Python code take to execute. 
The \file{timeit.py} file can be run +directly from the command line, or the module's \class{Timer} class +can be imported and used directly. Here's a short example that +figures out whether it's faster to convert an 8-bit string to Unicode +by appending an empty Unicode string to it or by using the +\function{unicode()} function: + +\begin{verbatim} +import timeit + +timer1 = timeit.Timer('unicode("abc")') +timer2 = timeit.Timer('"abc" + u""') + +# Run three trials +print timer1.repeat(repeat=3, number=100000) +print timer2.repeat(repeat=3, number=100000) + +# On my laptop this outputs: +# [0.36831796169281006, 0.37441694736480713, 0.35304892063140869] +# [0.17574405670166016, 0.18193507194519043, 0.17565798759460449] +\end{verbatim} + +\item The \module{Tix} module has received various bug fixes and +updates for the current version of the Tix package. + +\item The \module{Tkinter} module now works with a thread-enabled +version of Tcl. Tcl's threading model requires that widgets only be +accessed from the thread in which they're created; accesses from +another thread can cause Tcl to panic. For certain Tcl interfaces, +\module{Tkinter} will now automatically avoid this +when a widget is accessed from a different thread by marshalling a +command, passing it to the correct thread, and waiting for the +results. Other interfaces can't be handled automatically but +\module{Tkinter} will now raise an exception on such an access so that +you can at least find out about the problem. See +\url{http://mail.python.org/pipermail/python-dev/2002-December/031107.html} % +for a more detailed explanation of this change. (Implemented by +Martin von~L\"owis.) + +\item Calling Tcl methods through \module{_tkinter} no longer +returns only strings. Instead, if Tcl returns other objects those +objects are converted to their Python equivalent, if one exists, or +wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent +exists. 
This behavior can be controlled through the +\method{wantobjects()} method of \class{tkapp} objects. + +When using \module{_tkinter} through the \module{Tkinter} module (as +most Tkinter applications will), this feature is always activated. It +should not cause compatibility problems, since Tkinter would always +convert string results to Python types where possible. + +If any incompatibilities are found, the old behavior can be restored +by setting the \member{wantobjects} variable in the \module{Tkinter} +module to false before creating the first \class{tkapp} object. + +\begin{verbatim} +import Tkinter +Tkinter.wantobjects = 0 +\end{verbatim} + +Any breakage caused by this change should be reported as a bug. + +\item The \module{UserDict} module has a new \class{DictMixin} class which +defines all dictionary methods for classes that already have a minimum +mapping interface. This greatly simplifies writing classes that need +to be substitutable for dictionaries, such as the classes in +the \module{shelve} module. + +Adding the mix-in as a superclass provides the full dictionary +interface whenever the class defines \method{__getitem__}, +\method{__setitem__}, \method{__delitem__}, and \method{keys}. +For example: + +\begin{verbatim} +>>> import UserDict +>>> class SeqDict(UserDict.DictMixin): +... """Dictionary lookalike implemented with lists.""" +... def __init__(self): +... self.keylist = [] +... self.valuelist = [] +... def __getitem__(self, key): +... try: +... i = self.keylist.index(key) +... except ValueError: +... raise KeyError +... return self.valuelist[i] +... def __setitem__(self, key, value): +... try: +... i = self.keylist.index(key) +... self.valuelist[i] = value +... except ValueError: +... self.keylist.append(key) +... self.valuelist.append(value) +... def __delitem__(self, key): +... try: +... i = self.keylist.index(key) +... except ValueError: +... raise KeyError +... self.keylist.pop(i) +... self.valuelist.pop(i) +... def keys(self): +... 
return list(self.keylist) +... +>>> s = SeqDict() +>>> dir(s) # See that other dictionary methods are implemented +['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__', + '__init__', '__iter__', '__len__', '__module__', '__repr__', + '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems', + 'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem', + 'setdefault', 'update', 'valuelist', 'values'] +\end{verbatim} + +(Contributed by Raymond Hettinger.) + +\item The DOM implementation +in \module{xml.dom.minidom} can now generate XML output in a +particular encoding by providing an optional encoding argument to +the \method{toxml()} and \method{toprettyxml()} methods of DOM nodes. + +\item The \module{xmlrpclib} module now supports an XML-RPC extension +for handling nil data values such as Python's \code{None}. Nil values +are always supported on unmarshalling an XML-RPC response. To +generate requests containing \code{None}, you must supply a true value +for the \var{allow_none} parameter when creating a \class{Marshaller} +instance. + +\item The new \module{DocXMLRPCServer} module allows writing +self-documenting XML-RPC servers. Run it in demo mode (as a program) +to see it in action. Pointing the Web browser to the RPC server +produces pydoc-style documentation; pointing xmlrpclib to the +server allows invoking the actual methods. +(Contributed by Brian Quinlan.) + +\item Support for internationalized domain names (RFCs 3454, 3490, +3491, and 3492) has been added. The ``idna'' encoding can be used +to convert between a Unicode domain name and the ASCII-compatible +encoding (ACE) of that name. + +\begin{alltt} +>{}>{}> u"www.Alliancefran\c caise.nu".encode("idna") +'www.xn--alliancefranaise-npb.nu' +\end{alltt} + +The \module{socket} module has also been extended to transparently +convert Unicode hostnames to the ACE version before passing them to +the C library. 
Modules that deal with hostnames (such as +\module{httplib} and \module{ftplib}) also support Unicode host names; +\module{httplib} also sends HTTP \samp{Host} headers using the ACE +version of the domain name. \module{urllib} supports Unicode URLs +with non-ASCII host names as long as the \code{path} part of the URL +is ASCII only. + +To implement this change, the \module{stringprep} module, the +\code{mkstringprep} tool and the \code{punycode} encoding have been added. + +\end{itemize} + + +%====================================================================== +\subsection{Date/Time Type} + +Date and time types suitable for expressing timestamps were added as +the \module{datetime} module. The types don't support different +calendars or many fancy features, and just stick to the basics of +representing time. + +The three primary types are: \class{date}, representing a day, month, +and year; \class{time}, consisting of hour, minute, and second; and +\class{datetime}, which contains all the attributes of both +\class{date} and \class{time}. There's also a +\class{timedelta} class representing differences between two points +in time, and time zone logic is implemented by classes inheriting from +the abstract \class{tzinfo} class. + +You can create instances of \class{date} and \class{time} by either +supplying keyword arguments to the appropriate constructor, +e.g. \code{datetime.date(year=1972, month=10, day=15)}, or by using +one of a number of class methods. For example, the \method{date.today()} +class method returns the current local date. + +Once created, instances of the date/time classes are all immutable.
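As a minimal sketch of the constructors and the immutability just described, the snippet below builds a \class{date} from keyword arguments, obtains the current date with \method{date.today()}, does a little \class{timedelta} arithmetic, and checks that assigning to a field is refused. The variable names are illustrative only, and the exact exception raised on assignment is an assumption, so both \exception{AttributeError} and \exception{TypeError} are caught.

```python
import datetime

# Construct a date from keyword arguments, as in the text above.
d = datetime.date(year=1972, month=10, day=15)

# date.today() is one of the alternative class-method constructors.
today = datetime.date.today()

# Subtracting two dates gives a timedelta; adding a timedelta gives a date.
age = today - d
later = d + datetime.timedelta(days=30)   # 30 days after 1972-10-15

# Instances are immutable: assigning to a field is rejected.
# (Assumption: the error is AttributeError or TypeError.)
try:
    d.year = 2000
    mutable = True
except (AttributeError, TypeError):
    mutable = False
```

Because instances can't be changed in place, code that needs a modified value builds a new object instead, for example by adding a \class{timedelta}.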
+There are a number of methods for producing formatted strings from +objects: + +\begin{verbatim} +>>> import datetime +>>> now = datetime.datetime.now() +>>> now.isoformat() +'2002-12-30T21:27:03.994956' +>>> now.ctime() # Only available on date, datetime +'Mon Dec 30 21:27:03 2002' +>>> now.strftime('%Y %d %b') +'2002 30 Dec' +\end{verbatim} + +The \method{replace()} method allows modifying one or more fields +of a \class{date} or \class{datetime} instance, returning a new instance: + +\begin{verbatim} +>>> d = datetime.datetime.now() +>>> d +datetime.datetime(2002, 12, 30, 22, 15, 38, 827738) +>>> d.replace(year=2001, hour = 12) +datetime.datetime(2001, 12, 30, 12, 15, 38, 827738) +>>> +\end{verbatim} + +Instances can be compared, hashed, and converted to strings (the +result is the same as that of \method{isoformat()}). \class{date} and +\class{datetime} instances can be subtracted from each other, and +added to \class{timedelta} instances. The largest missing feature is +that there's no standard library support for parsing strings and getting back a +\class{date} or \class{datetime}. + +For more information, refer to the \ulink{module's reference +documentation}{../lib/module-datetime.html}. +(Contributed by Tim Peters.) + + +%====================================================================== +\subsection{The optparse Module} + +The \module{getopt} module provides simple parsing of command-line +arguments. The new \module{optparse} module (originally named Optik) +provides more elaborate command-line parsing that follows the \UNIX{} +conventions, automatically creates the output for \longprogramopt{help}, +and can perform different actions for different options. + +You start by creating an instance of \class{OptionParser} and telling +it what your program's options are. 
+ +\begin{verbatim} +import sys +from optparse import OptionParser + +op = OptionParser() +op.add_option('-i', '--input', + action='store', type='string', dest='input', + help='set input filename') +op.add_option('-l', '--length', + action='store', type='int', dest='length', + help='set maximum length of output') +\end{verbatim} + +Parsing a command line is then done by calling the \method{parse_args()} +method. + +\begin{verbatim} +options, args = op.parse_args(sys.argv[1:]) +print options +print args +\end{verbatim} + +This returns an object containing all of the option values, +and a list of strings containing the remaining arguments. + +Invoking the script with the various arguments now works as you'd +expect it to. Note that the length argument is automatically +converted to an integer. + +\begin{verbatim} +$ ./python opt.py -i data arg1 +<Values at 0x400cad4c: {'input': 'data', 'length': None}> +['arg1'] +$ ./python opt.py --input=data --length=4 +<Values at 0x400cad2c: {'input': 'data', 'length': 4}> +[] +$ +\end{verbatim} + +The help message is automatically generated for you: + +\begin{verbatim} +$ ./python opt.py --help +usage: opt.py [options] + +options: + -h, --help show this help message and exit + -iINPUT, --input=INPUT + set input filename + -lLENGTH, --length=LENGTH + set maximum length of output +$ +\end{verbatim} +% $ prevent Emacs tex-mode from getting confused + +See the \ulink{module's documentation}{../lib/module-optparse.html} +for more details. + +Optik was written by Greg Ward, with suggestions from the readers of +the Getopt SIG. + + +%====================================================================== +\section{Pymalloc: A Specialized Object Allocator\label{section-pymalloc}} + +Pymalloc, a specialized object allocator written by Vladimir +Marangozov, was a feature added to Python 2.1. 
Pymalloc is intended +to be faster than the system \cfunction{malloc()} and to have less +memory overhead for allocation patterns typical of Python programs. +The allocator uses C's \cfunction{malloc()} function to get large +pools of memory and then fulfills smaller memory requests from these +pools. + +In 2.1 and 2.2, pymalloc was an experimental feature and wasn't +enabled by default; you had to explicitly enable it when compiling +Python by providing the +\longprogramopt{with-pymalloc} option to the \program{configure} +script. In 2.3, pymalloc has had further enhancements and is now +enabled by default; you'll have to supply +\longprogramopt{without-pymalloc} to disable it. + +This change is transparent to code written in Python; however, +pymalloc may expose bugs in C extensions. Authors of C extension +modules should test their code with pymalloc enabled, +because some incorrect code may cause core dumps at runtime. + +There's one particularly common error that causes problems. There are +a number of memory allocation functions in Python's C API that have +previously just been aliases for the C library's \cfunction{malloc()} +and \cfunction{free()}, meaning that if you accidentally called +mismatched functions the error wouldn't be noticeable. When the +object allocator is enabled, these functions aren't aliases of +\cfunction{malloc()} and \cfunction{free()} any more, and calling the +wrong function to free memory may get you a core dump. For example, +if memory was allocated using \cfunction{PyObject_Malloc()}, it has to +be freed using \cfunction{PyObject_Free()}, not \cfunction{free()}. A +few modules included with Python fell afoul of this and had to be +fixed; doubtless there are more third-party modules that will have the +same problem. + +As part of this change, the confusing multiple interfaces for +allocating memory have been consolidated down into two API families. 
+Memory allocated with one family must not be manipulated with +functions from the other family. There is one family for allocating +chunks of memory and another family of functions specifically for +allocating Python objects. + +\begin{itemize} + \item To allocate and free an undistinguished chunk of memory use + the ``raw memory'' family: \cfunction{PyMem_Malloc()}, + \cfunction{PyMem_Realloc()}, and \cfunction{PyMem_Free()}. + + \item The ``object memory'' family is the interface to the pymalloc + facility described above and is biased towards a large number of + ``small'' allocations: \cfunction{PyObject_Malloc}, + \cfunction{PyObject_Realloc}, and \cfunction{PyObject_Free}. + + \item To allocate and free Python objects, use the ``object'' family + \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and + \cfunction{PyObject_Del()}. +\end{itemize} + +Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides +debugging features to catch memory overwrites and doubled frees in +both extension modules and in the interpreter itself. To enable this +support, compile a debugging version of the Python interpreter by +running \program{configure} with \longprogramopt{with-pydebug}. + +To aid extension writers, a header file \file{Misc/pymemcompat.h} is +distributed with the source to Python 2.3 that allows Python +extensions to use the 2.3 interfaces to memory allocation while +compiling against any version of Python since 1.5.2. You would copy +the file from Python's source distribution and bundle it with the +source of your extension. + +\begin{seealso} + +\seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c} +{For the full details of the pymalloc implementation, see +the comments at the top of the file \file{Objects/obmalloc.c} in the +Python source code. 
The above link points to the file within the +SourceForge CVS browser.} + +\end{seealso} + + +% ====================================================================== +\section{Build and C API Changes} + +Changes to Python's build process and to the C API include: + +\begin{itemize} + +\item The cycle detection implementation used by the garbage collection +has proven to be stable, so it's now been made mandatory. You can no +longer compile Python without it, and the +\longprogramopt{with-cycle-gc} switch to \program{configure} has been removed. + +\item Python can now optionally be built as a shared library +(\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared} +when running Python's \program{configure} script. (Contributed by Ondrej +Palkovsky.) + +\item The \csimplemacro{DL_EXPORT} and \csimplemacro{DL_IMPORT} macros +are now deprecated. Initialization functions for Python extension +modules should now be declared using the new macro +\csimplemacro{PyMODINIT_FUNC}, while the Python core will generally +use the \csimplemacro{PyAPI_FUNC} and \csimplemacro{PyAPI_DATA} +macros. + +\item The interpreter can be compiled without any docstrings for +the built-in functions and modules by supplying +\longprogramopt{without-doc-strings} to the \program{configure} script. +This makes the Python executable about 10\% smaller, but will also +mean that you can't get help for Python's built-ins. (Contributed by +Gustavo Niemeyer.) + +\item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code +that uses it should be changed. For Python 2.2 and later, the method +definition table can specify the +\constant{METH_NOARGS} flag, signalling that there are no arguments, and +the argument checking can then be removed. If compatibility with +pre-2.2 versions of Python is important, the code could use +\code{PyArg_ParseTuple(\var{args}, "")} instead, but this will be slower +than using \constant{METH_NOARGS}. 
+ +\item \cfunction{PyArg_ParseTuple()} accepts new format characters for various sizes of unsigned integers: \samp{B} for \ctype{unsigned char}, +\samp{H} for \ctype{unsigned short int}, +\samp{I} for \ctype{unsigned int}, +and \samp{K} for \ctype{unsigned long long}. + +\item A new function, \cfunction{PyObject_DelItemString(\var{mapping}, +char *\var{key})}, was added as shorthand for +\code{PyObject_DelItem(\var{mapping}, PyString_FromString(\var{key}))}. + +\item File objects now manage their internal string buffer +differently, increasing it exponentially when needed. This results in +the benchmark tests in \file{Lib/test/test_bufio.py} speeding up +considerably (from 57 seconds to 1.7 seconds, according to one +measurement). + +\item It's now possible to define class and static methods for a C +extension type by setting either the \constant{METH_CLASS} or +\constant{METH_STATIC} flags in a method's \ctype{PyMethodDef} +structure. + +\item Python now includes a copy of the Expat XML parser's source code, +removing any dependence on a system version or local installation of +Expat. + +\item If you dynamically allocate type objects in your extension, you +should be aware of a change in the rules relating to the +\member{__module__} and \member{__name__} attributes. In summary, +you will want to ensure the type's dictionary contains a +\code{'__module__'} key; making the module name the part of the type +name leading up to the final period will no longer have the desired +effect. For more detail, read the API reference documentation or the +source. + +\end{itemize} + + +%====================================================================== +\subsection{Port-Specific Changes} + +Support for a port to IBM's OS/2 using the EMX runtime environment was +merged into the main Python source tree. EMX is a POSIX emulation +layer over the OS/2 system APIs.
The Python port for EMX tries to +support all the POSIX-like capabilities exposed by the EMX runtime, and +mostly succeeds; \function{fork()} and \function{fcntl()} are +restricted by the limitations of the underlying emulation layer. The +standard OS/2 port, which uses IBM's Visual Age compiler, also gained +support for case-sensitive import semantics as part of the integration +of the EMX port into CVS. (Contributed by Andrew MacIntyre.) + +On MacOS, most toolbox modules have been weak-linked to improve +backward compatibility. This means that modules will no longer fail +to load if a single routine is missing on the current OS version. +Instead, calling the missing routine will raise an exception. +(Contributed by Jack Jansen.) + +The RPM spec files, found in the \file{Misc/RPM/} directory in the +Python source distribution, were updated for 2.3. (Contributed by +Sean Reifschneider.) + +Other new platforms now supported by Python include AtheOS +(\url{http://www.atheos.cx/}), GNU/Hurd, and OpenVMS. + + +%====================================================================== +\section{Other Changes and Fixes \label{section-other}} + +As usual, there were a bunch of other improvements and bugfixes +scattered throughout the source tree. A search through the CVS change +logs finds there were 523 patches applied and 514 bugs fixed between +Python 2.2 and 2.3. Both figures are likely to be underestimates. + +Some of the more notable changes are: + +\begin{itemize} + +\item If the \envvar{PYTHONINSPECT} environment variable is set, the +Python interpreter will enter the interactive prompt after running a +Python program, as if Python had been invoked with the \programopt{-i} +option. The environment variable can be set before running the Python +interpreter, or it can be set by the Python program as part of its +execution.
+ +\item The \file{regrtest.py} script now provides a way to allow ``all +resources except \var{foo}.'' A resource name passed to the +\programopt{-u} option can now be prefixed with a hyphen +(\character{-}) to mean ``remove this resource.'' For example, the +option `\code{\programopt{-u}all,-bsddb}' could be used to enable the +use of all resources except \code{bsddb}. + +\item The tools used to build the documentation now work under Cygwin +as well as \UNIX. + +\item The \code{SET_LINENO} opcode has been removed. Back in the +mists of time, this opcode was needed to produce line numbers in +tracebacks and support trace functions (for, e.g., \module{pdb}). +Since Python 1.5, the line numbers in tracebacks have been computed +using a different mechanism that works with ``python -O''. For Python +2.3 Michael Hudson implemented a similar scheme to determine when to +call the trace function, removing the need for \code{SET_LINENO} +entirely. + +It would be difficult to detect any resulting difference from Python +code, apart from a slight speed up when Python is run without +\programopt{-O}. + +C extensions that access the \member{f_lineno} field of frame objects +should instead call \code{PyCode_Addr2Line(f->f_code, f->f_lasti)}. +This will have the added effect of making the code work as desired +under ``python -O'' in earlier versions of Python. + +A nifty new feature is that trace functions can now assign to the +\member{f_lineno} attribute of frame objects, changing the line that +will be executed next. A \samp{jump} command has been added to the +\module{pdb} debugger taking advantage of this new feature. +(Implemented by Richie Hindle.) 
+ +\end{itemize} + + +%====================================================================== +\section{Porting to Python 2.3} + +This section lists previously described changes that may require +changes to your code: + +\begin{itemize} + +\item \keyword{yield} is now always a keyword; if it's used as a +variable name in your code, a different name must be chosen. + +\item For strings \var{X} and \var{Y}, \code{\var{X} in \var{Y}} now works +if \var{X} is more than one character long. + +\item The \function{int()} type constructor will now return a long +integer instead of raising an \exception{OverflowError} when a string +or floating-point number is too large to fit into an integer. + +\item If you have Unicode strings that contain 8-bit characters, you +must declare the file's encoding (UTF-8, Latin-1, or whatever) by +adding a comment to the top of the file. See +section~\ref{section-encodings} for more information. + +\item Calling Tcl methods through \module{_tkinter} no longer +returns only strings. Instead, if Tcl returns other objects those +objects are converted to their Python equivalent, if one exists, or +wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent +exists. + +\item Large octal and hex literals such as +\code{0xffffffff} now trigger a \exception{FutureWarning}. Currently +they're stored as 32-bit numbers and result in a negative value, but +in Python 2.4 they'll become positive long integers. + +% The empty groups below prevent conversion to guillemets. +There are a few ways to fix this warning. If you really need a +positive number, just add an \samp{L} to the end of the literal. If +you're trying to get a 32-bit integer with low bits set and have +previously used an expression such as \code{\textasciitilde(1 <{}< 31)}, +it's probably +clearest to start with all bits set and clear the desired upper bits. +For example, to clear just the top bit (bit 31), you could write +\code{0xffffffffL {\&}{\textasciitilde}(1L<{}<31)}. 
+ +\item You can no longer disable assertions by assigning to \code{__debug__}. + +\item The Distutils \function{setup()} function has gained various new +keyword arguments such as \var{depends}. Old versions of the +Distutils will abort if passed unknown keywords. A solution is to check +for the presence of the new \function{get_distutil_options()} function +in your \file{setup.py} and only use the new keywords +with a version of the Distutils that supports them: + +\begin{verbatim} +from distutils import core + +kw = {'sources': ['foo.c'], ...} +if hasattr(core, 'get_distutil_options'): +    kw['depends'] = ['foo.h'] +ext = core.Extension(**kw) +\end{verbatim} + +\item Using \code{None} as a variable name will now result in a +\exception{SyntaxWarning} warning. + +\item Names of extension types defined by the modules included with +Python now contain the module and a \character{.} in front of the type +name. + +\end{itemize} + + +%====================================================================== +\section{Acknowledgements \label{acks}} + +The author would like to thank the following people for offering +suggestions, corrections and assistance with various drafts of this +article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside, +Andrew Dalke, Scott David Daniels, Fred~L. Drake, Jr., David Fraser, +Kelly Gerber, +Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert, +Martin von~L\"owis, Andrew MacIntyre, Lalo Martins, Chad Netzer, +Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco +Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler, +Just van~Rossum.
+ +\end{document} diff --git a/sys/src/cmd/python/Doc/whatsnew/whatsnew24.tex b/sys/src/cmd/python/Doc/whatsnew/whatsnew24.tex new file mode 100644 index 000000000..6b146946a --- /dev/null +++ b/sys/src/cmd/python/Doc/whatsnew/whatsnew24.tex @@ -0,0 +1,1757 @@ +\documentclass{howto} +\usepackage{distutils} +% $Id: whatsnew24.tex 50936 2006-07-29 15:42:46Z andrew.kuchling $ + +% Don't write extensive text for new sections; I'll do that. +% Feel free to add commented-out reminders of things that need +% to be covered. --amk + +\title{What's New in Python 2.4} +\release{1.02} +\author{A.M.\ Kuchling} +\authoraddress{ + \strong{Python Software Foundation}\\ + Email: \email{amk@amk.ca} +} + +\begin{document} +\maketitle +\tableofcontents + +This article explains the new features in Python 2.4.1, released on +March~30, 2005. + +Python 2.4 is a medium-sized release. It doesn't introduce as many +changes as the radical Python 2.2, but introduces more features than +the conservative 2.3 release. The most significant new language +features are function decorators and generator expressions; most other +changes are to the standard library. + +According to the CVS change logs, there were 481 patches applied and +502 bugs fixed between Python 2.3 and 2.4. Both figures are likely to +be underestimates. + +This article doesn't attempt to provide a complete specification of +every single new feature, but instead provides a brief introduction to +each feature. For full details, you should refer to the documentation +for Python 2.4, such as the \citetitle[../lib/lib.html]{Python Library +Reference} and the \citetitle[../ref/ref.html]{Python Reference +Manual}. Often you will be referred to the PEP for a particular new +feature for explanations of the implementation and design rationale. + + +%====================================================================== +\section{PEP 218: Built-In Set Objects} + +Python 2.3 introduced the \module{sets} module. 
C implementations of +set data types have now been added to the Python core as two new +built-in types, \function{set(\var{iterable})} and +\function{frozenset(\var{iterable})}. They provide high speed +operations for membership testing, for eliminating duplicates from +sequences, and for mathematical operations like unions, intersections, +differences, and symmetric differences. + +\begin{verbatim} +>>> a = set('abracadabra') # form a set from a string +>>> 'z' in a # fast membership testing +False +>>> a # unique letters in a +set(['a', 'r', 'b', 'c', 'd']) +>>> ''.join(a) # convert back into a string +'arbcd' + +>>> b = set('alacazam') # form a second set +>>> a - b # letters in a but not in b +set(['r', 'd', 'b']) +>>> a | b # letters in either a or b +set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l']) +>>> a & b # letters in both a and b +set(['a', 'c']) +>>> a ^ b # letters in a or b but not both +set(['r', 'd', 'b', 'm', 'z', 'l']) + +>>> a.add('z') # add a new element +>>> a.update('wxy') # add multiple new elements +>>> a +set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z']) +>>> a.remove('x') # take one element out +>>> a +set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z']) +\end{verbatim} + +The \function{frozenset} type is an immutable version of \function{set}. +Since it is immutable and hashable, it may be used as a dictionary key or +as a member of another set. + +The \module{sets} module remains in the standard library, and may be +useful if you wish to subclass the \class{Set} or \class{ImmutableSet} +classes. There are currently no plans to deprecate the module. 
+ +\begin{seealso} +\seepep{218}{Adding a Built-In Set Object Type}{Originally proposed by +Greg Wilson and ultimately implemented by Raymond Hettinger.} +\end{seealso} + + +%====================================================================== +\section{PEP 237: Unifying Long Integers and Integers} + +The lengthy transition process for this PEP, begun in Python 2.2, +takes another step forward in Python 2.4. In 2.3, certain integer +operations that would behave differently after int/long unification +triggered \exception{FutureWarning} warnings and returned values +limited to 32 or 64 bits (depending on your platform). In 2.4, these +expressions no longer produce a warning and instead produce a +different result that's usually a long integer. + +The problematic expressions are primarily left shifts and lengthy +hexadecimal and octal constants. For example, +\code{2 \textless{}\textless{} 32} results +in a warning in 2.3, evaluating to 0 on 32-bit platforms. In Python +2.4, this expression now returns the correct answer, 8589934592. + +\begin{seealso} +\seepep{237}{Unifying Long Integers and Integers}{Original PEP +written by Moshe Zadka and GvR. The changes for 2.4 were implemented by +Kalle Svensson.} +\end{seealso} + + +%====================================================================== +\section{PEP 289: Generator Expressions} + +The iterator feature introduced in Python 2.2 and the +\module{itertools} module make it easier to write programs that loop +through large data sets without having the entire data set in memory +at one time. List comprehensions don't fit into this picture very +well because they produce a Python list object containing all of the +items. This unavoidably pulls all of the objects into memory, which +can be a problem if your data set is very large. 
When trying to write +a functionally-styled program, it would be natural to write something +like: + +\begin{verbatim} +links = [link for link in get_all_links() if not link.followed] +for link in links: + ... +\end{verbatim} + +instead of + +\begin{verbatim} +for link in get_all_links(): + if link.followed: + continue + ... +\end{verbatim} + +The first form is more concise and perhaps more readable, but if +you're dealing with a large number of link objects you'd have to write +the second form to avoid having all link objects in memory at the same +time. + +Generator expressions work similarly to list comprehensions but don't +materialize the entire list; instead they create a generator that will +return elements one by one. The above example could be written as: + +\begin{verbatim} +links = (link for link in get_all_links() if not link.followed) +for link in links: + ... +\end{verbatim} + +Generator expressions always have to be written inside parentheses, as +in the above example. The parentheses signalling a function call also +count, so if you want to create an iterator that will be immediately +passed to a function you could write: + +\begin{verbatim} +print sum(obj.count for obj in list_all_objects()) +\end{verbatim} + +Generator expressions differ from list comprehensions in various small +ways. Most notably, the loop variable (\var{obj} in the above +example) is not accessible outside of the generator expression. List +comprehensions leave the variable assigned to its last value; future +versions of Python will change this, making list comprehensions match +generator expressions in this respect. 
+ +\begin{seealso} +\seepep{289}{Generator Expressions}{Proposed by Raymond Hettinger and +implemented by Jiwon Seo with early efforts steered by Hye-Shik Chang.} +\end{seealso} + + +%====================================================================== +\section{PEP 292: Simpler String Substitutions} + +Some new classes in the standard library provide an alternative +mechanism for substituting variables into strings; this style of +substitution may be better for applications where untrained +users need to edit templates. + +The usual way of substituting variables by name is the \code{\%} +operator: + +\begin{verbatim} +>>> '%(page)i: %(title)s' % {'page':2, 'title': 'The Best of Times'} +'2: The Best of Times' +\end{verbatim} + +When writing the template string, it can be easy to forget the +\samp{i} or \samp{s} after the closing parenthesis. This isn't a big +problem if the template is in a Python module, because you run the +code, get an ``Unsupported format character'' \exception{ValueError}, +and fix the problem. However, consider an application such as Mailman +where template strings or translations are being edited by users who +aren't aware of the Python language. The format string's syntax is +complicated to explain to such users, and if they make a mistake, it's +difficult to provide helpful feedback to them. + +PEP 292 adds a \class{Template} class to the \module{string} module +that uses \samp{\$} to indicate a substitution: + +\begin{verbatim} +>>> import string +>>> t = string.Template('$page: $title') +>>> t.substitute({'page':2, 'title': 'The Best of Times'}) +'2: The Best of Times' +\end{verbatim} + +% $ Terminate $-mode for Emacs + +If a key is missing from the dictionary, the \method{substitute} method +will raise a \exception{KeyError}. 
There's also a \method{safe_substitute} +method that ignores missing keys: + +\begin{verbatim} +>>> t = string.Template('$page: $title') +>>> t.safe_substitute({'page':3}) +'3: $title' +\end{verbatim} + +% $ Terminate math-mode for Emacs + + +\begin{seealso} +\seepep{292}{Simpler String Substitutions}{Written and implemented +by Barry Warsaw.} +\end{seealso} + + +%====================================================================== +\section{PEP 318: Decorators for Functions and Methods} + +Python 2.2 extended Python's object model by adding static methods and +class methods, but it didn't extend Python's syntax to provide any new +way of defining static or class methods. Instead, you had to write a +\keyword{def} statement in the usual way, and pass the resulting +method to a \function{staticmethod()} or \function{classmethod()} +function that would wrap up the function as a method of the new type. +Your code would look like this: + +\begin{verbatim} +class C: + def meth (cls): + ... + + meth = classmethod(meth) # Rebind name to wrapped-up class method +\end{verbatim} + +If the method was very long, it would be easy to miss or forget the +\function{classmethod()} invocation after the function body. + +The intention was always to add some syntax to make such definitions +more readable, but at the time of 2.2's release a good syntax was not +obvious. Today a good syntax \emph{still} isn't obvious but users are +asking for easier access to the feature; a new syntactic feature has +been added to meet this need. + +The new feature is called ``function decorators''. The name comes +from the idea that \function{classmethod}, \function{staticmethod}, +and friends are storing additional information on a function object; +they're \emph{decorating} functions with more details. + +The notation borrows from Java and uses the \character{@} character as an +indicator. 
Using the new syntax, the example above would be written: + +\begin{verbatim} +class C: + + @classmethod + def meth (cls): + ... + +\end{verbatim} + +The \code{@classmethod} is shorthand for the +\code{meth=classmethod(meth)} assignment. More generally, if you have +the following: + +\begin{verbatim} +@A +@B +@C +def f (): + ... +\end{verbatim} + +It's equivalent to the following pre-decorator code: + +\begin{verbatim} +def f(): ... +f = A(B(C(f))) +\end{verbatim} + +Decorators must come on the line before a function definition, one decorator +per line, and can't be on the same line as the def statement, meaning that +\code{@A def f(): ...} is illegal. You can only decorate function +definitions, either at the module level or inside a class; you can't +decorate class definitions. + +A decorator is just a function that takes the function to be decorated as an +argument and returns either the same function or some new object. The +return value of the decorator need not be callable (though it typically is), +unless further decorators will be applied to the result. It's easy to write +your own decorators. The following simple example just sets an attribute on +the function object: + +\begin{verbatim} +>>> def deco(func): +... func.attr = 'decorated' +... return func +... +>>> @deco +... def f(): pass +... +>>> f +<function f at 0x402ef0d4> +>>> f.attr +'decorated' +>>> +\end{verbatim} + +As a slightly more realistic example, the following decorator checks +that the supplied argument is an integer: + +\begin{verbatim} +def require_int (func): + def wrapper (arg): + assert isinstance(arg, int) + return func(arg) + + return wrapper + +@require_int +def p1 (arg): + print arg + +@require_int +def p2(arg): + print arg*2 +\end{verbatim} + +An example in \pep{318} contains a fancier version of this idea that +lets you both specify the required type and check the returned type. + +Decorator functions can take arguments. 
If arguments are supplied, +your decorator function is called with only those arguments and must +return a new decorator function; this function must take a single +function and return a function, as previously described. In other +words, \code{@A @B @C(args)} becomes: + +\begin{verbatim} +def f(): ... +_deco = C(args) +f = A(B(_deco(f))) +\end{verbatim} + +Getting this right can be slightly brain-bending, but it's not too +difficult. + +A small related change makes the \member{func_name} attribute of +functions writable. This attribute is used to display function names +in tracebacks, so decorators should change the name of any new +function that's constructed and returned. + +\begin{seealso} +\seepep{318}{Decorators for Functions, Methods and Classes}{Written +by Kevin D. Smith, Jim Jewett, and Skip Montanaro. Several people +wrote patches implementing function decorators, but the one that was +actually checked in was patch \#979728, written by Mark Russell.} + +\seeurl{http://www.python.org/moin/PythonDecoratorLibrary} +{This Wiki page contains several examples of decorators.} + +\end{seealso} + + +%====================================================================== +\section{PEP 322: Reverse Iteration} + +A new built-in function, \function{reversed(\var{seq})}, takes a sequence +and returns an iterator that loops over the elements of the sequence +in reverse order. + +\begin{verbatim} +>>> for i in reversed(xrange(1,4)): +... print i +... +3 +2 +1 +\end{verbatim} + +Compared to extended slicing, such as \code{range(1,4)[::-1]}, +\function{reversed()} is easier to read, runs faster, and uses +substantially less memory. + +Note that \function{reversed()} only accepts sequences, not arbitrary +iterators. If you want to reverse an iterator, first convert it to +a list with \function{list()}. + +\begin{verbatim} +>>> input = open('/etc/passwd', 'r') +>>> for line in reversed(list(input)): +... print line +... 
+root:*:0:0:System Administrator:/var/root:/bin/tcsh + ... +\end{verbatim} + +\begin{seealso} +\seepep{322}{Reverse Iteration}{Written and implemented by Raymond Hettinger.} + +\end{seealso} + + +%====================================================================== +\section{PEP 324: New subprocess Module} + +The standard library provides a number of ways to execute a +subprocess, offering different features and different levels of +complexity. \function{os.system(\var{command})} is easy to use, but +slow (it runs a shell process which executes the command) and +dangerous (you have to be careful about escaping the shell's +metacharacters). The \module{popen2} module offers classes that can +capture standard output and standard error from the subprocess, but +the naming is confusing. The \module{subprocess} module cleans +this up, providing a unified interface that offers all the features +you might need. + +Instead of \module{popen2}'s collection of classes, +\module{subprocess} contains a single class called \class{Popen} +whose constructor supports a number of different keyword arguments. + +\begin{verbatim} +class Popen(args, bufsize=0, executable=None, + stdin=None, stdout=None, stderr=None, + preexec_fn=None, close_fds=False, shell=False, + cwd=None, env=None, universal_newlines=False, + startupinfo=None, creationflags=0): +\end{verbatim} + +\var{args} is commonly a sequence of strings that will be the +arguments to the program executed as the subprocess. (If the +\var{shell} argument is true, \var{args} can be a string which will +then be passed on to the shell for interpretation, just as +\function{os.system()} does.) + +\var{stdin}, \var{stdout}, and \var{stderr} specify what the +subprocess's input, output, and error streams will be. You can +provide a file object or a file descriptor, or you can use the +constant \code{subprocess.PIPE} to create a pipe between the +subprocess and the parent. 
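The pipe plumbing described above can be sketched as follows. This is a minimal example written for current Python, where pipe data is bytes. It assumes that a Python interpreter, located through \code{sys.executable}, is an acceptable portable child process. The one-line child program is purely illustrative: it upper-cases whatever arrives on its standard input.

```python
import subprocess
import sys

# Child process: a Python one-liner that echoes stdin back in upper case.
# sys.executable is used so the example does not depend on any shell tool.
proc = subprocess.Popen(
    [sys.executable, '-c',
     'import sys; sys.stdout.write(sys.stdin.read().upper())'],
    stdin=subprocess.PIPE,    # parent writes to the child's stdin
    stdout=subprocess.PIPE,   # parent reads the child's stdout
    stderr=subprocess.PIPE,   # parent reads the child's stderr
)

# communicate() sends the data, closes stdin, reads both output pipes
# until EOF, and waits for the child to exit.
out, err = proc.communicate(b'hello from the parent\n')
status = proc.returncode
```

Using \method{communicate()} rather than reading and writing the pipes by hand avoids the deadlock that can occur when a child fills one pipe while the parent is blocked on another.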
+ +The constructor has a number of handy options: + +\begin{itemize} + \item \var{close_fds} requests that all file descriptors be closed + before running the subprocess. + + \item \var{cwd} specifies the working directory in which the + subprocess will be executed (defaulting to whatever the parent's + working directory is). + + \item \var{env} is a dictionary specifying environment variables. + + \item \var{preexec_fn} is a function that gets called before the + child is started. + + \item \var{universal_newlines} opens the child's input and output + using Python's universal newline feature. + +\end{itemize} + +Once you've created the \class{Popen} instance, +you can call its \method{wait()} method to pause until the subprocess +has exited, \method{poll()} to check if it's exited without pausing, +or \method{communicate(\var{data})} to send the string \var{data} to +the subprocess's standard input. \method{communicate(\var{data})} +then reads any data that the subprocess has sent to its standard output +or standard error, returning a tuple \code{(\var{stdout_data}, +\var{stderr_data})}. + +\function{call()} is a shortcut that passes its arguments along to the +\class{Popen} constructor, waits for the command to complete, and +returns the status code of the subprocess. It can serve as a safer +analog to \function{os.system()}: + +\begin{verbatim} +sts = subprocess.call(['dpkg', '-i', '/tmp/new-package.deb']) +if sts == 0: + # Success + ... +else: + # dpkg returned an error + ... +\end{verbatim} + +The command is invoked without use of the shell. If you really do want to +use the shell, you can add \code{shell=True} as a keyword argument and provide +a string instead of a sequence: + +\begin{verbatim} +sts = subprocess.call('dpkg -i /tmp/new-package.deb', shell=True) +\end{verbatim} + +The PEP takes various examples of shell and Python code and shows how +they'd be translated into Python code that uses \module{subprocess}. 
+Reading this section of the PEP is highly recommended. + +\begin{seealso} +\seepep{324}{subprocess - New process module}{Written and implemented by Peter {\AA}strand, with assistance from Fredrik Lundh and others.} +\end{seealso} + + +%====================================================================== +\section{PEP 327: Decimal Data Type} + +Python has always supported floating-point (FP) numbers as a data type, +based on the underlying C \ctype{double} type. However, while most +programming languages provide a floating-point type, many people (even +programmers) are unaware that floating-point numbers don't represent +certain decimal fractions accurately. The new \class{Decimal} type +can represent these fractions accurately, up to a user-specified +precision limit. + + +\subsection{Why is Decimal needed?} + +The limitations arise from the representation used for floating-point numbers. +FP numbers are made up of three components: + +\begin{itemize} +\item The sign, which is positive or negative. +\item The mantissa, which is a single-digit binary number +followed by a fractional part. For example, \code{1.01} in base-2 notation +is \code{1 + 0/2 + 1/4}, or 1.25 in decimal notation. +\item The exponent, which tells where the binary point is located in the number represented. +\end{itemize} + +For example, the number 1.25 has positive sign, a mantissa value of +1.01 (in binary), and an exponent of 0 (the binary point doesn't need +to be shifted). The number 5 has the same sign and mantissa, but the +exponent is 2 because the mantissa is multiplied by 4 (2 to the power +of the exponent 2); 1.25 * 4 equals 5. + +Modern systems usually provide floating-point support that conforms to +a standard called IEEE 754. C's \ctype{double} type is usually +implemented as a 64-bit IEEE 754 number, which uses 52 bits of space +for the mantissa. This means that numbers can only be specified to 52 +bits of precision.
If you're trying to represent numbers whose +expansion repeats endlessly, the expansion is cut off after 52 bits. +Unfortunately, most software needs to produce output in base 10, and +common fractions in base 10 are often repeating fractions in binary. +For example, 1.1 decimal is binary \code{1.0001100110011 ...}; .1 = +1/16 + 1/32 + 1/256 plus an infinite number of additional terms. IEEE +754 has to chop off that infinitely repeating binary fraction after 52 bits, +so the representation is slightly inaccurate. + +Sometimes you can see this inaccuracy when the number is printed: +\begin{verbatim} +>>> 1.1 +1.1000000000000001 +\end{verbatim} + +The inaccuracy isn't always visible when you print the number because +the FP-to-decimal-string conversion is provided by the C library, and +most C libraries try to produce sensible output. Even if it's not +displayed, however, the inaccuracy is still there and subsequent +operations can magnify the error. + +For many applications this doesn't matter. If I'm plotting points and +displaying them on my monitor, the difference between 1.1 and +1.1000000000000001 is too small to be visible. Reports often limit +output to a certain number of decimal places, and if you round the +number to two or three or even eight decimal places, the error is +never apparent. However, for applications where it does matter, +it's a lot of work to implement your own custom arithmetic routines. + +Hence, the \class{Decimal} type was created. + +\subsection{The \class{Decimal} type} + +A new module, \module{decimal}, was added to Python's standard +library. It contains two classes, \class{Decimal} and +\class{Context}. \class{Decimal} instances represent numbers, and +\class{Context} instances are used to wrap up various settings such as +the precision and default rounding mode. + +\class{Decimal} instances are immutable, like regular Python integers +and FP numbers; once an instance has been created, you can't change the +value it represents.
\class{Decimal} instances can be created from +integers or strings: + +\begin{verbatim} +>>> import decimal +>>> decimal.Decimal(1972) +Decimal("1972") +>>> decimal.Decimal("1.1") +Decimal("1.1") +\end{verbatim} + +You can also provide tuples containing the sign, the mantissa represented +as a tuple of decimal digits, and the exponent: + +\begin{verbatim} +>>> decimal.Decimal((1, (1, 4, 7, 5), -2)) +Decimal("-14.75") +\end{verbatim} + +Cautionary note: the sign bit is a Boolean value, so 0 is positive and +1 is negative. + +Converting from floating-point numbers poses a bit of a problem: +should the FP number representing 1.1 turn into the decimal number for +exactly 1.1, or for 1.1 plus whatever inaccuracies are introduced? +The decision was to dodge the issue and leave such a conversion out of +the API. Instead, you should convert the floating-point number into a +string using the desired precision and pass the string to the +\class{Decimal} constructor: + +\begin{verbatim} +>>> f = 1.1 +>>> decimal.Decimal(str(f)) +Decimal("1.1") +>>> decimal.Decimal('%.12f' % f) +Decimal("1.100000000000") +\end{verbatim} + +Once you have \class{Decimal} instances, you can perform the usual +mathematical operations on them. One limitation: exponentiation +requires an integer exponent: + +\begin{verbatim} +>>> a = decimal.Decimal('35.72') +>>> b = decimal.Decimal('1.73') +>>> a+b +Decimal("37.45") +>>> a-b +Decimal("33.99") +>>> a*b +Decimal("61.7956") +>>> a/b +Decimal("20.64739884393063583815028902") +>>> a ** 2 +Decimal("1275.9184") +>>> a**b +Traceback (most recent call last): + ... +decimal.InvalidOperation: x ** (non-integer) +\end{verbatim} + +You can combine \class{Decimal} instances with integers, but not with +floating-point numbers: + +\begin{verbatim} +>>> a + 4 +Decimal("39.72") +>>> a + 4.5 +Traceback (most recent call last): + ... +TypeError: You can interact Decimal only with int, long or Decimal data types. 
+>>> +\end{verbatim} + +\class{Decimal} numbers can be used with the \module{math} and +\module{cmath} modules, but note that they'll be immediately converted to +floating-point numbers before the operation is performed, resulting in +a possible loss of precision and accuracy. You'll also get back a +regular floating-point number and not a \class{Decimal}. + +\begin{verbatim} +>>> import math, cmath +>>> d = decimal.Decimal('123456789012.345') +>>> math.sqrt(d) +351364.18288201344 +>>> cmath.sqrt(-d) +351364.18288201344j +\end{verbatim} + +\class{Decimal} instances have a \method{sqrt()} method that +returns a \class{Decimal}, but if you need other things such as +trigonometric functions you'll have to implement them. + +\begin{verbatim} +>>> d.sqrt() +Decimal("351364.1828820134592177245001") +\end{verbatim} + + +\subsection{The \class{Context} type} + +Instances of the \class{Context} class encapsulate several settings for +decimal operations: + +\begin{itemize} + \item \member{prec} is the precision, the number of significant decimal digits. + \item \member{rounding} specifies the rounding mode. The \module{decimal} + module has constants for the various possibilities: + \constant{ROUND_DOWN}, \constant{ROUND_CEILING}, + \constant{ROUND_HALF_EVEN}, and various others. + \item \member{traps} is a dictionary specifying what happens on +encountering certain error conditions: either an exception is raised or +a value is returned. Some examples of error conditions are +division by zero, loss of precision, and overflow. +\end{itemize} + +There's a thread-local default context available by calling +\function{getcontext()}; you can change the properties of this context +to alter the default precision, rounding, or trap handling.
The +following example shows the effect of changing the precision of the default +context: + +\begin{verbatim} +>>> decimal.getcontext().prec +28 +>>> decimal.Decimal(1) / decimal.Decimal(7) +Decimal("0.1428571428571428571428571429") +>>> decimal.getcontext().prec = 9 +>>> decimal.Decimal(1) / decimal.Decimal(7) +Decimal("0.142857143") +\end{verbatim} + +The default action for error conditions is selectable; the module can +either return a special value such as infinity or not-a-number, or +exceptions can be raised: + +\begin{verbatim} +>>> decimal.Decimal(1) / decimal.Decimal(0) +Traceback (most recent call last): + ... +decimal.DivisionByZero: x / 0 +>>> decimal.getcontext().traps[decimal.DivisionByZero] = False +>>> decimal.Decimal(1) / decimal.Decimal(0) +Decimal("Infinity") +>>> +\end{verbatim} + +The \class{Context} instance also has various methods for formatting +numbers such as \method{to_eng_string()} and \method{to_sci_string()}. + +For more information, see the documentation for the \module{decimal} +module, which includes a quick-start tutorial and a reference. + +\begin{seealso} +\seepep{327}{Decimal Data Type}{Written by Facundo Batista and implemented + by Facundo Batista, Eric Price, Raymond Hettinger, Aahz, and Tim Peters.} + +\seeurl{http://research.microsoft.com/\textasciitilde hollasch/cgindex/coding/ieeefloat.html} +{A more detailed overview of the IEEE-754 representation.} + +\seeurl{http://www.lahey.com/float.htm} +{The article uses Fortran code to illustrate many of the problems +that floating-point inaccuracy can cause.} + +\seeurl{http://www2.hursley.ibm.com/decimal/} +{A description of a decimal-based representation. This representation +is being proposed as a standard, and underlies the new Python decimal +type. 
Much of this material was written by Mike Cowlishaw, designer of the +Rexx language.} + +\end{seealso} + + +%====================================================================== +\section{PEP 328: Multi-line Imports} + +One language change is a small syntactic tweak aimed at making it +easier to import many names from a module. In a +\code{from \var{module} import \var{names}} statement, +\var{names} is a sequence of names separated by commas. If the sequence is +very long, you can either write multiple imports from the same module, +or you can use backslashes to escape the line endings like this: + +\begin{verbatim} +from SimpleXMLRPCServer import SimpleXMLRPCServer,\ + SimpleXMLRPCRequestHandler,\ + CGIXMLRPCRequestHandler,\ + resolve_dotted_attribute +\end{verbatim} + +The syntactic change in Python 2.4 simply allows putting the names +within parentheses. Python ignores newlines within a parenthesized +expression, so the backslashes are no longer needed: + +\begin{verbatim} +from SimpleXMLRPCServer import (SimpleXMLRPCServer, + SimpleXMLRPCRequestHandler, + CGIXMLRPCRequestHandler, + resolve_dotted_attribute) +\end{verbatim} + +The PEP also proposes that all \keyword{import} statements be absolute +imports, with a leading \samp{.} character to indicate a relative +import. This part of the PEP was not implemented for Python 2.4, +but was completed for Python 2.5. + +\begin{seealso} +\seepep{328}{Imports: Multi-Line and Absolute/Relative} + {Written by Aahz. Multi-line imports were implemented by + Dima Dorfman.} +\end{seealso} + + +%====================================================================== +\section{PEP 331: Locale-Independent Float/String Conversions} + +The \module{locale} module lets Python software select various +conversions and display conventions that are localized to a particular +country or language.
However, the module was careful not to change +the numeric locale because various functions in Python's +implementation required that the numeric locale remain set to the +\code{'C'} locale. Often this was because the code was using the C library's +\cfunction{atof()} function. + +Not setting the numeric locale caused trouble for extensions that used +third-party C libraries, however, because they wouldn't have the +correct locale set. The motivating example was GTK+, whose user +interface widgets weren't displaying numbers in the current locale. + +The solution described in the PEP is to add three new functions to the +Python API that perform ASCII-only conversions, ignoring the locale +setting: + +\begin{itemize} + \item \cfunction{PyOS_ascii_strtod(\var{str}, \var{ptr})} +and \cfunction{PyOS_ascii_atof(\var{str})} +both convert a string to a C \ctype{double}. + \item \cfunction{PyOS_ascii_formatd(\var{buffer}, \var{buf_len}, \var{format}, \var{d})} converts a \ctype{double} to an ASCII string. +\end{itemize} + +The code for these functions came from the GLib library +(\url{http://developer.gnome.org/arch/gtk/glib.html}), whose +developers kindly relicensed the relevant functions and donated them +to the Python Software Foundation. The \module{locale} module +can now change the numeric locale, letting extensions such as GTK+ +produce the correct results. + +\begin{seealso} +\seepep{331}{Locale-Independent Float/String Conversions} +{Written by Christian R. Reis, and implemented by Gustavo Carneiro.} +\end{seealso} + +%====================================================================== +\section{Other Language Changes} + +Here are all of the changes that Python 2.4 makes to the core Python +language. + +\begin{itemize} + +\item Decorators for functions and methods were added (\pep{318}). + +\item Built-in \function{set} and \function{frozenset} types were +added (\pep{218}).
Other new built-ins include the \function{reversed(\var{seq})} function (\pep{322}). + +\item Generator expressions were added (\pep{289}). + +\item Certain numeric expressions no longer return values restricted to 32 or 64 bits (\pep{237}). + +\item You can now put parentheses around the list of names in a +\code{from \var{module} import \var{names}} statement (\pep{328}). + +\item The \method{dict.update()} method now accepts the same +argument forms as the \class{dict} constructor. This includes any +mapping, any iterable of key/value pairs, and keyword arguments. +(Contributed by Raymond Hettinger.) + +\item The string methods \method{ljust()}, \method{rjust()}, and +\method{center()} now take an optional argument for specifying a +fill character other than a space. +(Contributed by Raymond Hettinger.) + +\item Strings also gained an \method{rsplit()} method that +works like the \method{split()} method but splits from the end of +the string. +(Contributed by Sean Reifschneider.) + +\begin{verbatim} +>>> 'www.python.org'.split('.', 1) +['www', 'python.org'] +>>> 'www.python.org'.rsplit('.', 1) +['www.python', 'org'] +\end{verbatim} + +\item Three keyword parameters, \var{cmp}, \var{key}, and +\var{reverse}, were added to the \method{sort()} method of lists. +These parameters make some common usages of \method{sort()} simpler. +All of these parameters are optional. + +For the \var{cmp} parameter, the value should be a comparison function +that takes two parameters and returns -1, 0, or +1 depending on how +the parameters compare. This function will then be used to sort the +list. Previously this was the only parameter that could be provided +to \method{sort()}. + +\var{key} should be a single-parameter function that takes a list +element and returns a comparison key for the element. The list is +then sorted using the comparison keys.
The following example sorts a +list case-insensitively: + +\begin{verbatim} +>>> L = ['A', 'b', 'c', 'D'] +>>> L.sort() # Case-sensitive sort +>>> L +['A', 'D', 'b', 'c'] +>>> # Using 'key' parameter to sort list +>>> L.sort(key=lambda x: x.lower()) +>>> L +['A', 'b', 'c', 'D'] +>>> # Old-fashioned way +>>> L.sort(cmp=lambda x,y: cmp(x.lower(), y.lower())) +>>> L +['A', 'b', 'c', 'D'] +\end{verbatim} + +The last example, which uses the \var{cmp} parameter, is the old way +to perform a case-insensitive sort. It works but is slower than using +a \var{key} parameter. Using \var{key} calls the \method{lower()} method +once for each element in the list while using \var{cmp} will call it +twice for each comparison, so using \var{key} saves on invocations of +the \method{lower()} method. + +For simple key functions and comparison functions, it is often +possible to avoid a \keyword{lambda} expression by using an unbound +method instead. For example, the above case-insensitive sort is best +written as: + +\begin{verbatim} +>>> L.sort(key=str.lower) +>>> L +['A', 'b', 'c', 'D'] +\end{verbatim} + +Finally, the \var{reverse} parameter takes a Boolean value. If the +value is true, the list will be sorted into reverse order. +Instead of \code{L.sort() ; L.reverse()}, you can now write +\code{L.sort(reverse=True)}. + +The results of sorting are now guaranteed to be stable. This means +that two entries with equal keys will be returned in the same order as +they were input. For example, you can sort a list of people by name, +and then sort the list by age, resulting in a list sorted by age where +people with the same age are in name-sorted order. + +(All changes to \method{sort()} contributed by Raymond Hettinger.) + +\item There is a new built-in function +\function{sorted(\var{iterable})} that works like the in-place +\method{list.sort()} method but can be used in +expressions.
The differences are: + \begin{itemize} + \item the input may be any iterable; + \item a newly formed copy is sorted, leaving the original intact; and + \item the expression returns the new sorted copy. + \end{itemize} + +\begin{verbatim} +>>> L = [9,7,8,3,2,4,1,6,5] +>>> [10+i for i in sorted(L)] # usable in a list comprehension +[11, 12, 13, 14, 15, 16, 17, 18, 19] +>>> L # original is left unchanged +[9, 7, 8, 3, 2, 4, 1, 6, 5] +>>> sorted('Monty Python') # any iterable may be an input +[' ', 'M', 'P', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y', 'y'] + +>>> # List the contents of a dict sorted by key values +>>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5) +>>> for k, v in sorted(colormap.iteritems()): +... print k, v +... +black 4 +blue 2 +green 3 +red 1 +yellow 5 +\end{verbatim} + +(Contributed by Raymond Hettinger.) + +\item Integer operations will no longer trigger an \exception{OverflowWarning}. +The \exception{OverflowWarning} warning will disappear in Python 2.5. + +\item The interpreter gained a new switch, \programopt{-m}, that +takes a name, searches for the corresponding module on \code{sys.path}, +and runs the module as a script. For example, +you can now run the Python profiler with \code{python -m profile}. +(Contributed by Nick Coghlan.) + +\item The \function{eval(\var{expr}, \var{globals}, \var{locals})} +and \function{execfile(\var{filename}, \var{globals}, \var{locals})} +functions and the \keyword{exec} statement now accept any mapping type +for the \var{locals} parameter. Previously this had to be a regular +Python dictionary. (Contributed by Raymond Hettinger.) + +\item The \function{zip()} built-in function and \function{itertools.izip()} + now return an empty result (an empty list and an empty iterator, + respectively) if called with no arguments. + Previously they raised a \exception{TypeError} + exception. This makes them more + suitable for use with variable length argument lists: + +\begin{verbatim} +>>> def transpose(array): +... return zip(*array) +...
+>>> transpose([(1,2,3), (4,5,6)]) +[(1, 4), (2, 5), (3, 6)] +>>> transpose([]) +[] +\end{verbatim} +(Contributed by Raymond Hettinger.) + +\item Encountering a failure while importing a module no longer leaves +a partially-initialized module object in \code{sys.modules}. The +incomplete module object left behind would fool further imports of the +same module into succeeding, leading to confusing errors. +(Fixed by Tim Peters.) + +\item \constant{None} is now a constant; code that binds a new value to +the name \samp{None} is now a syntax error. +(Contributed by Raymond Hettinger.) + +\end{itemize} + + +%====================================================================== +\subsection{Optimizations} + +\begin{itemize} + +\item The inner loops for list and tuple slicing + were optimized and now run about one-third faster. The inner loops + for dictionaries were also optimized, resulting in performance boosts for + \method{keys()}, \method{values()}, \method{items()}, + \method{iterkeys()}, \method{itervalues()}, and \method{iteritems()}. + (Contributed by Raymond Hettinger.) + +\item The machinery for growing and shrinking lists was optimized for + speed and for space efficiency. Appending and popping from lists now + runs faster due to more efficient code paths and less frequent use of + the underlying system \cfunction{realloc()}. List comprehensions + also benefit. \method{list.extend()} was also optimized and no + longer converts its argument into a temporary list before extending + the base list. (Contributed by Raymond Hettinger.) + +\item \function{list()}, \function{tuple()}, \function{map()}, + \function{filter()}, and \function{zip()} now run several times + faster with non-sequence arguments that supply a \method{__len__()} + method. (Contributed by Raymond Hettinger.) 
+ +\item The methods \method{list.__getitem__()}, + \method{dict.__getitem__()}, and \method{dict.__contains__()} + are now implemented as \class{method_descriptor} objects rather + than \class{wrapper_descriptor} objects. This form of + access doubles their performance and makes them more suitable for + use as arguments to functionals: + \samp{map(mydict.__getitem__, keylist)}. + (Contributed by Raymond Hettinger.) + +\item Added a new opcode, \code{LIST_APPEND}, that simplifies + the generated bytecode for list comprehensions and speeds them up + by about a third. (Contributed by Raymond Hettinger.) + +\item The peephole bytecode optimizer has been improved to +produce shorter, faster bytecode; remarkably, the resulting bytecode is +more readable. (Enhanced by Raymond Hettinger.) + +\item String concatenations in statements of the form \code{s = s + +"abc"} and \code{s += "abc"} are now performed more efficiently in +certain circumstances. This optimization won't be present in other +Python implementations such as Jython, so you shouldn't rely on it; +using the \method{join()} method of strings is still recommended when +you want to efficiently glue a large number of strings together. +(Contributed by Armin Rigo.) + +\end{itemize} + +% pystone is almost useless for comparing different versions of Python; +% instead, it excels at predicting relative Python performance on +% different machines. +% So, this section would be more informative if it used other tools +% such as pybench and parrotbench. For a more application-oriented +% benchmark, try comparing the timings of test_decimal.py under 2.3 +% and 2.4. + +The net result of the 2.4 optimizations is that Python 2.4 runs the +pystone benchmark around 5\% faster than Python 2.3 and 35\% faster +than Python 2.2. (pystone is not a particularly good benchmark, but +it's the most commonly used measurement of Python's performance. Your +own applications may show greater or smaller benefits from Python~2.4.)
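The string-concatenation item in the list above recommends \method{join()} for code that has to stay fast on every Python implementation. The following is a minimal sketch of that pattern. The function name \function{build_report()} and its input of (name, count) pairs are hypothetical and chosen only for illustration:

\begin{verbatim}
def build_report(rows):
    # Hypothetical helper: 'rows' is assumed to be a sequence of
    # (name, count) pairs.  Collecting the pieces in a list and calling
    # join() once avoids relying on the CPython-only speedup for
    # "s = s + ..." and "s += ...".
    parts = []
    for name, count in rows:
        parts.append('%s=%d' % (name, count))
    return '\n'.join(parts)

print(build_report([('spam', 1), ('eggs', 2)]))
\end{verbatim}

\method{join()} computes the total length first and copies each piece once, so the cost grows linearly with the size of the result, whichever interpreter runs the code.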
+ + +%====================================================================== +\section{New, Improved, and Deprecated Modules} + +As usual, Python's standard library received a number of enhancements and +bug fixes. Here's a partial list of the most notable changes, sorted +alphabetically by module name. Consult the +\file{Misc/NEWS} file in the source tree for a more +complete list of changes, or look through the CVS logs for all the +details. + +\begin{itemize} + +\item The \module{asyncore} module's \function{loop()} function now + has a \var{count} parameter that lets you perform a limited number + of passes through the polling loop. The default is still to loop + forever. + +\item The \module{base64} module now has more complete RFC 3548 support + for Base64, Base32, and Base16 encoding and decoding, including + optional case folding and optional alternative alphabets. + (Contributed by Barry Warsaw.) + +\item The \module{bisect} module now has an underlying C implementation + for improved performance. + (Contributed by Dmitry Vasiliev.) + +\item The CJKCodecs collection of East Asian codecs, maintained +by Hye-Shik Chang, was integrated into 2.4. +The new encodings are: + +\begin{itemize} + \item Chinese (PRC): gb2312, gbk, gb18030, big5hkscs, hz + \item Chinese (ROC): big5, cp950 + \item Japanese: cp932, euc-jis-2004, euc-jp, +euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2, + iso-2022-jp-3, iso-2022-jp-ext, iso-2022-jp-2004, + shift-jis, shift-jisx0213, shift-jis-2004 + \item Korean: cp949, euc-kr, johab, iso-2022-kr +\end{itemize} + +\item Some other new encodings were added: HP Roman8, +ISO_8859-11, ISO_8859-16, PTCP154, and TIS-620. + +\item The UTF-8 and UTF-16 codecs now cope better with receiving partial input. +Previously the \class{StreamReader} class would try to read more data, +making it impossible to resume decoding from the stream.
The +\method{read()} method will now return as much data as it can and future +calls will resume decoding where previous ones left off. +(Implemented by Walter D\"orwald.) + +\item There is a new \module{collections} module for + various specialized collection datatypes. + Currently it contains just one type, \class{deque}, + a double-ended queue that supports efficiently adding and removing + elements from either end: + +\begin{verbatim} +>>> from collections import deque +>>> d = deque('ghi') # make a new deque with three items +>>> d.append('j') # add a new entry to the right side +>>> d.appendleft('f') # add a new entry to the left side +>>> d # show the representation of the deque +deque(['f', 'g', 'h', 'i', 'j']) +>>> d.pop() # return and remove the rightmost item +'j' +>>> d.popleft() # return and remove the leftmost item +'f' +>>> list(d) # list the contents of the deque +['g', 'h', 'i'] +>>> 'h' in d # search the deque +True +\end{verbatim} + +Several modules, such as the \module{Queue} and \module{threading} +modules, now take advantage of \class{collections.deque} for improved +performance. (Contributed by Raymond Hettinger.) + +\item The \module{ConfigParser} classes have been enhanced slightly. + The \method{read()} method now returns a list of the files that + were successfully parsed, and the \method{set()} method raises + \exception{TypeError} if passed a \var{value} argument that isn't a + string. (Contributed by John Belmonte and David Goodger.) + +\item The \module{curses} module now supports the ncurses extension + \function{use_default_colors()}. On platforms where the terminal + supports transparency, this makes it possible to use a transparent + background. (Contributed by J\"org Lehmann.) + +\item The \module{difflib} module now includes an \class{HtmlDiff} class +that creates an HTML table showing a side by side comparison +of two versions of a text. (Contributed by Dan Gass.) 
+ +\item The \module{email} package was updated to version 3.0, +which dropped various deprecated APIs and removed support for Python +versions earlier than 2.3. The 3.0 version of the package uses a new +incremental parser for MIME messages, available in the +\module{email.FeedParser} module. The new parser doesn't require +reading the entire message into memory, and doesn't throw exceptions +if a message is malformed; instead it records any problems in the +\member{defects} attribute of the message. (Developed by Anthony +Baxter, Barry Warsaw, Thomas Wouters, and others.) + +\item The \module{heapq} module has been converted to C. The resulting + tenfold improvement in speed makes the module suitable for handling + high volumes of data. In addition, the module has two new functions + \function{nlargest()} and \function{nsmallest()} that use heaps to + find the N largest or smallest values in a dataset without the + expense of a full sort. (Contributed by Raymond Hettinger.) + +\item The \module{httplib} module now contains constants for HTTP +status codes defined in various HTTP-related RFC documents. Constants +have names such as \constant{OK}, \constant{CREATED}, +\constant{CONTINUE}, and \constant{MOVED_PERMANENTLY}; use pydoc to +get a full list. (Contributed by Andrew Eland.) + +\item The \module{imaplib} module now supports IMAP's THREAD command +(contributed by Yves Dionne) and new \method{deleteacl()} and +\method{myrights()} methods (contributed by Arnaud Mazin). + +\item The \module{itertools} module gained a + \function{groupby(\var{iterable}\optional{, \var{key}})} function. + \var{iterable} is something that can be iterated over to return a + stream of elements, and the optional \var{key} parameter is a + function that takes an element and returns a key value; if omitted, + the key is simply the element itself.
\function{groupby()} then + groups the elements into subsequences which have matching values of + the key, and returns a series of 2-tuples containing the key value + and an iterator over the subsequence. + +Here's an example to make this clearer. The \var{key} function simply +returns whether a number is even or odd, so the result of +\function{groupby()} is to return consecutive runs of odd or even +numbers. + +\begin{verbatim} +>>> import itertools +>>> L = [2, 4, 6, 7, 8, 9, 11, 12, 14] +>>> for key_val, it in itertools.groupby(L, lambda x: x % 2): +... print key_val, list(it) +... +0 [2, 4, 6] +1 [7] +0 [8] +1 [9, 11] +0 [12, 14] +>>> +\end{verbatim} + +\function{groupby()} is typically used with sorted input. The logic +for \function{groupby()} is similar to the \UNIX{} \code{uniq} filter, +which makes it handy for eliminating, counting, or identifying +duplicate elements: + +\begin{verbatim} +>>> word = 'abracadabra' +>>> letters = sorted(word) # Turn string into a sorted list of letters +>>> letters +['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r'] +>>> for k, g in itertools.groupby(letters): +... print k, list(g) +... +a ['a', 'a', 'a', 'a', 'a'] +b ['b', 'b'] +c ['c'] +d ['d'] +r ['r', 'r'] +>>> # List unique letters +>>> [k for k, g in itertools.groupby(letters)] +['a', 'b', 'c', 'd', 'r'] +>>> # Count letter occurrences +>>> [(k, len(list(g))) for k, g in itertools.groupby(letters)] +[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)] +\end{verbatim} + +(Contributed by Hye-Shik Chang.) + +\item \module{itertools} also gained a function named +\function{tee(\var{iterator}\optional{, \var{N}})} that returns \var{N} independent +iterators that replicate \var{iterator}. If \var{N} is omitted, the +default is 2.
+ +\begin{verbatim} +>>> L = [1,2,3] +>>> i1, i2 = itertools.tee(L) +>>> i1, i2 +(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>) +>>> list(i1) # Run the first iterator to exhaustion +[1, 2, 3] +>>> list(i2) # Run the second iterator to exhaustion +[1, 2, 3] +\end{verbatim} + +Note that \function{tee()} has to keep copies of the values returned +by the iterator; in the worst case, it may need to keep all of them. +This should therefore be used carefully if the leading iterator +can run far ahead of the trailing iterator in a long stream of inputs. +If the separation is large, then you might as well use +\function{list()} instead. When the iterators track closely with one +another, \function{tee()} is ideal. Possible applications include +bookmarking, windowing, or lookahead iterators. +(Contributed by Raymond Hettinger.) + +\item A number of functions were added to the \module{locale} +module, such as \function{bind_textdomain_codeset()} to specify a +particular encoding and a family of \function{l*gettext()} functions +that return messages in the chosen encoding. +(Contributed by Gustavo Niemeyer.) + +\item Some keyword arguments were added to the \module{logging} +package's \function{basicConfig()} function to simplify log +configuration. The default behavior is to log messages to standard +error, but various keyword arguments can be specified to log to a +particular file, change the logging format, or set the logging level. +For example: + +\begin{verbatim} +import logging +logging.basicConfig(filename='/var/log/application.log', + level=0, # Log all messages + format='%(levelname)s:%(process)d:%(thread)d:%(message)s') +\end{verbatim} + +Other additions to the \module{logging} package include a +\method{log(\var{level}, \var{msg})} convenience method, as well as a +\class{TimedRotatingFileHandler} class that rotates its log files at a +timed interval.
The module already had \class{RotatingFileHandler}, +which rotated logs once the file exceeded a certain size. Both +classes derive from a new \class{BaseRotatingHandler} class that can +be used to implement other rotating handlers. + +(Changes implemented by Vinay Sajip.) + +\item The \module{marshal} module now shares interned strings on unpacking a +data structure. This may shrink the size of some marshalled data, +but the primary effect is to make \file{.pyc} files significantly smaller. +(Contributed by Martin von~L\"owis.) + +\item The \module{nntplib} module's \class{NNTP} class gained +\method{description()} and \method{descriptions()} methods to retrieve +newsgroup descriptions for a single group or for a range of groups. +(Contributed by J\"urgen A. Erhard.) + +\item Two new functions were added to the \module{operator} module, +\function{attrgetter(\var{attr})} and \function{itemgetter(\var{index})}. +Both functions return callables that take a single argument and return +the corresponding attribute or item; these callables make excellent +data extractors when used with \function{map()} or +\function{sorted()}. For example: + +\begin{verbatim} +>>> import operator +>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)] +>>> map(operator.itemgetter(0), L) +['c', 'd', 'a', 'b'] +>>> map(operator.itemgetter(1), L) +[2, 1, 4, 3] +>>> sorted(L, key=operator.itemgetter(1)) # Sort list by second tuple item +[('d', 1), ('c', 2), ('b', 3), ('a', 4)] +\end{verbatim} + +(Contributed by Raymond Hettinger.) + +\item The \module{optparse} module was updated in various ways. The +module now passes its messages through \function{gettext.gettext()}, +making it possible to internationalize Optik's help and error +messages. Help messages for options can now include the string +\code{'\%default'}, which will be replaced by the option's default +value. (Contributed by Greg Ward.)
+ +\item The long-term plan is to deprecate the \module{rfc822} module +in some future Python release in favor of the \module{email} package. +To this end, the \function{email.Utils.formatdate()} function has been +changed to make it usable as a replacement for +\function{rfc822.formatdate()}. You may want to write new e-mail +processing code with this in mind. (Change implemented by Anthony +Baxter.) + +\item A new \function{urandom(\var{n})} function was added to the +\module{os} module, returning a string containing \var{n} bytes of +random data. This function provides access to platform-specific +sources of randomness such as \file{/dev/urandom} on Linux or the +Windows CryptoAPI. (Contributed by Trevor Perrin.) + +\item Another new function: \function{os.path.lexists(\var{path})} +returns true if the file specified by \var{path} exists, whether or +not it's a symbolic link. This differs from the existing +\function{os.path.exists(\var{path})} function, which returns false if +\var{path} is a symlink that points to a destination that doesn't exist. +(Contributed by Beni Cherniavsky.) + +\item A new \function{getsid()} function was added to the +\module{posix} module that underlies the \module{os} module. +(Contributed by J. Raynor.) + +\item The \module{poplib} module now supports POP over SSL. (Contributed by +Hector Urtubia.) + +\item The \module{profile} module can now profile C extension functions. +(Contributed by Nick Bastin.) + +\item The \module{random} module has a new method called + \method{getrandbits(\var{N})} that returns a long integer \var{N} + bits in length. The existing \method{randrange()} method now uses + \method{getrandbits()} where appropriate, making generation of + arbitrarily large random numbers more efficient. (Contributed by + Raymond Hettinger.) + +\item The regular expression language accepted by the \module{re} module + was extended with simple conditional expressions, written as + \regexp{(?(\var{group})\var{A}|\var{B})}. 
\var{group} is either a + numeric group ID or a group name defined with \regexp{(?P<group>...)} + earlier in the expression. If the specified group matched, the + regular expression pattern \var{A} will be tested against the string; if + the group didn't match, the pattern \var{B} will be used instead. + (Contributed by Gustavo Niemeyer.) + +\item The \module{re} module is also no longer recursive, thanks to a +massive amount of work by Gustavo Niemeyer. In a recursive regular +expression engine, certain patterns result in a large amount of C +stack space being consumed, and it was possible to overflow the stack. +For example, if you matched a 30000-byte string of \samp{a} characters +against the expression \regexp{(a|b)+}, one stack frame was consumed +per character. Python 2.3 tried to check for stack overflow and raise +a \exception{RuntimeError} exception, but certain patterns could +sidestep the checking and if you were unlucky Python could segfault. +Python 2.4's regular expression engine can match this pattern without +problems. + +\item The \module{signal} module now performs tighter error-checking +on the parameters to the \function{signal.signal()} function. For +example, you can't set a handler on the \constant{SIGKILL} signal; +previous versions of Python would quietly accept this, but 2.4 will +raise a \exception{RuntimeError} exception. + +\item Two new functions were added to the \module{socket} module. +\function{socketpair()} returns a pair of connected sockets and +\function{getservbyport(\var{port})} looks up the service name for a +given port number. (Contributed by Dave Cole and Barry Warsaw.) + +\item The \function{sys.exitfunc()} function has been deprecated. Code +should be using the existing \module{atexit} module, which correctly +handles calling multiple exit functions. Eventually +\function{sys.exitfunc()} will become a purely internal interface, +accessed only by \module{atexit}. 
+ +\item The \module{tarfile} module now generates GNU-format tar files +by default. (Contributed by Lars Gustaebel.) + +\item The \module{threading} module now has an elegantly simple way to support +thread-local data. The module contains a \class{local} class whose +attribute values are local to different threads. + +\begin{verbatim} +import threading + +data = threading.local() +data.number = 42 +data.url = ('www.python.org', 80) +\end{verbatim} + +Other threads can assign and retrieve their own values for the +\member{number} and \member{url} attributes. You can subclass +\class{local} to initialize attributes or to add methods. +(Contributed by Jim Fulton.) + +\item The \module{timeit} module now automatically disables periodic + garbage collection during the timing loop. This change makes + consecutive timings more comparable. (Contributed by Raymond Hettinger.) + +\item The \module{weakref} module now supports a wider variety of objects + including Python functions, class instances, sets, frozensets, deques, + arrays, files, sockets, and regular expression pattern objects. + (Contributed by Raymond Hettinger.) + +\item The \module{xmlrpclib} module now supports a multi-call extension for +transmitting multiple XML-RPC calls in a single HTTP operation. +(Contributed by Brian Quinlan.) + +\item The \module{mpz}, \module{rotor}, and \module{xreadlines} modules have +been removed. + +\end{itemize} + + +%====================================================================== +% whole new modules get described in subsections here + +%===================== +\subsection{cookielib} + +The \module{cookielib} library supports client-side handling for HTTP +cookies, mirroring the \module{Cookie} module's server-side cookie +support. Cookies are stored in cookie jars; the library transparently +stores cookies offered by the web server in the cookie jar, and +fetches the cookie from the jar when connecting to the server. 
As in +web browsers, policy objects control whether cookies are accepted or +not. + +In order to store cookies across sessions, two implementations of +cookie jars are provided: one that stores cookies in the Netscape +format so applications can use the Mozilla or Lynx cookie files, and +one that stores cookies in the same format as the Perl libwww library. + +\module{urllib2} has been changed to interact with \module{cookielib}: +\class{HTTPCookieProcessor} manages a cookie jar that is used when +accessing URLs. + +This module was contributed by John J. Lee. + + +% ================== +\subsection{doctest} + +The \module{doctest} module underwent considerable refactoring thanks +to Edward Loper and Tim Peters. Testing can still be as simple as +running \function{doctest.testmod()}, but the refactorings allow +customizing the module's operation in various ways. + +The new \class{DocTestFinder} class extracts the tests from a given +object's docstrings: + +\begin{verbatim} +import doctest + +def f(x, y): + """>>> f(2,2) +4 +>>> f(3,2) +6 + """ + return x*y + +finder = doctest.DocTestFinder() + +# Get list of DocTest instances +tests = finder.find(f) +\end{verbatim} + +The new \class{DocTestRunner} class then runs individual tests and can +produce a summary of the results: + +\begin{verbatim} +runner = doctest.DocTestRunner() +for t in tests: + tried, failed = runner.run(t) + +runner.summarize(verbose=1) +\end{verbatim} + +The above example produces the following output: + +\begin{verbatim} +1 items passed all tests: + 2 tests in f +2 tests in 1 items. +2 passed and 0 failed. +Test passed. +\end{verbatim} + +\class{DocTestRunner} uses an instance of the \class{OutputChecker} +class to compare the expected output with the actual output. This +class takes a number of different flags that customize its behaviour; +ambitious users can also write a completely new subclass of +\class{OutputChecker}. + +The default output checker provides a number of handy features.
+For example, with the \constant{doctest.ELLIPSIS} option flag, +an ellipsis (\samp{...}) in the expected output matches any substring, +making it easier to accommodate outputs that vary in minor ways: + +\begin{verbatim} +def o(n): + """>>> o(1) +<__main__.C instance at 0x...> +>>> +""" +\end{verbatim} + +Another special string, \samp{<BLANKLINE>}, matches a blank line: + +\begin{verbatim} +def p(n): + """>>> p(1) +<BLANKLINE> +>>> +""" +\end{verbatim} + +Another new capability is producing a diff-style display of the output +by specifying the \constant{doctest.REPORT_UDIFF} (unified diffs), +\constant{doctest.REPORT_CDIFF} (context diffs), or +\constant{doctest.REPORT_NDIFF} (delta-style) option flags. For example: + +\begin{verbatim} +def g(n): + """>>> g(4) +here +is +a +lengthy +>>>""" + L = 'here is a rather lengthy list of words'.split() + for word in L[:n]: + print word +\end{verbatim} + +Running the above function's tests with +\constant{doctest.REPORT_UDIFF} specified, you get the following output: + +\begin{verbatim} +********************************************************************** +File "t.py", line 15, in g +Failed example: + g(4) +Differences (unified diff with -expected +actual): + @@ -2,3 +2,3 @@ + is + a + -lengthy + +rather +********************************************************************** +\end{verbatim} + + +% ====================================================================== +\section{Build and C API Changes} + +Some of the changes to Python's build process and to the C API are: + +\begin{itemize} + + \item Three new convenience macros were added for common return + values from extension functions: \csimplemacro{Py_RETURN_NONE}, + \csimplemacro{Py_RETURN_TRUE}, and \csimplemacro{Py_RETURN_FALSE}. + (Contributed by Brett Cannon.) + + \item Another new macro, \csimplemacro{Py_CLEAR(\var{obj})}, + decreases the reference count of \var{obj} and sets \var{obj} to the + null pointer. (Contributed by Jim Fulton.)
+ + \item A new function, \cfunction{PyTuple_Pack(\var{N}, \var{obj1}, + \var{obj2}, ..., \var{objN})}, constructs tuples from a variable + length argument list of Python objects. (Contributed by Raymond Hettinger.) + + \item A new function, \cfunction{PyDict_Contains(\var{d}, \var{k})}, + implements fast dictionary lookups without masking exceptions raised + during the look-up process. (Contributed by Raymond Hettinger.) + + \item The \csimplemacro{Py_IS_NAN(\var{X})} macro returns 1 if + its float or double argument \var{X} is a NaN. + (Contributed by Tim Peters.) + + \item C code can avoid unnecessary locking by using the new + \cfunction{PyEval_ThreadsInitialized()} function to tell + if any thread operations have been performed. If this function + returns false, no lock operations are needed. + (Contributed by Nick Coghlan.) + + \item A new function, \cfunction{PyArg_VaParseTupleAndKeywords()}, + is the same as \cfunction{PyArg_ParseTupleAndKeywords()} but takes a + \ctype{va_list} instead of a number of arguments. + (Contributed by Greg Chapman.) + + \item A new method flag, \constant{METH_COEXISTS}, allows a function + defined in slots to co-exist with a \ctype{PyCFunction} having the + same name. This can halve the access time for a method such as + \method{set.__contains__()}. (Contributed by Raymond Hettinger.) + + \item Python can now be built with additional profiling for the + interpreter itself, intended as an aid to people developing the + Python core. Providing \longprogramopt{--enable-profiling} to the + \program{configure} script will let you profile the interpreter with + \program{gprof}, and providing the \longprogramopt{--with-tsc} + switch enables profiling using the Pentium's Time-Stamp-Counter + register. Note that the \longprogramopt{--with-tsc} switch is slightly + misnamed, because the profiling feature also works on the PowerPC + platform, though that processor architecture doesn't call that + register ``the TSC register''. 
(Contributed by Jeremy Hylton.) + + \item The \ctype{tracebackobject} type has been renamed to \ctype{PyTracebackObject}. + +\end{itemize} + + +%====================================================================== +\subsection{Port-Specific Changes} + +\begin{itemize} + +\item The Windows port now builds under MSVC++ 7.1 as well as version 6. + (Contributed by Martin von~L\"owis.) + +\end{itemize} + + + +%====================================================================== +\section{Porting to Python 2.4} + +This section lists previously described changes that may require +changes to your code: + +\begin{itemize} + +\item Left shifts and hexadecimal/octal constants that are too + large no longer trigger a \exception{FutureWarning} and return + a value limited to 32 or 64 bits; instead they return a long integer. + +\item Integer operations will no longer trigger an \exception{OverflowWarning}. +The \exception{OverflowWarning} warning will disappear in Python 2.5. + +\item The \function{zip()} built-in function and \function{itertools.izip()} + now return an empty list and an empty iterator, respectively, instead of + raising a \exception{TypeError} exception if called with no arguments. + +\item You can no longer compare the \class{date} and \class{datetime} + instances provided by the \module{datetime} module. Two + instances of different classes will now always be unequal, and + relative comparisons (\code{<}, \code{>}) will raise a \exception{TypeError}. + +\item \function{dircache.listdir()} now passes exceptions to the caller + instead of returning empty lists. + +\item \function{LexicalHandler.startDTD()} used to receive the public and + system IDs in the wrong order. This has been corrected; applications + relying on the wrong order need to be fixed. + +\item \function{fcntl.ioctl()} now warns if the \var{mutate} + argument is omitted and relevant. + +\item The \module{tarfile} module now generates GNU-format tar files +by default.
+ +\item Encountering a failure while importing a module no longer leaves +a partially initialized module object in \code{sys.modules}. + +\item \constant{None} is now a constant; code that binds a new value to +the name \samp{None} is now a syntax error. + +\item The \function{signal.signal()} function now raises a +\exception{RuntimeError} exception for certain illegal values; +previously these errors would pass silently. For example, you can no +longer set a handler on the \constant{SIGKILL} signal. + +\end{itemize} + + +%====================================================================== +\section{Acknowledgements \label{acks}} + +The author would like to thank the following people for offering +suggestions, corrections and assistance with various drafts of this +article: Koray Can, Hye-Shik Chang, Michael Dyck, Raymond Hettinger, +Brian Hurt, Hamish Lawson, Fredrik Lundh, Sean Reifschneider, +Sadruddin Rejeb. + +\end{document} diff --git a/sys/src/cmd/python/Doc/whatsnew/whatsnew25.tex b/sys/src/cmd/python/Doc/whatsnew/whatsnew25.tex new file mode 100644 index 000000000..9d78adcd8 --- /dev/null +++ b/sys/src/cmd/python/Doc/whatsnew/whatsnew25.tex @@ -0,0 +1,2530 @@ +\documentclass{howto} +\usepackage{distutils} +% $Id: whatsnew25.tex 54622 2007-03-30 17:58:16Z andrew.kuchling $ + +% Fix XXX comments + +\title{What's New in Python 2.5} +\release{1.01} +\author{A.M. Kuchling} +\authoraddress{\email{amk@amk.ca}} + +\begin{document} +\maketitle +\tableofcontents + +This article explains the new features in Python 2.5. The final +release of Python 2.5 is scheduled for August 2006; +\pep{356} describes the planned release schedule. + +The changes in Python 2.5 are an interesting mix of language and +library improvements. The library enhancements will be more important +to Python's user community, I think, because several widely useful +packages were added.
New modules include ElementTree for XML +processing (section~\ref{module-etree}), the SQLite database module +(section~\ref{module-sqlite}), and the \module{ctypes} module for +calling C functions (section~\ref{module-ctypes}). + +The language changes are of middling significance. Some pleasant new +features were added, but most of them aren't features that you'll use +every day. Conditional expressions were finally added to the language +using a novel syntax; see section~\ref{pep-308}. The new +\keyword{with} statement will make writing cleanup code easier +(section~\ref{pep-343}). Values can now be passed into generators +(section~\ref{pep-342}). Imports can now be explicitly marked as absolute +or relative (section~\ref{pep-328}). Some corner cases of exception +handling are handled better (section~\ref{pep-341}). All these +improvements are worthwhile, but they're improvements to one specific +language feature or another; none of them are broad modifications to +Python's semantics. + +As well as the language and library additions, other improvements and +bugfixes were made throughout the source tree. A search through the +SVN change logs finds there were 353 patches applied and 458 bugs +fixed between Python 2.4 and 2.5. (Both figures are likely to be +underestimates.) + +This article doesn't try to be a complete specification of the new +features; instead changes are briefly introduced using helpful +examples. For full details, you should always refer to the +documentation for Python 2.5 at \url{http://docs.python.org}. +If you want to understand the complete implementation and design +rationale, refer to the PEP for a particular new feature. + +Comments, suggestions, and error reports for this document are +welcome; please e-mail them to the author or open a bug in the Python +bug tracker.
+ +%====================================================================== +\section{PEP 308: Conditional Expressions\label{pep-308}} + +For a long time, people have been requesting a way to write +conditional expressions, which are expressions that return value A or +value B depending on whether a Boolean value is true or false. A +conditional expression lets you write a single assignment statement +that has the same effect as the following: + +\begin{verbatim} +if condition: + x = true_value +else: + x = false_value +\end{verbatim} + +There have been endless tedious discussions of syntax on both +python-dev and comp.lang.python. A vote was even held that found the +majority of voters wanted conditional expressions in some form, +but there was no syntax that was preferred by a clear majority. +Candidates included C's \code{cond ? true_v : false_v}, +\code{if cond then true_v else false_v}, and 16 other variations. + +Guido van~Rossum eventually chose a surprising syntax: + +\begin{verbatim} +x = true_value if condition else false_value +\end{verbatim} + +Evaluation is still lazy as in existing Boolean expressions, so the +order of evaluation jumps around a bit. The \var{condition} +expression in the middle is evaluated first, and the \var{true_value} +expression is evaluated only if the condition was true. Similarly, +the \var{false_value} expression is only evaluated when the condition +is false. + +This syntax may seem strange and backwards; why does the condition go +in the \emph{middle} of the expression, and not in the front as in C's +\code{c ? x : y}? The decision was checked by applying the new syntax +to the modules in the standard library and seeing how the resulting +code read. In many cases where a conditional expression is used, one +value seems to be the ``common case'' and one value is an ``exceptional +case'', used only on rarer occasions when the condition isn't met.
The +conditional syntax makes this pattern a bit more obvious: + +\begin{verbatim} +contents = ((doc + '\n') if doc else '') +\end{verbatim} + +I read the above statement as meaning ``here \var{contents} is +usually assigned a value of \code{doc+'\e n'}; sometimes +\var{doc} is empty, in which special case an empty string is returned.'' +I doubt I will use conditional expressions very often where there +isn't a clear common and uncommon case. + +There was some discussion of whether the language should require +surrounding conditional expressions with parentheses. The decision +was made to \emph{not} require parentheses in the Python language's +grammar, but as a matter of style I think you should always use them. +Consider these two statements: + +\begin{verbatim} +# First version -- no parens +level = 1 if logging else 0 + +# Second version -- with parens +level = (1 if logging else 0) +\end{verbatim} + +In the first version, I think a reader's eye might group the statement +into 'level = 1', 'if logging', 'else 0', and think that the condition +decides whether the assignment to \var{level} is performed. The +second version reads better, in my opinion, because it makes it clear +that the assignment is always performed and the choice is being made +between two values. + +Another reason for including the brackets: a few odd combinations of +list comprehensions and lambdas could look like incorrect conditional +expressions. See \pep{308} for some examples. If you put parentheses +around your conditional expressions, you won't run into this case. + + +\begin{seealso} + +\seepep{308}{Conditional Expressions}{PEP written by +Guido van~Rossum and Raymond D. Hettinger; implemented by Thomas +Wouters.} + +\end{seealso} + + +%====================================================================== +\section{PEP 309: Partial Function Application\label{pep-309}} + +The \module{functools} module is intended to contain tools for +functional-style programming. 
+ +One useful tool in this module is the \function{partial()} function. +For programs written in a functional style, you'll sometimes want to +construct variants of existing functions that have some of the +parameters filled in. Consider a Python function \code{f(a, b, c)}; +you could create a new function \code{g(b, c)} that is equivalent to +\code{f(1, b, c)}. This is called ``partial function application''. + +\function{partial()} takes the arguments +\code{(\var{function}, \var{arg1}, \var{arg2}, ... +\var{kwarg1}=\var{value1}, \var{kwarg2}=\var{value2})}. The resulting +object is callable, so you can just call it to invoke \var{function} +with the filled-in arguments. + +Here's a small but realistic example: + +\begin{verbatim} +import functools + +def log (message, subsystem): + "Write the contents of 'message' to the specified subsystem." + print '%s: %s' % (subsystem, message) + ... + +server_log = functools.partial(log, subsystem='server') +server_log('Unable to open socket') +\end{verbatim} + +Here's another example, from a program that uses PyGTK. Here a +context-sensitive pop-up menu is being constructed dynamically. The +callback provided for the menu option is a partially applied version +of the \method{open_item()} method, where the first argument has been +provided. + +\begin{verbatim} +... +class Application: + def open_item(self, path): + ... + def init (self): + open_func = functools.partial(self.open_item, item_path) + popup_menu.append( ("Open", open_func, 1) ) +\end{verbatim} + + +Another function in the \module{functools} module is the +\function{update_wrapper(\var{wrapper}, \var{wrapped})} function that +helps you write well-behaved decorators. \function{update_wrapper()} +copies the name, module, and docstring attributes to a wrapper function +so that tracebacks inside the wrapped function are easier to +understand.
For example, you might write: + +\begin{verbatim} +def my_decorator(f): + def wrapper(*args, **kwds): + print 'Calling decorated function' + return f(*args, **kwds) + functools.update_wrapper(wrapper, f) + return wrapper +\end{verbatim} + +\function{wraps()} is a decorator that can be used inside your own +decorators to copy the wrapped function's information. An alternate +version of the previous example would be: + +\begin{verbatim} +def my_decorator(f): + @functools.wraps(f) + def wrapper(*args, **kwds): + print 'Calling decorated function' + return f(*args, **kwds) + return wrapper +\end{verbatim} + +\begin{seealso} + +\seepep{309}{Partial Function Application}{PEP proposed and written by +Peter Harris; implemented by Hye-Shik Chang and Nick Coghlan, with +adaptations by Raymond Hettinger.} + +\end{seealso} + + +%====================================================================== +\section{PEP 314: Metadata for Python Software Packages v1.1\label{pep-314}} + +Some simple dependency support was added to Distutils. The +\function{setup()} function now has \code{requires}, \code{provides}, +and \code{obsoletes} keyword parameters. When you build a source +distribution using the \code{sdist} command, the dependency +information will be recorded in the \file{PKG-INFO} file. + +Another new keyword parameter is \code{download_url}, which should be +set to a URL for the package's source code. This means it's now +possible to look up an entry in the package index, determine the +dependencies for a package, and download the required packages. + +\begin{verbatim} +from distutils.core import setup + +VERSION = '1.0' +setup(name='PyPackage', + version=VERSION, + requires=['numarray', 'zlib (>=1.1.4)'], + obsoletes=['OldPackage'], + download_url=('http://www.example.com/pypackage/dist/pkg-%s.tar.gz' + % VERSION), + ) +\end{verbatim} + +The Python package index at +\url{http://cheeseshop.python.org} can now also store source and binary +archives for a package.
The new \command{upload} Distutils command +will upload a package to the repository. + +Before a package can be uploaded, you must be able to build a +distribution using the \command{sdist} Distutils command. Once that +works, you can run \code{python setup.py upload} to add your package +to the PyPI archive. Optionally you can GPG-sign the package by +supplying the \longprogramopt{sign} and +\longprogramopt{identity} options. + +Package uploading was implemented by Martin von~L\"owis and Richard Jones. + +\begin{seealso} + +\seepep{314}{Metadata for Python Software Packages v1.1}{PEP proposed +and written by A.M. Kuchling, Richard Jones, and Fred Drake; +implemented by Richard Jones and Fred Drake.} + +\end{seealso} + + +%====================================================================== +\section{PEP 328: Absolute and Relative Imports\label{pep-328}} + +The simpler part of PEP 328 was implemented in Python 2.4: parentheses +could now be used to enclose the names imported from a module using +the \code{from ... import ...} statement, making it easier to import +many different names. + +The more complicated part has been implemented in Python 2.5: +importing a module can be specified to use absolute or +package-relative imports. The plan is to move toward making absolute +imports the default in future versions of Python. + +Let's say you have a package directory like this: +\begin{verbatim} +pkg/ +pkg/__init__.py +pkg/main.py +pkg/string.py +\end{verbatim} + +This defines a package named \module{pkg} containing the +\module{pkg.main} and \module{pkg.string} submodules. + +Consider the code in the \file{main.py} module. What happens if it +executes the statement \code{import string}? 
In Python 2.4 and +earlier, it first looks in the package's directory to perform a +relative import, finds \file{pkg/string.py}, imports the contents of +that file as the \module{pkg.string} module, and binds that module +to the name \samp{string} in the \module{pkg.main} module's namespace. + +That's fine if \module{pkg.string} was what you wanted. But what if +you wanted Python's standard \module{string} module? There's no clean +way to ignore \module{pkg.string} and look for the standard module; +generally you had to look at the contents of \code{sys.modules}, which +is slightly unclean. +Holger Krekel's \module{py.std} package provides a tidier way to perform +imports from the standard library, \code{import py ; py.std.string.join()}, +but that package isn't available on all Python installations. + +Reading code which relies on relative imports is also less clear, +because a reader may be confused about which module, \module{string} +or \module{pkg.string}, is intended to be used. Python users soon +learned not to duplicate the names of standard library modules in the +names of their packages' submodules, but you can't protect against +your submodule's name being taken by a new module added in a +future version of Python. + +In Python 2.5, you can switch \keyword{import}'s behaviour to +absolute imports using a \code{from __future__ import absolute_import} +directive. This absolute-import behaviour will become the default in +a future version (probably Python 2.7). Once absolute imports +are the default, \code{import string} will +always find the standard library's version. +Users are encouraged to begin using absolute imports as much +as possible, so it's preferable to start writing \code{from pkg import +string} in your code. + +Relative imports are still possible by adding a leading period +to the module name when using the \code{from ...
import} form: + +\begin{verbatim} +# Import names from pkg.string +from .string import name1, name2 +# Import pkg.string +from . import string +\end{verbatim} + +This imports the \module{string} module relative to the current +package, so in \module{pkg.main} this will import \var{name1} and +\var{name2} from \module{pkg.string}. Additional leading periods +perform the relative import starting from the parent of the current +package. For example, code in the \module{A.B.C} module can do: + +\begin{verbatim} +from . import D # Imports A.B.D +from .. import E # Imports A.E +from ..F import G # Imports A.F.G +\end{verbatim} + +Leading periods cannot be used with the \code{import \var{modname}} +form of the import statement, only the \code{from ... import} form. + +\begin{seealso} + +\seepep{328}{Imports: Multi-Line and Absolute/Relative} +{PEP written by Aahz; implemented by Thomas Wouters.} + +\seeurl{http://codespeak.net/py/current/doc/index.html} +{The py library by Holger Krekel, which contains the \module{py.std} package.} + +\end{seealso} + + +%====================================================================== +\section{PEP 338: Executing Modules as Scripts\label{pep-338}} + +The \programopt{-m} switch added in Python 2.4 to execute a module as +a script gained a few more abilities. Instead of being implemented in +C code inside the Python interpreter, the switch now uses an +implementation in a new module, \module{runpy}. + +The \module{runpy} module implements a more sophisticated import +mechanism so that it's now possible to run modules in a package such +as \module{pychecker.checker}. The module also supports alternative +import mechanisms such as the \module{zipimport} module. This means +you can add a .zip archive's path to \code{sys.path} and then use the +\programopt{-m} switch to execute code from the archive. 
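The following sketch shows roughly what \programopt{-m} now does, by calling \function{run_module()} from the \module{runpy} module directly. The package name \module{demo_pkg}, its \module{tool} submodule, and the temporary \file{demo.zip} archive are all hypothetical and are created on the fly so the example is self-contained; passing \code{run_name='__main__'} is my assumption about how to mimic a script run.

\begin{verbatim}
import os
import sys
import runpy
import tempfile
import zipfile

# Build a throwaway archive holding a hypothetical package,
# demo_pkg, with one submodule, demo_pkg/tool.py.
archive = os.path.join(tempfile.mkdtemp(), 'demo.zip')
zf = zipfile.ZipFile(archive, 'w')
zf.writestr('demo_pkg/__init__.py', '')
zf.writestr('demo_pkg/tool.py',
            'result = 6 * 7\n'
            'if __name__ == "__main__":\n'
            '    print("running as a script")\n')
zf.close()

# With the archive on sys.path, zipimport can find the package.
sys.path.insert(0, archive)

# Run the submodule the way "python -m demo_pkg.tool" would:
# __name__ is set to '__main__', and the module's globals
# are returned as a dictionary.
namespace = runpy.run_module('demo_pkg.tool', run_name='__main__')
print(namespace['result'])   # prints 42
\end{verbatim}

From a shell, putting \file{demo.zip} on \envvar{PYTHONPATH} and running \code{python -m demo_pkg.tool} should behave much the same way.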
+ + +\begin{seealso} + +\seepep{338}{Executing modules as scripts}{PEP written and +implemented by Nick Coghlan.} + +\end{seealso} + + +%====================================================================== +\section{PEP 341: Unified try/except/finally\label{pep-341}} + +Until Python 2.5, the \keyword{try} statement came in two +flavours. You could use a \keyword{finally} block to ensure that code +is always executed, or one or more \keyword{except} blocks to catch +specific exceptions. You couldn't combine both \keyword{except} blocks and a +\keyword{finally} block, because generating the right bytecode for the +combined version was complicated and it wasn't clear what the +semantics of the combined statement should be. + +Guido van~Rossum spent some time working with Java, which does support the +equivalent of combining \keyword{except} blocks and a +\keyword{finally} block, and this clarified what the statement should +mean. In Python 2.5, you can now write: + +\begin{verbatim} +try: + block-1 ... +except Exception1: + handler-1 ... +except Exception2: + handler-2 ... +else: + else-block +finally: + final-block +\end{verbatim} + +The code in \var{block-1} is executed. If the code raises an +exception, the various \keyword{except} blocks are tested: if the +exception is of class \class{Exception1}, \var{handler-1} is executed; +otherwise if it's of class \class{Exception2}, \var{handler-2} is +executed, and so forth. If no exception is raised, the +\var{else-block} is executed. + +No matter what happened previously, the \var{final-block} is executed +once the code block is complete and any raised exceptions handled. +Even if there's an error in an exception handler or the +\var{else-block} and a new exception is raised, the +code in the \var{final-block} is still run. 
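A small sketch makes the order of execution concrete. The helper \function{run()} and the action functions below are hypothetical; each clause appends a marker to a list, so you can see which clauses actually ran. The second function checks that the \var{final-block} still runs when an exception handler itself raises a new exception.

\begin{verbatim}
def run(action):
    "Call action() inside a unified try statement; record clauses run."
    events = []
    try:
        events.append('block')
        action()
    except ValueError:
        events.append('handler-1')
    except KeyError:
        events.append('handler-2')
    else:
        events.append('else')
    finally:
        events.append('finally')
    return events

def ok():
    pass

def bad_value():
    raise ValueError('bad value')

def bad_key():
    raise KeyError('missing')

print(run(ok))          # prints ['block', 'else', 'finally']
print(run(bad_value))   # prints ['block', 'handler-1', 'finally']
print(run(bad_key))     # prints ['block', 'handler-2', 'finally']

def run_with_failing_handler(log):
    try:
        raise ValueError('first')
    except ValueError:
        # A new exception raised inside the handler...
        raise RuntimeError('handler failed')
    finally:
        # ...doesn't stop the final block from running.
        log.append('finally ran')

log = []
try:
    run_with_failing_handler(log)
except RuntimeError:
    pass
print(log)              # prints ['finally ran']
\end{verbatim}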
+ +\begin{seealso} + +\seepep{341}{Unifying try-except and try-finally}{PEP written by Georg Brandl; +implementation by Thomas Lee.} + +\end{seealso} + + +%====================================================================== +\section{PEP 342: New Generator Features\label{pep-342}} + +Python 2.5 adds a simple way to pass values \emph{into} a generator. +As introduced in Python 2.3, generators only produced output; once a +generator's code was invoked to create an iterator, there was no way to +pass any new information into the function when its execution was +resumed. Sometimes the ability to pass in some information would be +useful. Hackish solutions to this include making the generator's code +look at a global variable and then changing the global variable's +value, or passing in some mutable object that callers then modify. + +To refresh your memory of basic generators, here's a simple example: + +\begin{verbatim} +def counter (maximum): + i = 0 + while i < maximum: + yield i + i += 1 +\end{verbatim} + +When you call \code{counter(10)}, the result is an iterator that +returns the values from 0 up to 9. On encountering the +\keyword{yield} statement, the iterator returns the provided value and +suspends the function's execution, preserving the local variables. +Execution resumes on the following call to the iterator's +\method{next()} method, picking up after the \keyword{yield} statement. + +In Python 2.3, \keyword{yield} was a statement; it didn't return any +value. In 2.5, \keyword{yield} is now an expression, returning a +value that can be assigned to a variable or otherwise operated on: + +\begin{verbatim} +val = (yield i) +\end{verbatim} + +I recommend that you always put parentheses around a \keyword{yield} +expression when you're doing something with the returned value, as in +the above example. The parentheses aren't always necessary, but it's +easier to always add them instead of having to remember when they're +needed.
+ +(\pep{342} explains the exact rules, which are that a +\keyword{yield}-expression must always be parenthesized except when it +is the top-level expression on the right-hand side of an +assignment. This means you can write \code{val = yield i} but have to +use parentheses when there's an operation, as in \code{val = (yield i) ++ 12}.) + +Values are sent into a generator by calling its +\method{send(\var{value})} method. The generator's code is then +resumed and the \keyword{yield} expression returns the specified +\var{value}. If the regular \method{next()} method is called, the +\keyword{yield} returns \constant{None}. + +Here's the previous example, modified to allow changing the value of +the internal counter. + +\begin{verbatim} +def counter (maximum): + i = 0 + while i < maximum: + val = (yield i) + # If value provided, change counter + if val is not None: + i = val + else: + i += 1 +\end{verbatim} + +And here's an example of changing the counter: + +\begin{verbatim} +>>> it = counter(10) +>>> print it.next() +0 +>>> print it.next() +1 +>>> print it.send(8) +8 +>>> print it.next() +9 +>>> print it.next() +Traceback (most recent call last): + File "t.py", line 15, in ? + print it.next() +StopIteration +\end{verbatim} + +\keyword{yield} will usually return \constant{None}, so you +should always check for this case. Don't just use its value in +expressions unless you're sure that the \method{send()} method +will be the only method used to resume your generator function. + +In addition to \method{send()}, there are two other new methods on +generators: + +\begin{itemize} + + \item \method{throw(\var{type}, \var{value}=None, + \var{traceback}=None)} is used to raise an exception inside the + generator; the exception is raised by the \keyword{yield} expression + where the generator's execution is paused. + + \item \method{close()} raises a new \exception{GeneratorExit} + exception inside the generator to terminate the iteration.
On + receiving this exception, the generator's code must either raise + \exception{GeneratorExit} or \exception{StopIteration}. Catching + the \exception{GeneratorExit} exception and then yielding another value is + illegal and will trigger a \exception{RuntimeError}; if the function + raises some other exception, that exception is propagated to the + caller. \method{close()} will also be called by Python's garbage + collector when the generator is garbage-collected. + + If you need to run cleanup code when a \exception{GeneratorExit} occurs, + I suggest using a \code{try: ... finally:} suite instead of + catching \exception{GeneratorExit}. + +\end{itemize} + +The cumulative effect of these changes is to turn generators from +one-way producers of information into both producers and consumers. + +Generators also become \emph{coroutines}, a more generalized form of +subroutines. Subroutines are entered at one point and exited at +another point (the top of the function, and a \keyword{return} +statement), but coroutines can be entered, exited, and resumed at +many different points (the \keyword{yield} statements). We'll have to +figure out patterns for using coroutines effectively in Python. + +The addition of the \method{close()} method has one side effect that +isn't obvious. \method{close()} is called when a generator is +garbage-collected, so this means the generator's code gets one last +chance to run before the generator is destroyed. This last chance +means that \code{try...finally} statements in generators can now be +guaranteed to work; the \keyword{finally} clause will now always get a +chance to run. The syntactic restriction that you couldn't mix +\keyword{yield} statements with a \code{try...finally} suite has +therefore been removed. This seems like a minor bit of language +trivia, but using generators and \code{try...finally} is actually +necessary in order to implement the \keyword{with} statement +described by PEP 343.
I'll look at this new statement in the following +section. + +Another even more esoteric effect of this change: previously, the +\member{gi_frame} attribute of a generator was always a frame object. +It's now possible for \member{gi_frame} to be \code{None} +once the generator has been exhausted. + +\begin{seealso} + +\seepep{342}{Coroutines via Enhanced Generators}{PEP written by +Guido van~Rossum and Phillip J. Eby; +implemented by Phillip J. Eby. Includes examples of +some fancier uses of generators as coroutines. + +Earlier versions of these features were proposed in +\pep{288} by Raymond Hettinger and \pep{325} by Samuele Pedroni. +} + +\seeurl{http://en.wikipedia.org/wiki/Coroutine}{The Wikipedia entry for +coroutines.} + +\seeurl{http://www.sidhe.org/\~{}dan/blog/archives/000178.html}{An +explanation of coroutines from a Perl point of view, written by Dan +Sugalski.} + +\end{seealso} + + +%====================================================================== +\section{PEP 343: The 'with' statement\label{pep-343}} + +The '\keyword{with}' statement clarifies code that previously would +use \code{try...finally} blocks to ensure that clean-up code is +executed. In this section, I'll discuss the statement as it will +commonly be used. In the next section, I'll examine the +implementation details and show how to write objects for use with this +statement. + +The '\keyword{with}' statement is a new control-flow structure whose +basic structure is: + +\begin{verbatim} +with expression [as variable]: + with-block +\end{verbatim} + +The expression is evaluated, and it should result in an object that +supports the context management protocol. This object may return a +value that can optionally be bound to the name \var{variable}. (Note +carefully that \var{variable} is \emph{not} assigned the result of +\var{expression}.) 
The object can then run set-up code +before \var{with-block} is executed and some clean-up code +is executed after the block is done, even if the block raised an exception. + +To enable the statement in Python 2.5, you need +to add the following directive to your module: + +\begin{verbatim} +from __future__ import with_statement +\end{verbatim} + +The statement will always be enabled in Python 2.6. + +Some standard Python objects now support the context management +protocol and can be used with the '\keyword{with}' statement. File +objects are one example: + +\begin{verbatim} +with open('/etc/passwd', 'r') as f: + for line in f: + print line + ... more processing code ... +\end{verbatim} + +After this statement has executed, the file object in \var{f} will +have been automatically closed, even if the 'for' loop +raised an exception part-way through the block. + +The \module{threading} module's locks and condition variables +also support the '\keyword{with}' statement: + +\begin{verbatim} +lock = threading.Lock() +with lock: + # Critical section of code + ... +\end{verbatim} + +The lock is acquired before the block is executed and always released once +the block is complete. + +The new \function{localcontext()} function in the \module{decimal} module +makes it easy to save and restore the current decimal context, which +encapsulates the desired precision and rounding characteristics for +computations: + +\begin{verbatim} +from decimal import Decimal, Context, localcontext + +# Displays with default precision of 28 digits +v = Decimal('578') +print v.sqrt() + +with localcontext(Context(prec=16)): + # All code in this block uses a precision of 16 digits. + # The original context is restored on exiting the block. + print v.sqrt() +\end{verbatim} + +\subsection{Writing Context Managers\label{context-managers}} + +Under the hood, the '\keyword{with}' statement is fairly complicated. 
+Most people will only use '\keyword{with}' in company with existing +objects and don't need to know these details, so you can skip the rest +of this section if you like. Authors of new objects will need to +understand the details of the underlying implementation and should +keep reading. + +A high-level explanation of the context management protocol is: + +\begin{itemize} + +\item The expression is evaluated and should result in an object +called a ``context manager''. The context manager must have +\method{__enter__()} and \method{__exit__()} methods. + +\item The context manager's \method{__enter__()} method is called. The value +returned is assigned to \var{VAR}. If no \code{'as \var{VAR}'} clause +is present, the value is simply discarded. + +\item The code in \var{BLOCK} is executed. + +\item If \var{BLOCK} raises an exception, the +\method{__exit__(\var{type}, \var{value}, \var{traceback})} is called +with the exception details, the same values returned by +\function{sys.exc_info()}. The method's return value controls whether +the exception is re-raised: any false value re-raises the exception, +and \code{True} will result in suppressing it. You'll only rarely +want to suppress the exception, because if you do +the author of the code containing the +'\keyword{with}' statement will never realize anything went wrong. + +\item If \var{BLOCK} didn't raise an exception, +the \method{__exit__()} method is still called, +but \var{type}, \var{value}, and \var{traceback} are all \code{None}. + +\end{itemize} + +Let's think through an example. I won't present detailed code but +will only sketch the methods necessary for a database that supports +transactions. + +(For people unfamiliar with database terminology: a set of changes to +the database are grouped into a transaction. Transactions can be +either committed, meaning that all the changes are written into the +database, or rolled back, meaning that the changes are all discarded +and the database is unchanged. 
See any database textbook for more +information.) + +Let's assume there's an object representing a database connection. +Our goal will be to let the user write code like this: + +\begin{verbatim} +db_connection = DatabaseConnection() +with db_connection as cursor: + cursor.execute('insert into ...') + cursor.execute('delete from ...') + # ... more operations ... +\end{verbatim} + +The transaction should be committed if the code in the block +runs flawlessly or rolled back if there's an exception. +Here's the basic interface +for \class{DatabaseConnection} that I'll assume: + +\begin{verbatim} +class DatabaseConnection: + # Database interface + def cursor (self): + "Returns a cursor object and starts a new transaction" + def commit (self): + "Commits current transaction" + def rollback (self): + "Rolls back current transaction" +\end{verbatim} + +The \method {__enter__()} method is pretty easy, having only to start +a new transaction. For this application the resulting cursor object +would be a useful result, so the method will return it. The user can +then add \code{as cursor} to their '\keyword{with}' statement to bind +the cursor to a variable name. + +\begin{verbatim} +class DatabaseConnection: + ... + def __enter__ (self): + # Code to start a new transaction + cursor = self.cursor() + return cursor +\end{verbatim} + +The \method{__exit__()} method is the most complicated because it's +where most of the work has to be done. The method has to check if an +exception occurred. If there was no exception, the transaction is +committed. The transaction is rolled back if there was an exception. + +In the code below, execution will just fall off the end of the +function, returning the default value of \code{None}. \code{None} is +false, so the exception will be re-raised automatically. If you +wished, you could be more explicit and add a \keyword{return} +statement at the marked location. + +\begin{verbatim} +class DatabaseConnection: + ... 
+ def __exit__ (self, type, value, tb): + if tb is None: + # No exception, so commit + self.commit() + else: + # Exception occurred, so rollback. + self.rollback() + # return False +\end{verbatim} + + +\subsection{The contextlib module\label{module-contextlib}} + +The new \module{contextlib} module provides some functions and a +decorator that are useful for writing objects for use with the +'\keyword{with}' statement. + +The decorator is called \function{contextmanager}, and lets you write +a single generator function instead of defining a new class. The generator +should yield exactly one value. The code up to the \keyword{yield} +will be executed as the \method{__enter__()} method, and the value +yielded will be the method's return value that will get bound to the +variable in the '\keyword{with}' statement's \keyword{as} clause, if +any. The code after the \keyword{yield} will be executed in the +\method{__exit__()} method. Any exception raised in the block will be +raised by the \keyword{yield} statement. + +Our database example from the previous section could be written +using this decorator as: + +\begin{verbatim} +from contextlib import contextmanager + +@contextmanager +def db_transaction (connection): + cursor = connection.cursor() + try: + yield cursor + except: + connection.rollback() + raise + else: + connection.commit() + +db = DatabaseConnection() +with db_transaction(db) as cursor: + ... +\end{verbatim} + +The \module{contextlib} module also has a \function{nested(\var{mgr1}, +\var{mgr2}, ...)} function that combines a number of context managers so you +don't need to write nested '\keyword{with}' statements. In this +example, the single '\keyword{with}' statement both starts a database +transaction and acquires a thread lock: + +\begin{verbatim} +lock = threading.Lock() +with nested (db_transaction(db), lock) as (cursor, locked): + ... 
+\end{verbatim} + +Finally, the \function{closing(\var{object})} function +returns \var{object} so that it can be bound to a variable, +and calls \code{\var{object}.close()} at the end of the block. + +\begin{verbatim} +import urllib, sys +from contextlib import closing + +with closing(urllib.urlopen('http://www.yahoo.com')) as f: + for line in f: + sys.stdout.write(line) +\end{verbatim} + +\begin{seealso} + +\seepep{343}{The ``with'' statement}{PEP written by Guido van~Rossum +and Nick Coghlan; implemented by Mike Bland, Guido van~Rossum, and +Neal Norwitz. The PEP shows the code generated for a '\keyword{with}' +statement, which can be helpful in learning how the statement works.} + +\seeurl{../lib/module-contextlib.html}{The documentation +for the \module{contextlib} module.} + +\end{seealso} + + +%====================================================================== +\section{PEP 352: Exceptions as New-Style Classes\label{pep-352}} + +Exception classes can now be new-style classes, not just classic +classes, and the built-in \exception{Exception} class and all the +standard built-in exceptions (\exception{NameError}, +\exception{ValueError}, etc.) are now new-style classes. + +The inheritance hierarchy for exceptions has been rearranged a bit. +In 2.5, the inheritance relationships are: + +\begin{verbatim} +BaseException # New in Python 2.5 +|- KeyboardInterrupt +|- SystemExit +|- Exception + |- (all other current built-in exceptions) +\end{verbatim} + +This rearrangement was done because people often want to catch all +exceptions that indicate program errors. \exception{KeyboardInterrupt} and +\exception{SystemExit} aren't errors, though, and usually represent an explicit +action such as the user hitting Control-C or code calling +\function{sys.exit()}. A bare \code{except:} will catch all exceptions, +so you commonly need to list \exception{KeyboardInterrupt} and +\exception{SystemExit} in order to re-raise them. 
The usual pattern is: + +\begin{verbatim} +try: + ... +except (KeyboardInterrupt, SystemExit): + raise +except: + # Log error... + # Continue running program... +\end{verbatim} + +In Python 2.5, you can now write \code{except Exception} to achieve +the same result, catching all the exceptions that usually indicate errors +but leaving \exception{KeyboardInterrupt} and +\exception{SystemExit} alone. As in previous versions, +a bare \code{except:} still catches all exceptions. + +The goal for Python 3.0 is to require any class raised as an exception +to derive from \exception{BaseException} or some descendant of +\exception{BaseException}, and future releases in the +Python 2.x series may begin to enforce this constraint. Therefore, I +suggest you begin making all your exception classes derive from +\exception{Exception} now. It's been suggested that the bare +\code{except:} form should be removed in Python 3.0, but Guido van~Rossum +hasn't decided whether to do this or not. + +Raising of strings as exceptions, as in the statement \code{raise +"Error occurred"}, is deprecated in Python 2.5 and will trigger a +warning. The aim is to be able to remove the string-exception feature +in a few releases. + + +\begin{seealso} + +\seepep{352}{Required Superclass for Exceptions}{PEP written by +Brett Cannon and Guido van~Rossum; implemented by Brett Cannon.} + +\end{seealso} + + +%====================================================================== +\section{PEP 353: Using ssize_t as the index type\label{pep-353}} + +A wide-ranging change to Python's C API, using a new +\ctype{Py_ssize_t} type definition instead of \ctype{int}, +will permit the interpreter to handle more data on 64-bit platforms. +This change doesn't affect Python's capacity on 32-bit platforms. + +Various pieces of the Python interpreter used C's \ctype{int} type to +store sizes or counts; for example, the number of items in a list or +tuple were stored in an \ctype{int}. 
The C compilers for most 64-bit
platforms still define \ctype{int} as a 32-bit type, so that meant
that lists could only hold up to \code{2**31 - 1} = 2147483647 items.
(There are actually a few different programming models that 64-bit C
compilers can use -- see
\url{http://www.unix.org/version2/whatsnew/lp64_wp.html} for a
discussion -- but the most commonly available model leaves \ctype{int}
as 32 bits.)

A limit of 2147483647 items doesn't really matter on a 32-bit platform
because you'll run out of memory before hitting the length limit.
Each list item requires space for a pointer, which is 4 bytes, plus
space for a \ctype{PyObject} representing the item.  2147483647*4 is
already more bytes than a 32-bit address space can contain.

It's possible to address that much memory on a 64-bit platform,
however.  The pointers for a list that size would only require 16~GiB
of space, so it's not unreasonable that Python programmers might
construct lists that large.  Therefore, the Python interpreter had to
be changed to use some type other than \ctype{int}, and this will be a
64-bit type on 64-bit platforms.  The change will cause
incompatibilities on 64-bit machines, so it was deemed worth making
the transition now, while the number of 64-bit users is still
relatively small.  (In 5 or 10 years, we may \emph{all} be on 64-bit
machines, and the transition would be more painful then.)

This change most strongly affects authors of C extension modules.
Python strings and container types such as lists and tuples
now use \ctype{Py_ssize_t} to store their size.
Functions such as \cfunction{PyList_Size()}
now return \ctype{Py_ssize_t}.  Code in extension modules
may therefore need to have some variables changed to
\ctype{Py_ssize_t}.

The \cfunction{PyArg_ParseTuple()} and \cfunction{Py_BuildValue()} functions
have a new conversion code, \samp{n}, for \ctype{Py_ssize_t}.
\cfunction{PyArg_ParseTuple()}'s \samp{s\#} and \samp{t\#} still output
\ctype{int} by default, but you can define the macro
\csimplemacro{PY_SSIZE_T_CLEAN} before including \file{Python.h}
to make them return \ctype{Py_ssize_t}.

\pep{353} has a section on conversion guidelines that
extension authors should read to learn about supporting 64-bit
platforms.

\begin{seealso}

\seepep{353}{Using ssize_t as the index type}{PEP written and implemented by Martin von~L\"owis.}

\end{seealso}


%======================================================================
\section{PEP 357: The '__index__' method\label{pep-357}}

The NumPy developers had a problem that could only be solved by adding
a new special method, \method{__index__}.  When using slice notation,
as in \code{[\var{start}:\var{stop}:\var{step}]}, the values of the
\var{start}, \var{stop}, and \var{step} indexes must all be either
integers or long integers.  NumPy defines a variety of specialized
integer types corresponding to unsigned and signed integers of 8, 16,
32, and 64 bits, but there was no way to signal that these types could
be used as slice indexes.

Slicing can't just use the existing \method{__int__} method because
that method is also used to implement coercion to integers.  If
slicing used \method{__int__}, floating-point numbers would also
become legal slice indexes, and that's clearly undesirable
behaviour.

Instead, a new special method called \method{__index__} was added.  It
takes no arguments and returns an integer giving the slice index to
use.  For example:

\begin{verbatim}
class C:
    def __index__ (self):
        return self.value
\end{verbatim}

The return value must be either a Python integer or long integer.
The interpreter will check that the type returned is correct, and
will raise a \exception{TypeError} if this requirement isn't met.
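
As a rough illustration, the following sketch defines a hypothetical
\class{SmallInt} class standing in for one of NumPy's integer types.
Because it provides \method{__index__}, its instances are accepted as
slice bounds and sequence indexes, while a float is still rejected with
a \exception{TypeError}.  The \function{operator.index()} function,
also new in 2.5, invokes the same conversion from Python code, much as
\cfunction{PyNumber_Index()} does from C.

\begin{verbatim}
import operator


class SmallInt(object):
    """Hypothetical integer-like type (a stand-in for a NumPy scalar)."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called when the object is used as a slice bound or an index;
        # it must return a Python int (or long).
        return self.value


letters = 'abcdef'
middle = letters[SmallInt(1):SmallInt(4)]   # same as letters[1:4]
pos = operator.index(SmallInt(2))           # calls __index__() directly

try:
    letters[1.5:4]        # floats define no __index__, so this fails
    float_accepted = True
except TypeError:
    float_accepted = False
\end{verbatim}

Here \code{middle} is \code{'bcd'} and \code{pos} is \code{2}.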

A corresponding \member{nb_index} slot was added to the C-level
\ctype{PyNumberMethods} structure to let C extensions implement this
protocol.  \cfunction{PyNumber_Index(\var{obj})} can be used in
extension code to call the \method{__index__} method and retrieve
its result.

\begin{seealso}

\seepep{357}{Allowing Any Object to be Used for Slicing}{PEP written
and implemented by Travis Oliphant.}

\end{seealso}


%======================================================================
\section{Other Language Changes\label{other-lang}}

Here are all of the changes that Python 2.5 makes to the core Python
language.

\begin{itemize}

\item The \class{dict} type has a new hook for letting subclasses
provide a default value when a key isn't contained in the dictionary.
When a key isn't found, the dictionary's
\method{__missing__(\var{key})}
method will be called.  This hook is used to implement
the new \class{defaultdict} class in the \module{collections}
module.  The following example defines a dictionary
that returns zero for any missing key:

\begin{verbatim}
class zerodict (dict):
    def __missing__ (self, key):
        return 0

d = zerodict({1:1, 2:2})
print d[1], d[2]   # Prints 1, 2
print d[3], d[4]   # Prints 0, 0
\end{verbatim}

\item Both 8-bit and Unicode strings have new \method{partition(sep)}
and \method{rpartition(sep)} methods that simplify a common use case.

The \method{find(S)} method is often used to get an index which is
then used to slice the string and obtain the pieces that are before
and after the separator.
\method{partition(sep)} condenses this
pattern into a single method call that returns a 3-tuple containing
the substring before the separator, the separator itself, and the
substring after the separator.  If the separator isn't found, the
first element of the tuple is the entire string and the other two
elements are empty.
\method{rpartition(sep)} also returns a 3-tuple
but starts searching from the end of the string; the \samp{r} stands
for 'reverse'.

Some examples:

\begin{verbatim}
>>> ('http://www.python.org').partition('://')
('http', '://', 'www.python.org')
>>> ('file:/usr/share/doc/index.html').partition('://')
('file:/usr/share/doc/index.html', '', '')
>>> (u'Subject: a quick question').partition(':')
(u'Subject', u':', u' a quick question')
>>> 'www.python.org'.rpartition('.')
('www.python', '.', 'org')
>>> 'www.python.org'.rpartition(':')
('', '', 'www.python.org')
\end{verbatim}

(Implemented by Fredrik Lundh following a suggestion by Raymond Hettinger.)

\item The \method{startswith()} and \method{endswith()} methods
of string types now accept tuples of strings to check for.

\begin{verbatim}
def is_image_file (filename):
    return filename.endswith(('.gif', '.jpg', '.tiff'))
\end{verbatim}

(Implemented by Georg Brandl following a suggestion by Tom Lynn.)
% RFE #1491485

\item The \function{min()} and \function{max()} built-in functions
gained a \code{key} keyword parameter analogous to the \code{key}
argument for \method{sort()}.  This parameter supplies a function that
takes a single argument and is called for every value in the list;
\function{min()}/\function{max()} will return the element with the
smallest/largest return value from this function.
For example, to find the longest string in a list, you can do:

\begin{verbatim}
L = ['medium', 'longest', 'short']
# Prints 'longest'
print max(L, key=len)
# Prints 'short', because lexicographically 'short' has the largest value
print max(L)
\end{verbatim}

(Contributed by Steven Bethard and Raymond Hettinger.)

\item Two new built-in functions, \function{any()} and
\function{all()}, evaluate whether an iterator contains any true or
false values.
\function{any()} returns \constant{True} if any value
returned by the iterator is true; otherwise it will return
\constant{False}.  \function{all()} returns \constant{True} only if
all of the values returned by the iterator evaluate as true.
(Suggested by Guido van~Rossum, and implemented by Raymond Hettinger.)

\item The result of a class's \method{__hash__()} method can now
be either a long integer or a regular integer.  If a long integer is
returned, the hash of that value is taken.  In earlier versions the
hash value was required to be a regular integer, but in 2.5 the
\function{id()} built-in was changed to always return non-negative
numbers, and users often seem to use \code{id(self)} in
\method{__hash__()} methods (though this is discouraged).
% Bug #1536021

\item ASCII is now the default encoding for modules.  It's now
a syntax error if a module contains string literals with 8-bit
characters but doesn't have an encoding declaration.  In Python 2.4
this triggered a warning, not a syntax error.  See \pep{263}
for how to declare a module's encoding; for example, you might add
a line like this near the top of the source file:

\begin{verbatim}
# -*- coding: latin1 -*-
\end{verbatim}

\item A new warning, \class{UnicodeWarning}, is triggered when
you attempt to compare a Unicode string and an 8-bit string
that can't be converted to Unicode using the default ASCII encoding.
The result of the comparison is false:

\begin{verbatim}
>>> chr(128) == unichr(128)   # Can't convert chr(128) to Unicode
__main__:1: UnicodeWarning: Unicode equal comparison failed
  to convert both arguments to Unicode - interpreting them
  as being unequal
False
>>> chr(127) == unichr(127)   # chr(127) can be converted
True
\end{verbatim}

Previously this would raise a \class{UnicodeDecodeError} exception,
but in 2.5 this could result in puzzling problems when accessing a
dictionary.
If you looked up \code{unichr(128)} and \code{chr(128)}
was being used as a key, you'd get a \class{UnicodeDecodeError}
exception.  Other changes in 2.5 resulted in this exception being
raised instead of suppressed by the code in \file{dictobject.c} that
implements dictionaries.

Raising an exception for such a comparison is strictly correct, but
the change might have broken code, so instead
\class{UnicodeWarning} was introduced.

(Implemented by Marc-Andr\'e Lemburg.)

\item One error that Python programmers sometimes make is forgetting
to include an \file{__init__.py} module in a package directory.
Debugging this mistake can be confusing, and usually requires running
Python with the \programopt{-v} switch to log all the paths searched.
In Python 2.5, a new \exception{ImportWarning} warning is triggered when
an import would have picked up a directory as a package but no
\file{__init__.py} was found.  This warning is silently ignored by default;
provide the \programopt{-Wd} option when running the Python executable
to display the warning message.
(Implemented by Thomas Wouters.)

\item The list of base classes in a class definition can now be empty.
As an example, this is now legal:

\begin{verbatim}
class C():
    pass
\end{verbatim}
(Implemented by Brett Cannon.)

\end{itemize}


%======================================================================
\subsection{Interactive Interpreter Changes\label{interactive}}

In the interactive interpreter, \code{quit} and \code{exit}
have long been strings so that new users get a somewhat helpful message
when they try to quit:

\begin{verbatim}
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
\end{verbatim}

In Python 2.5, \code{quit} and \code{exit} are now objects that still
produce string representations of themselves, but are also callable.
Newbies who try \code{quit()} or \code{exit()} will now exit the
interpreter as they expect.  (Implemented by Georg Brandl.)
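
The behaviour described above can be sketched with an ordinary class
that defines both \method{__repr__} and \method{__call__}.  The class
name \class{Quitter} and the exact message text below are illustrative
assumptions, not a copy of the interpreter's own implementation.

\begin{verbatim}
class Quitter(object):
    """Illustrative stand-in for the interactive quit/exit objects."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Shown when a user types the bare name at the prompt.
        return 'Use %s() or Ctrl-D (i.e. EOF) to exit' % self.name

    def __call__(self, code=None):
        # Calling the object leaves the interpreter, as sys.exit() does.
        raise SystemExit(code)


quit_obj = Quitter('quit')
message = repr(quit_obj)

exited = False
try:
    quit_obj()
except SystemExit:
    exited = True
\end{verbatim}

Typing the bare name still yields a helpful string, while calling the
object raises \exception{SystemExit}, which ends the interpreter session.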

The Python executable now accepts the standard long options
\longprogramopt{help} and \longprogramopt{version}; on Windows,
it also accepts the \programopt{/?} option for displaying a help message.
(Implemented by Georg Brandl.)


%======================================================================
\subsection{Optimizations\label{opts}}

Several of the optimizations were developed at the NeedForSpeed
sprint, an event held in Reykjavik, Iceland, from May 21--28, 2006.
The sprint focused on speed enhancements to the CPython implementation
and was funded by EWT LLC with local support from CCP Games.  Those
optimizations added at this sprint are specially marked in the
following list.

\begin{itemize}

\item When they were introduced
in Python 2.4, the built-in \class{set} and \class{frozenset} types
were built on top of Python's dictionary type.
In 2.5 the internal data structure has been customized for implementing sets,
and as a result sets will use a third less memory and are somewhat faster.
(Implemented by Raymond Hettinger.)

\item The speed of some Unicode operations, such as finding
substrings, string splitting, and character map encoding and decoding,
has been improved.  (Substring search and splitting improvements were
added by Fredrik Lundh and Andrew Dalke at the NeedForSpeed
sprint.  Character maps were improved by Walter D\"orwald and
Martin von~L\"owis.)
% Patch 1313939, 1359618

\item The \function{long(\var{str}, \var{base})} function is now
faster on long digit strings because fewer intermediate results are
calculated.  The peak is for strings of around 800--1000 digits where
the function is 6 times faster.
(Contributed by Alan McIntyre and committed at the NeedForSpeed sprint.)
% Patch 1442927

\item It's now illegal to mix iterating over a file
with \code{for line in \var{file}} and calling
the file object's \method{read()}/\method{readline()}/\method{readlines()}
methods.
Iteration uses an internal buffer and the
\method{read*()} methods don't use that buffer.
Instead they would return the data following the buffer, causing the
data to appear out of order.  Mixing iteration and these methods will
now trigger a \exception{ValueError} from the \method{read*()} method.
(Implemented by Thomas Wouters.)
% Patch 1397960

\item The \module{struct} module now compiles structure format
strings into an internal representation and caches this
representation, yielding a 20\% speedup.  (Contributed by Bob Ippolito
at the NeedForSpeed sprint.)

\item The \module{re} module got a 1 or 2\% speedup by switching to
Python's allocator functions instead of the system's
\cfunction{malloc()} and \cfunction{free()}.
(Contributed by Jack Diederich at the NeedForSpeed sprint.)

\item The code generator's peephole optimizer now performs
simple constant folding in expressions.  If you write something like
\code{a = 2+3}, the code generator will do the arithmetic and produce
code corresponding to \code{a = 5}.  (Proposed and implemented
by Raymond Hettinger.)

\item Function calls are now faster because code objects now keep
the most recently finished frame (a ``zombie frame'') in an internal
field of the code object, reusing it the next time the code object is
invoked.  (Original patch by Michael Hudson, modified by Armin Rigo
and Richard Jones; committed at the NeedForSpeed sprint.)
% Patch 876206

Frame objects are also slightly smaller, which may improve cache locality
and reduce memory usage a bit.  (Contributed by Neal Norwitz.)
% Patch 1337051

\item Python's built-in exceptions are now new-style classes, a change
that speeds up instantiation considerably.  Exception handling in
Python 2.5 is therefore about 30\% faster than in 2.4.
(Contributed by Richard Jones, Georg Brandl and Sean Reifschneider at
the NeedForSpeed sprint.)
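
Because the built-in exceptions are now new-style classes arranged under
\exception{BaseException} (see section~\ref{pep-352}), their
relationships can be inspected and relied upon like any other class
hierarchy.  The following minimal sketch uses a hypothetical helper,
\function{classify()}, to check that \code{except Exception} catches an
ordinary error but lets \exception{SystemExit} and
\exception{KeyboardInterrupt} through.

\begin{verbatim}
def classify(exc):
    """Report which handler catches the given exception instance."""
    try:
        try:
            raise exc
        except Exception:
            # Ordinary errors such as ValueError end up here.
            return 'caught by except Exception'
    except BaseException:
        # KeyboardInterrupt and SystemExit skip the handler above.
        return 'escaped except Exception'


# New-style classes are instances of type.
is_new_style = isinstance(ValueError, type)
\end{verbatim}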

\item Importing now caches the paths tried, recording whether
they exist or not so that the interpreter makes fewer
\cfunction{open()} and \cfunction{stat()} calls on startup.
(Contributed by Martin von~L\"owis and Georg Brandl.)
% Patch 921466

\end{itemize}


%======================================================================
\section{New, Improved, and Removed Modules\label{modules}}

The standard library received many enhancements and bug fixes in
Python 2.5.  Here's a partial list of the most notable changes, sorted
alphabetically by module name.  Consult the \file{Misc/NEWS} file in
the source tree for a more complete list of changes, or look through
the SVN logs for all the details.

\begin{itemize}

\item The \module{audioop} module now supports the a-LAW encoding,
and the code for u-LAW encoding has been improved.  (Contributed by
Lars Immisch.)

\item The \module{codecs} module gained support for incremental
codecs.  The \function{codecs.lookup()} function now
returns a \class{CodecInfo} instance instead of a tuple.
\class{CodecInfo} instances behave like a 4-tuple to preserve backward
compatibility but also have the attributes \member{encode},
\member{decode}, \member{incrementalencoder}, \member{incrementaldecoder},
\member{streamwriter}, and \member{streamreader}.  Incremental codecs
can receive input and produce output in multiple chunks; the output is
the same as if the entire input was fed to the non-incremental codec.
See the \module{codecs} module documentation for details.
(Designed and implemented by Walter D\"orwald.)
% Patch 1436130

\item The \module{collections} module gained a new type,
\class{defaultdict}, that subclasses the standard \class{dict}
type.  The new type mostly behaves like a dictionary but constructs a
default value when a key isn't present, automatically adding it to the
dictionary for the requested key value.
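
The link between \class{defaultdict} and the \method{__missing__} hook
described in section~\ref{other-lang} can be seen in a rough pure-Python
approximation.  The class name \class{DefaultDictSketch} is hypothetical;
the real type is implemented in C and differs in details such as its
\method{__repr__} and copying behaviour.

\begin{verbatim}
class DefaultDictSketch(dict):
    """Rough approximation of collections.defaultdict (illustration only)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value          # store the new value, as defaultdict does
        return value


counts = DefaultDictSketch(int)
for letter in 'banana':
    counts[letter] += 1
\end{verbatim}

After the loop, \code{counts} maps each letter of \code{'banana'} to its
number of occurrences: \code{b} to 1, \code{a} to 3, and \code{n} to 2.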

The first argument to \class{defaultdict}'s constructor is a factory
function that gets called whenever a key is requested but not found.
This factory function receives no arguments, so you can use built-in
type constructors such as \function{list()} or \function{int()}.  For
example, you can make an index of words based on their initial letter
like this:

\begin{verbatim}
from collections import defaultdict

words = """Nel mezzo del cammin di nostra vita
mi ritrovai per una selva oscura
che la diritta via era smarrita""".lower().split()

index = defaultdict(list)

for w in words:
    init_letter = w[0]
    index[init_letter].append(w)
\end{verbatim}

Printing \code{index} results in the following output:

\begin{verbatim}
defaultdict(<type 'list'>, {'c': ['cammin', 'che'], 'e': ['era'],
        'd': ['del', 'di', 'diritta'], 'm': ['mezzo', 'mi'],
        'l': ['la'], 'o': ['oscura'], 'n': ['nel', 'nostra'],
        'p': ['per'], 's': ['selva', 'smarrita'],
        'r': ['ritrovai'], 'u': ['una'], 'v': ['vita', 'via']}
\end{verbatim}

(Contributed by Guido van~Rossum.)

\item The \class{deque} double-ended queue type supplied by the
\module{collections} module now has a \method{remove(\var{value})}
method that removes the first occurrence of \var{value} in the queue,
raising \exception{ValueError} if the value isn't found.
(Contributed by Raymond Hettinger.)

\item New module: The \module{contextlib} module contains helper functions for use
with the new '\keyword{with}' statement.  See
section~\ref{module-contextlib} for more about this module.

\item New module: The \module{cProfile} module is a C implementation of
the existing \module{profile} module that has much lower overhead.
The module's interface is the same as \module{profile}: you run
\code{cProfile.run('main()')} to profile a function, can save profile
data to a file, etc.
It's not yet known if the Hotshot profiler,
which is also written in C but doesn't match the \module{profile}
module's interface, will continue to be maintained in future versions
of Python.  (Contributed by Armin Rigo.)

Also, the \module{pstats} module for analyzing the data measured by
the profiler now supports directing the output to any file object
by supplying a \var{stream} argument to the \class{Stats} constructor.
(Contributed by Skip Montanaro.)

\item The \module{csv} module, which parses files in
comma-separated value format, received several enhancements and a
number of bugfixes.  You can now set the maximum size in bytes of a
field by calling the \function{csv.field_size_limit(\var{new_limit})}
function; omitting the \var{new_limit} argument will return the
currently-set limit.  The \class{reader} class now has a
\member{line_num} attribute that counts the number of physical lines
read from the source; records can span multiple physical lines, so
\member{line_num} is not the same as the number of records read.

The CSV parser is now stricter about multi-line quoted
fields.  Previously, if a line ended within a quoted field without a
terminating newline character, a newline would be inserted into the
returned field.  This behavior caused problems when reading files that
contained carriage return characters within fields, so the code was
changed to return the field without inserting newlines.  As a
consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner that preserves the
newline characters.

(Contributed by Skip Montanaro and Andrew McNamara.)

\item The \class{datetime} class in the \module{datetime}
module now has a \method{strptime(\var{string}, \var{format})}
method for parsing date strings, contributed by Josh Spoerri.
It uses the same format characters as \function{time.strptime()} and
\function{time.strftime()}:

\begin{verbatim}
from datetime import datetime

ts = datetime.strptime('10:13:15 2006-03-07',
                       '%H:%M:%S %Y-%m-%d')
\end{verbatim}

\item The \method{SequenceMatcher.get_matching_blocks()} method
in the \module{difflib} module now guarantees to return a minimal list
of blocks describing matching subsequences.  Previously, the algorithm would
occasionally break a block of matching elements into two list entries.
(Enhancement by Tim Peters.)

\item The \module{doctest} module gained a \code{SKIP} option that
keeps an example from being executed at all.  This is meant for
code snippets that are usage examples for the reader and
aren't actually test cases.

An \var{encoding} parameter was added to the \function{testfile()}
function and the \class{DocFileSuite} class to specify the file's
encoding.  This makes it easier to use non-ASCII characters in
tests contained within such a file.  (Contributed by Bjorn Tillenius.)
% Patch 1080727

\item The \module{email} package has been updated to version 4.0.
% XXX need to provide some more detail here
(Contributed by Barry Warsaw.)

\item The \module{fileinput} module was made more flexible.
Unicode filenames are now supported, and a \var{mode} parameter that
defaults to \code{"r"} was added to the
\function{input()} function to allow opening files in binary or
universal-newline mode.  Another new parameter, \var{openhook},
lets you use a function other than \function{open()}
to open the input files.  Once you're iterating over
the set of files, the \class{FileInput} object's new
\method{fileno()} method returns the file descriptor for the currently
opened file.
(Contributed by Georg Brandl.)

\item In the \module{gc} module, the new \function{get_count()} function
returns a 3-tuple containing the current collection counts for the
three GC generations.
This is accounting information for the garbage
collector; when these counts reach a specified threshold, a garbage
collection sweep will be made.  The existing \function{gc.collect()}
function now takes an optional \var{generation} argument of 0, 1, or 2
to specify which generation to collect.
(Contributed by Barry Warsaw.)

\item The \function{nsmallest()} and
\function{nlargest()} functions in the \module{heapq} module
now support a \code{key} keyword parameter similar to the one
provided by the \function{min()}/\function{max()} functions
and the \method{sort()} methods.  For example:

\begin{verbatim}
>>> import heapq
>>> L = ['short', 'medium', 'longest', 'longer still']
>>> heapq.nsmallest(2, L)  # Return two lowest elements, lexicographically
['longer still', 'longest']
>>> heapq.nsmallest(2, L, key=len)   # Return two shortest elements
['short', 'medium']
\end{verbatim}

(Contributed by Raymond Hettinger.)

\item The \function{itertools.islice()} function now accepts
\code{None} for the start and step arguments.  This makes it more
compatible with the attributes of slice objects, so that you can now write
the following:

\begin{verbatim}
s = slice(5)     # Create slice object
itertools.islice(iterable, s.start, s.stop, s.step)
\end{verbatim}

(Contributed by Raymond Hettinger.)

\item The \function{format()} function in the \module{locale} module
has been modified and two new functions were added,
\function{format_string()} and \function{currency()}.

The \function{format()} function's \var{format} parameter could
previously be a string as long as no more than one \%char specifier
appeared; now the parameter must be exactly one \%char specifier with
no surrounding text.  An optional \var{monetary} parameter was also
added which, if \code{True}, will use the locale's rules for
formatting currency in placing a separator between groups of three
digits.

To format strings with multiple \%char specifiers, use the new
\function{format_string()} function that works like \function{format()}
but also supports mixing \%char specifiers with
arbitrary text.

A new \function{currency()} function was also added that formats a
number according to the current locale's settings.

(Contributed by Georg Brandl.)
% Patch 1180296

\item The \module{mailbox} module underwent a massive rewrite to add
the capability to modify mailboxes in addition to reading them.  A new
set of classes that include \class{mbox}, \class{MH}, and
\class{Maildir} are used to read mailboxes, and have an
\method{add(\var{message})} method to add messages,
\method{remove(\var{key})} to remove messages, and
\method{lock()}/\method{unlock()} to lock/unlock the mailbox.  The
following example converts a maildir-format mailbox into an mbox-format one:

\begin{verbatim}
import mailbox

# 'factory=None' uses email.Message.Message as the class representing
# individual messages.
src = mailbox.Maildir('maildir', factory=None)
dest = mailbox.mbox('/tmp/mbox')

for msg in src:
    dest.add(msg)

# Flush pending changes to disk and close the mbox file.
dest.close()
\end{verbatim}

(Contributed by Gregory K. Johnson.  Funding was provided by Google's
2005 Summer of Code.)

\item New module: the \module{msilib} module allows creating
Microsoft Installer \file{.msi} files and CAB files.  Some support
for reading the \file{.msi} database is also included.
(Contributed by Martin von~L\"owis.)

\item The \module{nis} module now supports accessing domains other
than the system default domain by supplying a \var{domain} argument to
the \function{nis.match()} and \function{nis.maps()} functions.
(Contributed by Ben Bell.)

\item The \module{operator} module's \function{itemgetter()}
and \function{attrgetter()} functions now support multiple fields.
A call such as \code{operator.attrgetter('a', 'b')}
will return a function
that retrieves the \member{a} and \member{b} attributes.
Combining
this new feature with the \method{sort()} method's \code{key} parameter
lets you easily sort lists using multiple fields.
(Contributed by Raymond Hettinger.)

\item The \module{optparse} module was updated to version 1.5.1 of the
Optik library.  The \class{OptionParser} class gained an
\member{epilog} attribute, a string that will be printed after the
help message, and a \method{destroy()} method to break reference
cycles created by the object.  (Contributed by Greg Ward.)

\item The \module{os} module underwent several changes.  The
\member{stat_float_times} variable now defaults to true, meaning that
\function{os.stat()} will now return time values as floats.  (This
doesn't necessarily mean that \function{os.stat()} will return times
that are precise to fractions of a second; not all systems support
such precision.)

Constants named \member{os.SEEK_SET}, \member{os.SEEK_CUR}, and
\member{os.SEEK_END} have been added; these are the possible values
for the third argument of the \function{os.lseek()} function.  Two new
constants for locking are \member{os.O_SHLOCK} and \member{os.O_EXLOCK}.

Two new functions, \function{wait3()} and \function{wait4()}, were
added.  They're similar to the \function{waitpid()} function, which waits
for a child process to exit and returns a tuple of the process ID and
its exit status, but \function{wait3()} and \function{wait4()} return
additional information.  \function{wait3()} doesn't take a process ID
as input, so it waits for any child process to exit and returns a
3-tuple of \var{process-id}, \var{exit-status}, \var{resource-usage}
as returned from the \function{resource.getrusage()} function.
\function{wait4(\var{pid})} does take a process ID.
(Contributed by Chad J. Schroeder.)

On FreeBSD, the \function{os.stat()} function now returns
times with nanosecond resolution, and the returned object
now has \member{st_gen} and \member{st_birthtime} attributes.
The \member{st_flags} member is also available, if the platform supports it.
(Contributed by Antti Louko and Diego Petten\`o.)
% (Patch 1180695, 1212117)

\item The Python debugger provided by the \module{pdb} module
can now store lists of commands to execute when a breakpoint is
reached and execution stops.  Once breakpoint \#1 has been created,
enter \samp{commands 1} followed by a series of commands to be executed,
finishing the list with \samp{end}.  The command list can include
commands that resume execution, such as \samp{continue} or
\samp{next}.  (Contributed by Gr\'egoire Dooms.)
% Patch 790710

\item The \module{pickle} and \module{cPickle} modules no
longer accept a return value of \code{None} from the
\method{__reduce__()} method; the method must return a tuple of
arguments instead.  The ability to return \code{None} was deprecated
in Python 2.4, so this completes the removal of the feature.

\item The \module{pkgutil} module, containing various utility
functions for finding packages, was enhanced to support PEP 302's
import hooks and now also works for packages stored in ZIP-format archives.
(Contributed by Phillip J. Eby.)

\item The pybench benchmark suite by Marc-Andr\'e~Lemburg is now
included in the \file{Tools/pybench} directory.  The pybench suite is
an improvement on the commonly used \file{pystone.py} program because
pybench provides a more detailed measurement of the interpreter's
speed.  It times particular operations such as function calls,
tuple slicing, method lookups, and numeric operations, instead of
performing many different operations and reducing the result to a
single number as \file{pystone.py} does.

\item The \module{pyexpat} module now uses version 2.0 of the Expat parser.
(Contributed by Trent Mick.)

\item The \class{Queue} class provided by the \module{Queue} module
gained two new methods.  \method{join()} blocks until all items in
the queue have been retrieved and all processing work on the items
has been completed.
Worker threads call the other new method,
\method{task_done()}, to signal that processing for an item has been
completed.  (Contributed by Raymond Hettinger.)

\item The old \module{regex} and \module{regsub} modules, which have been
deprecated ever since Python 2.0, have finally been deleted.
Other deleted modules: \module{statcache}, \module{tzparse},
\module{whrandom}.

\item Also deleted: the \file{lib-old} directory,
which included ancient modules such as \module{dircmp} and
\module{ni}.  \file{lib-old} wasn't on the default
\code{sys.path}, so unless your programs explicitly added the directory to
\code{sys.path}, this removal shouldn't affect your code.

\item The \module{rlcompleter} module is no longer
dependent on importing the \module{readline} module and
therefore now works on non-{\UNIX} platforms.
(Patch from Robert Kiendl.)
% Patch #1472854

\item The \class{SimpleXMLRPCServer} and \class{DocXMLRPCServer}
classes now have a \member{rpc_paths} attribute that constrains
XML-RPC operations to a limited set of URL paths; the default is
to allow only \code{'/'} and \code{'/RPC2'}.  Setting
\member{rpc_paths} to \code{None} or an empty tuple disables
this path checking.
% Bug #1473048

\item The \module{socket} module now supports \constant{AF_NETLINK}
sockets on Linux, thanks to a patch from Philippe Biondi.
Netlink sockets are a Linux-specific mechanism for communications
between a user-space process and kernel code; an introductory
article about them is at \url{http://www.linuxjournal.com/article/7356}.
In Python code, netlink addresses are represented as a tuple of 2 integers,
\code{(\var{pid}, \var{group_mask})}.

Two new methods on socket objects, \method{recv_into(\var{buffer})} and
\method{recvfrom_into(\var{buffer})}, store the received data in an object
that supports the buffer protocol instead of returning the data as a
string.
This means you can put the data directly into an array or a +memory-mapped file. + +Socket objects also gained \method{getfamily()}, \method{gettype()}, +and \method{getproto()} accessor methods to retrieve the family, type, +and protocol values for the socket. + +\item New module: the \module{spwd} module provides functions for +accessing the shadow password database on systems that support +shadow passwords. + +\item The \module{struct} module is now faster because it +compiles format strings into \class{Struct} objects +with \method{pack()} and \method{unpack()} methods. This is similar +to how the \module{re} module lets you create compiled regular +expression objects. You can still use the module-level +\function{pack()} and \function{unpack()} functions; they'll create +\class{Struct} objects and cache them. Or you can use +\class{Struct} instances directly: + +\begin{verbatim} +import struct + +s = struct.Struct('ih3s') + +data = s.pack(1972, 187, 'abc') +year, number, name = s.unpack(data) +\end{verbatim} + +You can also pack and unpack data to and from buffer objects directly +using the \method{pack_into(\var{buffer}, \var{offset}, \var{v1}, +\var{v2}, ...)} and \method{unpack_from(\var{buffer}, \var{offset})} +methods. This lets you store data directly into an array or a +memory-mapped file. + +(\class{Struct} objects were implemented by Bob Ippolito at the +NeedForSpeed sprint. Support for buffer objects was added by Martin +Blais, also at the NeedForSpeed sprint.) + +\item The Python developers switched from CVS to Subversion during the 2.5 +development process. Information about the exact build version is +available as the \code{sys.subversion} variable, a 3-tuple of +\code{(\var{interpreter-name}, \var{branch-name}, +\var{revision-range})}. For example, at the time of writing my copy +of 2.5 was reporting \code{('CPython', 'trunk', '45313:45315')}.
+ +This information is also available to C extensions via the +\cfunction{Py_GetBuildInfo()} function that returns a +string of build information like this: +\code{"trunk:45355:45356M, Apr 13 2006, 07:42:19"}. +(Contributed by Barry Warsaw.) + +\item Another new function, \function{sys._current_frames()}, returns +the current stack frames for all running threads as a dictionary +mapping thread identifiers to the topmost stack frame currently active +in that thread at the time the function is called. (Contributed by +Tim Peters.) + +\item The \class{TarFile} class in the \module{tarfile} module now has +an \method{extractall()} method that extracts all members from the +archive into the current working directory. It's also possible to set +a different directory as the extraction target, and to unpack only a +subset of the archive's members. + +The compression used for a tarfile opened in stream mode can now be +autodetected using the mode \code{'r|*'}. +% patch 918101 +(Contributed by Lars Gust\"abel.) + +\item The \module{threading} module now lets you set the stack size +used when new threads are created. The +\function{stack_size(\optional{\var{size}})} function returns the +currently configured stack size, and supplying the optional \var{size} +parameter sets a new value. Not all platforms support changing the +stack size, but Windows, POSIX threading, and OS/2 all do. +(Contributed by Andrew MacIntyre.) +% Patch 1454481 + +\item The \module{unicodedata} module has been updated to use version 4.1.0 +of the Unicode character database. Version 3.2.0 is required +by some specifications, so it's still available as +\member{unicodedata.ucd_3_2_0}. + +\item New module: the \module{uuid} module generates +universally unique identifiers (UUIDs) according to \rfc{4122}. The +RFC defines several different UUID versions that are generated from a +starting string, from system properties, or purely randomly. 
This +module contains a \class{UUID} class and +functions named \function{uuid1()}, +\function{uuid3()}, \function{uuid4()}, and +\function{uuid5()} to generate different versions of UUID. (Version 2 UUIDs +are not specified in \rfc{4122} and are not supported by this module.) + +\begin{verbatim} +>>> import uuid +>>> # make a UUID based on the host ID and current time +>>> uuid.uuid1() +UUID('a8098c1a-f86e-11da-bd1a-00112444be1e') + +>>> # make a UUID using an MD5 hash of a namespace UUID and a name +>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org') +UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e') + +>>> # make a random UUID +>>> uuid.uuid4() +UUID('16fd2706-8baf-433b-82eb-8c7fada847da') + +>>> # make a UUID using a SHA-1 hash of a namespace UUID and a name +>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org') +UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d') +\end{verbatim} + +(Contributed by Ka-Ping Yee.) + +\item The \module{weakref} module's \class{WeakKeyDictionary} and +\class{WeakValueDictionary} types gained new methods for iterating +over the weak references contained in the dictionary. +\method{iterkeyrefs()} and \method{keyrefs()} methods were +added to \class{WeakKeyDictionary}, and +\method{itervaluerefs()} and \method{valuerefs()} were added to +\class{WeakValueDictionary}. (Contributed by Fred L.~Drake, Jr.) + +\item The \module{webbrowser} module received a number of +enhancements. +It's now usable as a script with \code{python -m webbrowser}, taking a +URL as the argument; there are a number of switches +to control the behaviour (\programopt{-n} for a new browser window, +\programopt{-t} for a new tab). New module-level functions, +\function{open_new()} and \function{open_new_tab()}, were added +to support this. The module's \function{open()} function supports an +additional feature, an \var{autoraise} parameter that signals whether +to raise the open window when possible. 
A number of additional +browsers were added to the supported list such as Firefox, Opera, +Konqueror, and elinks. (Contributed by Oleg Broytmann and Georg +Brandl.) +% Patch #754022 + +\item The \module{xmlrpclib} module now supports returning + \class{datetime} objects for the XML-RPC date type. Supply + \code{use_datetime=True} to the \function{loads()} function + or the \class{Unmarshaller} class to enable this feature. + (Contributed by Skip Montanaro.) +% Patch 1120353 + +\item The \module{zipfile} module now supports the ZIP64 version of the +format, meaning that a .zip archive can now be larger than 4~GiB and +can contain individual files larger than 4~GiB. (Contributed by +Ronald Oussoren.) +% Patch 1446489 + +\item The \module{zlib} module's \class{Compress} and \class{Decompress} +objects now support a \method{copy()} method that makes a copy of the +object's internal state and returns a new +\class{Compress} or \class{Decompress} object. +(Contributed by Chris AtLee.) +% Patch 1435422 + +\end{itemize} + + + +%====================================================================== +\subsection{The ctypes package\label{module-ctypes}} + +The \module{ctypes} package, written by Thomas Heller, has been added +to the standard library. \module{ctypes} lets you call arbitrary functions +in shared libraries or DLLs. Long-time users may remember the \module{dl} module, which +provides functions for loading shared libraries and calling functions in them. The \module{ctypes} package is much fancier. + +To load a shared library or DLL, you must create an instance of the +\class{CDLL} class and provide the name or path of the shared library +or DLL. Once that's done, you can call arbitrary functions +by accessing them as attributes of the \class{CDLL} object. 
+ +\begin{verbatim} +import ctypes + +libc = ctypes.CDLL('libc.so.6') +result = libc.printf("Line of output\n") +\end{verbatim} + +Type constructors for the various C types are provided: \function{c_int}, +\function{c_float}, \function{c_double}, \function{c_char_p} (equivalent to \ctype{char *}), and so forth. Unlike Python's types, the C versions are all mutable; you can assign to their \member{value} attribute +to change the wrapped value. Python integers and strings will be automatically +converted to the corresponding C types, but for other types you +must call the correct type constructor. (And I mean \emph{must}; +getting it wrong will often result in the interpreter crashing +with a segmentation fault.) + +You shouldn't use \function{c_char_p} with a Python string when the C function will be modifying the memory area, because Python strings are +supposed to be immutable; breaking this rule will cause puzzling bugs. When you need a modifiable memory area, +use \function{create_string_buffer()}: + +\begin{verbatim} +s = "this is a string" +buf = ctypes.create_string_buffer(s) +libc.strfry(buf) +\end{verbatim} + +C functions are assumed to return integers, but you can set +the \member{restype} attribute of the function object to +change this: + +\begin{verbatim} +>>> libc.atof('2.71828') +-1783957616 +>>> libc.atof.restype = ctypes.c_double +>>> libc.atof('2.71828') +2.71828 +\end{verbatim} + +\module{ctypes} also provides a wrapper for Python's C API +as the \code{ctypes.pythonapi} object. This object does \emph{not} +release the global interpreter lock before calling a function, because the lock must be held when calling into the interpreter's code. +There's a \class{py_object()} type constructor that will create a +\ctype{PyObject *} pointer. A simple example: + +\begin{verbatim} +import ctypes + +d = {} +ctypes.pythonapi.PyObject_SetItem(ctypes.py_object(d), + ctypes.py_object("abc"), ctypes.py_object(1)) +# d is now {'abc': 1}.
+\end{verbatim} + +Don't forget to use \class{py_object()}; if it's omitted you end +up with a segmentation fault. + +\module{ctypes} has been around for a while, but people still write +and distribute hand-coded extension modules because you can't rely on \module{ctypes} being present. +Perhaps developers will begin to write +Python wrappers atop a library accessed through \module{ctypes} instead +of extension modules, now that \module{ctypes} is included with core Python. + +\begin{seealso} + +\seeurl{http://starship.python.net/crew/theller/ctypes/} +{The ctypes web page, with a tutorial, reference, and FAQ.} + +\seeurl{../lib/module-ctypes.html}{The documentation +for the \module{ctypes} module.} + +\end{seealso} + + +%====================================================================== +\subsection{The ElementTree package\label{module-etree}} + +A subset of Fredrik Lundh's ElementTree library for processing XML has +been added to the standard library as \module{xml.etree}. The +available modules are +\module{ElementTree}, \module{ElementPath}, and +\module{ElementInclude} from ElementTree 1.2.6. +The \module{cElementTree} accelerator module is also included. + +The rest of this section will provide a brief overview of using +ElementTree. Full documentation for ElementTree is available at +\url{http://effbot.org/zone/element-index.htm}. + +ElementTree represents an XML document as a tree of element nodes. +The text content of the document is stored as the \member{.text} +and \member{.tail} attributes of the element nodes. +(This is one of the major differences between ElementTree and +the Document Object Model; in the DOM there are many different +types of node, including \class{TextNode}.)
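As an illustration of how text is divided between the two attributes, here is a minimal sketch written for a current Python 3 interpreter; the XML fragment is made up for the example, and the helper \function{all_text()} is a hypothetical function, not part of ElementTree. An element's \member{.text} holds the text before its first child, and each child's \member{.tail} holds the text that follows the child's closing tag.

```python
from xml.etree import ElementTree as ET

# A made-up mixed-content fragment: text appears before, inside,
# and after the <b> child element.
para = ET.XML("<p>Hello <b>bold</b> world</p>")

# .text holds the text that precedes the first child element.
print(repr(para.text))      # 'Hello '

child = para[0]
# The child's .text is the text inside <b>...</b>; its .tail is the
# text that follows </b>, up to the next tag.
print(repr(child.text))     # 'bold'
print(repr(child.tail))     # ' world'

def all_text(elem):
    """Hypothetical helper: rebuild an element's text in document order."""
    parts = [elem.text or ""]
    for sub in elem:
        parts.append(all_text(sub))   # text inside the child
        parts.append(sub.tail or "")  # text after the child
    return "".join(parts)

print(all_text(para))       # Hello bold world
```

Because there are no separate text nodes, reassembling the full text requires walking both attributes recursively, as \function{all_text()} does.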
+ +The most commonly used parsing function is \function{parse()}, which +takes either a string (assumed to contain a filename) or a file-like +object and returns an \class{ElementTree} instance: + +\begin{verbatim} +import urllib +from xml.etree import ElementTree as ET + +tree = ET.parse('ex-1.xml') + +feed = urllib.urlopen( + 'http://planet.python.org/rss10.xml') +tree = ET.parse(feed) +\end{verbatim} + +Once you have an \class{ElementTree} instance, you +can call its \method{getroot()} method to get the root \class{Element} node. + +There's also an \function{XML()} function that takes a string literal +and returns an \class{Element} node (not an \class{ElementTree}). +This function provides a tidy way to incorporate XML fragments, +approaching the convenience of an XML literal: + +\begin{verbatim} +svg = ET.XML("""<svg width="10px" version="1.0"> + </svg>""") +svg.set('height', '320px') +elem1 = ET.Element('rect') +svg.append(elem1) +\end{verbatim} + +Each XML element supports some dictionary-like and some list-like +access methods. Dictionary-like operations are used to access attribute +values, and list-like operations are used to access child nodes.
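The following minimal sketch, written for a current Python 3 interpreter, exercises a few of these operations on a small made-up document: \method{get()}, \method{set()}, and \method{keys()} for attributes, and indexing, \function{len()}, \method{append()}, and \keyword{del} for children.

```python
from xml.etree import ElementTree as ET

# A made-up document used only for this example.
root = ET.XML('<list kind="demo"><item n="1"/><item n="2"/></list>')

# Dictionary-like operations act on attributes.
print(root.get("kind"))            # demo
root.set("kind", "changed")
print(sorted(root.keys()))         # ['kind']

# List-like operations act on child elements.
print(len(root))                   # 2
print(root[0].get("n"))            # 1

root.append(ET.Element("item", n="3"))   # add a new last child
del root[0]                              # remove the first child
print([item.get("n") for item in root])  # ['2', '3']
```

Attribute values always come back as strings, which is why the \code{n} values above are \code{'2'} and \code{'3'} rather than integers.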
+ +\begin{tableii}{c|l}{code}{Operation}{Result} + \lineii{elem[n]}{Returns n'th child element.} + \lineii{elem[m:n]}{Returns list of m'th through (n-1)'th child elements.} + \lineii{len(elem)}{Returns number of child elements.} + \lineii{list(elem)}{Returns list of child elements.} + \lineii{elem.append(elem2)}{Adds \var{elem2} as a child.} + \lineii{elem.insert(index, elem2)}{Inserts \var{elem2} at the specified location.} + \lineii{del elem[n]}{Deletes n'th child element.} + \lineii{elem.keys()}{Returns list of attribute names.} + \lineii{elem.get(name)}{Returns value of attribute \var{name}.} + \lineii{elem.set(name, value)}{Sets new value for attribute \var{name}.} + \lineii{elem.attrib}{Retrieves the dictionary containing attributes.} + \lineii{del elem.attrib[name]}{Deletes attribute \var{name}.} +\end{tableii} + +Comments and processing instructions are also represented as +\class{Element} nodes. To check if a node is a comment or a processing +instruction: + +\begin{verbatim} +if elem.tag is ET.Comment: + ... +elif elem.tag is ET.ProcessingInstruction: + ... +\end{verbatim} + +To generate XML output, you should call the +\method{ElementTree.write()} method. Like \function{parse()}, +it can take either a string or a file-like object: + +\begin{verbatim} +# Encoding is US-ASCII +tree.write('output.xml') + +# Encoding is UTF-8 +f = open('output.xml', 'w') +tree.write(f, encoding='utf-8') +\end{verbatim} + +(Caution: the default encoding used for output is ASCII. For general +XML work, where an element's name may contain arbitrary Unicode +characters, ASCII isn't a very useful encoding because it will raise +an exception if an element's name contains any characters with values +greater than 127. Therefore, it's best to specify a different +encoding such as UTF-8 that can handle any Unicode character.) + +This section is only a partial description of the ElementTree interfaces. +Please read the package's official documentation for more details.
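A small sketch of the comment check and the explicit-encoding advice, written for a current Python 3 interpreter: it builds a tree containing a comment and non-ASCII text, writes it with \code{encoding="utf-8"} to an in-memory \class{io.BytesIO} object (standing in for a real file), and parses the bytes back. The element name and text are made up for the example.

```python
import io
from xml.etree import ElementTree as ET

# Build a tiny tree with non-ASCII text and a comment (made-up content).
root = ET.Element("greeting")
root.text = u"Gr\u00fc\u00dfe"           # "Gruesse" written with u-umlaut and sharp s
root.append(ET.Comment("generated"))
tree = ET.ElementTree(root)

# Comment nodes are recognized by their tag, as described above.
print([child.tag is ET.Comment for child in root])   # [True]

# Ask for UTF-8 explicitly rather than relying on the default encoding.
buf = io.BytesIO()
tree.write(buf, encoding="utf-8")
data = buf.getvalue()

# Parsing the bytes back recovers the same text; the default parser
# does not keep comments, so only the text is compared here.
parsed = ET.fromstring(data)
print(parsed.text == root.text)          # True
```

Writing to a binary file-like object matters here: with an explicit byte encoding such as UTF-8, \method{write()} produces encoded bytes rather than text.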
+ +\begin{seealso} + +\seeurl{http://effbot.org/zone/element-index.htm} +{Official documentation for ElementTree.} + +\end{seealso} + + +%====================================================================== +\subsection{The hashlib package\label{module-hashlib}} + +A new \module{hashlib} module, written by Gregory P. Smith, +has been added to replace the +\module{md5} and \module{sha} modules. \module{hashlib} adds support +for additional secure hashes (SHA-224, SHA-256, SHA-384, and SHA-512). +When available, the module uses OpenSSL for fast platform-optimized +implementations of algorithms. + +The old \module{md5} and \module{sha} modules still exist as wrappers +around hashlib to preserve backwards compatibility. The new module's +interface is very close to that of the old modules, but not identical. +The most significant difference is that the constructor functions +for creating new hashing objects are named differently. + +\begin{verbatim} +import md5, sha, hashlib + +# Old versions +h = md5.md5() +h = md5.new() + +# New version +h = hashlib.md5() + +# Old versions +h = sha.sha() +h = sha.new() + +# New version +h = hashlib.sha1() + +# Hashes that weren't previously available +h = hashlib.sha224() +h = hashlib.sha256() +h = hashlib.sha384() +h = hashlib.sha512() + +# Alternative form +h = hashlib.new('md5') # Provide algorithm as a string +\end{verbatim} + +Once a hash object has been created, its methods are the same as before: +\method{update(\var{string})} hashes the specified string into the +current digest state, \method{digest()} and \method{hexdigest()} +return the digest value as a binary string or a string of hex digits, +and \method{copy()} returns a new hashing object with the same digest state.
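The following sketch illustrates these methods: incremental \method{update()} calls, \method{copy()} as a snapshot of the digest state, and the string-based \function{new()} constructor. It is written for a current Python 3 interpreter, where \method{update()} requires bytes rather than a string, hence the \code{b"..."} literals. The inputs are arbitrary; the SHA-1 value shown is the standard test vector for \code{"abc"}, and the SHA-256 value is the digest of the empty string.

```python
import hashlib

h = hashlib.sha1()
h.update(b"ab")

# copy() snapshots the current digest state into an independent object.
snapshot = h.copy()
h.update(b"c")                # h has now hashed b"abc" in two pieces

print(h.hexdigest())          # a9993e364706816aba3e25717850c26c9cd0d89d

# The snapshot is unaffected by later updates to h.
print(snapshot.hexdigest() == hashlib.sha1(b"ab").hexdigest())   # True

# Feeding data in pieces gives the same digest as hashing it all at once.
print(h.digest() == hashlib.sha1(b"abc").digest())               # True

# The generic constructor takes the algorithm name as a string.
print(hashlib.new("sha256", b"").hexdigest())
```

\method{copy()} is useful when many messages share a common prefix: hash the prefix once, then copy the object for each distinct suffix.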
+ +\begin{seealso} + +\seeurl{../lib/module-hashlib.html}{The documentation +for the \module{hashlib} module.} + +\end{seealso} + + +%====================================================================== +\subsection{The sqlite3 package\label{module-sqlite}} + +The pysqlite module (\url{http://www.pysqlite.org}), a wrapper for the +SQLite embedded database, has been added to the standard library under +the package name \module{sqlite3}. + +SQLite is a C library that provides a lightweight disk-based database +that doesn't require a separate server process and allows accessing +the database using a nonstandard variant of the SQL query language. +Some applications can use SQLite for internal data storage. It's also +possible to prototype an application using SQLite and then port the +code to a larger database such as PostgreSQL or Oracle. + +pysqlite was written by Gerhard H\"aring and provides a SQL interface +compliant with the DB-API 2.0 specification described by +\pep{249}. + +If you're compiling the Python source yourself, note that the source +tree doesn't include the SQLite code, only the wrapper module. +You'll need to have the SQLite libraries and headers installed before +compiling Python, and the build process will compile the module when +the necessary headers are available. + +To use the module, you must first create a \class{Connection} object +that represents the database. Here the data will be stored in the +\file{/tmp/example} file: + +\begin{verbatim} +conn = sqlite3.connect('/tmp/example') +\end{verbatim} + +You can also supply the special name \samp{:memory:} to create +a database in RAM. 
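To make the \samp{:memory:} option concrete, here is a minimal end-to-end sketch written for a current Python 3 interpreter; the table layout and rows simply mirror this section's own \code{stocks} examples. It passes values through \samp{?} placeholders rather than string formatting, and uses the DB-API \method{executemany()} method to insert several rows in one call.

```python
import sqlite3

# ":memory:" creates a private database that lives only in RAM and
# disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
c = conn.cursor()

c.execute("create table stocks (date text, trans text, symbol text, "
          "qty real, price real)")

# Values are supplied separately from the SQL text via "?" placeholders.
rows = [("2006-03-28", "BUY", "IBM", 1000, 45.00),
        ("2006-04-06", "SELL", "IBM", 500, 53.00)]
c.executemany("insert into stocks values (?,?,?,?,?)", rows)
conn.commit()

c.execute("select symbol, qty from stocks where trans = ?", ("SELL",))
result = c.fetchall()
print(result)                # [('IBM', 500.0)]
conn.close()
```

Because the \code{qty} column is declared \code{real}, SQLite stores the integer 500 as a floating-point value, so it comes back as \code{500.0}.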
+ +Once you have a \class{Connection}, you can create a \class{Cursor} +object and call its \method{execute()} method to perform SQL commands: + +\begin{verbatim} +c = conn.cursor() + +# Create table +c.execute('''create table stocks +(date text, trans text, symbol text, + qty real, price real)''') + +# Insert a row of data +c.execute("""insert into stocks + values ('2006-01-05','BUY','RHAT',100,35.14)""") +\end{verbatim} + +Usually your SQL operations will need to use values from Python +variables. You shouldn't assemble your query using Python's string +operations because doing so is insecure; it makes your program +vulnerable to an SQL injection attack. + +Instead, use the DB-API's parameter substitution. Put \samp{?} as a +placeholder wherever you want to use a value, and then provide a tuple +of values as the second argument to the cursor's \method{execute()} +method. (Other database modules may use a different placeholder, +such as \samp{\%s} or \samp{:1}.) For example: + +\begin{verbatim} +# Never do this -- insecure! +symbol = 'IBM' +c.execute("... where symbol = '%s'" % symbol) + +# Do this instead +t = (symbol,) +c.execute('select * from stocks where symbol=?', t) + +# Larger example +for t in (('2006-03-28', 'BUY', 'IBM', 1000, 45.00), + ('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00), + ('2006-04-06', 'SELL', 'IBM', 500, 53.00), + ): + c.execute('insert into stocks values (?,?,?,?,?)', t) +\end{verbatim} + +To retrieve data after executing a SELECT statement, you can either +treat the cursor as an iterator, call the cursor's \method{fetchone()} +method to retrieve a single matching row, +or call \method{fetchall()} to get a list of the matching rows. + +This example uses the iterator form: + +\begin{verbatim} +>>> c = conn.cursor() +>>> c.execute('select * from stocks order by price') +>>> for row in c: +... print row +... 
+(u'2006-01-05', u'BUY', u'RHAT', 100, 35.140000000000001) +(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0) +(u'2006-04-06', u'SELL', u'IBM', 500, 53.0) +(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0) +>>> +\end{verbatim} + +For more information about the SQL dialect supported by SQLite, see +\url{http://www.sqlite.org}. + +\begin{seealso} + +\seeurl{http://www.pysqlite.org} +{The pysqlite web page.} + +\seeurl{http://www.sqlite.org} +{The SQLite web page; the documentation describes the syntax and the +available data types for the supported SQL dialect.} + +\seeurl{../lib/module-sqlite3.html}{The documentation +for the \module{sqlite3} module.} + +\seepep{249}{Database API Specification 2.0}{PEP written by +Marc-Andr\'e Lemburg.} + +\end{seealso} + + +%====================================================================== +\subsection{The wsgiref package\label{module-wsgiref}} + +% XXX should this be in a PEP 333 section instead? + +The Web Server Gateway Interface (WSGI) v1.0 defines a standard +interface between web servers and Python web applications and is +described in \pep{333}. The \module{wsgiref} package is a reference +implementation of the WSGI specification. + +The package includes a basic HTTP server that will run a WSGI +application; this server is useful for debugging but isn't intended for +production use. Setting up a server takes only a few lines of code: + +\begin{verbatim} +from wsgiref import simple_server + +wsgi_app = ... + +host = '' +port = 8000 +httpd = simple_server.make_server(host, port, wsgi_app) +httpd.serve_forever() +\end{verbatim} + +% XXX discuss structure of WSGI applications? +% XXX provide an example using Django or some other framework? + +\begin{seealso} + +\seeurl{http://www.wsgi.org}{A central web site for WSGI-related resources.} + +\seepep{333}{Python Web Server Gateway Interface v1.0}{PEP written by +Phillip J. 
Eby.} + +\end{seealso} + + +% ====================================================================== +\section{Build and C API Changes\label{build-api}} + +Changes to Python's build process and to the C API include: + +\begin{itemize} + +\item The Python source tree was converted from CVS to Subversion, +in a complex migration procedure that was supervised and flawlessly +carried out by Martin von~L\"owis. The procedure was developed as +\pep{347}. + +\item Coverity, a company that markets a source code analysis tool +called Prevent, provided the results of their examination of the Python +source code. The analysis found about 60 bugs that +were quickly fixed. Many of the bugs were refcounting problems, often +occurring in error-handling code. See +\url{http://scan.coverity.com} for the statistics. + +\item The largest change to the C API came from \pep{353}, +which modifies the interpreter to use a \ctype{Py_ssize_t} type +definition instead of \ctype{int}. See the earlier +section~\ref{pep-353} for a discussion of this change. + +\item The design of the bytecode compiler has changed a great deal, +no longer generating bytecode by traversing the parse tree. Instead +the parse tree is converted to an abstract syntax tree (or AST), and it is +the abstract syntax tree that's traversed to produce the bytecode. + +It's possible for Python code to obtain AST objects by using the +\function{compile()} built-in and specifying \code{_ast.PyCF_ONLY_AST} +as the value of the +\var{flags} parameter: + +\begin{verbatim} +from _ast import PyCF_ONLY_AST +ast = compile("""a=0 +for i in range(10): + a += i +""", "<string>", 'exec', PyCF_ONLY_AST) + +assignment = ast.body[0] +for_loop = ast.body[1] +\end{verbatim} + +No official documentation has been written for the AST code yet, but +\pep{339} discusses the design. To start learning about the code, read the +definition of the various AST nodes in \file{Parser/Python.asdl}. 
A +Python script reads this file and generates a set of C structure +definitions in \file{Include/Python-ast.h}. The +\cfunction{PyParser_ASTFromString()} and +\cfunction{PyParser_ASTFromFile()} functions, defined in +\file{Include/pythonrun.h}, take Python source as input and return the +root of an AST representing the contents. This AST can then be turned +into a code object by \cfunction{PyAST_Compile()}. For more +information, read the source code, and then ask questions on +python-dev. + +% List of names taken from Jeremy's python-dev post at +% http://mail.python.org/pipermail/python-dev/2005-October/057500.html +The AST code was developed under Jeremy Hylton's management, and +implemented by (in alphabetical order) Brett Cannon, Nick Coghlan, +Grant Edwards, John Ehresman, Kurt Kaiser, Neal Norwitz, Tim Peters, +Armin Rigo, and Neil Schemenauer, plus the participants in a number of +AST sprints at conferences such as PyCon. + +\item Evan Jones's patch to obmalloc, first described in a talk +at PyCon DC 2005, was applied. Python 2.4 allocated small objects in +256K-sized arenas, but never freed arenas. With this patch, Python +will free arenas when they're empty. The net effect is that on some +platforms, when you allocate many objects, Python's memory usage may +actually drop when you delete them and the memory may be returned to +the operating system. (Implemented by Evan Jones, and reworked by Tim +Peters.) + +Note that this change means extension modules must be more careful +when allocating memory. Python's API has many different +functions for allocating memory that are grouped into families. For +example, \cfunction{PyMem_Malloc()}, \cfunction{PyMem_Realloc()}, and +\cfunction{PyMem_Free()} are one family that allocates raw memory, +while \cfunction{PyObject_Malloc()}, \cfunction{PyObject_Realloc()}, +and \cfunction{PyObject_Free()} are another family that's supposed to +be used for creating Python objects.
+ +Previously these different families all reduced to the platform's +\cfunction{malloc()} and \cfunction{free()} functions. This meant +it didn't matter if you got things wrong and allocated memory with a +\cfunction{PyMem} function but freed it with a \cfunction{PyObject} +function. With 2.5's changes to obmalloc, these families now do different +things and mismatches will probably result in a segfault. You should +carefully test your C extension modules with Python 2.5. + +\item The built-in set types now have an official C API. Call +\cfunction{PySet_New()} and \cfunction{PyFrozenSet_New()} to create a +new set, \cfunction{PySet_Add()} and \cfunction{PySet_Discard()} to +add and remove elements, and \cfunction{PySet_Contains()} and +\cfunction{PySet_Size()} to examine the set's state. +(Contributed by Raymond Hettinger.) + +\item C code can now obtain information about the exact revision +of the Python interpreter by calling the +\cfunction{Py_GetBuildInfo()} function, which returns a +string of build information like this: +\code{"trunk:45355:45356M, Apr 13 2006, 07:42:19"}. +(Contributed by Barry Warsaw.) + +\item Two new macros can be used to indicate C functions that are +local to the current file so that a faster calling convention can be +used. \cfunction{Py_LOCAL(\var{type})} declares the function as +returning a value of the specified \var{type} and uses a fast-calling +qualifier. \cfunction{Py_LOCAL_INLINE(\var{type})} does the same thing +and also requests the function be inlined. If the macro +\code{PY_LOCAL_AGGRESSIVE} is defined before \file{Python.h} is +included, a set of more aggressive optimizations is enabled for the +module; you should benchmark the results to find out if these +optimizations actually make the code faster. (Contributed by Fredrik +Lundh at the NeedForSpeed sprint.) + +\item \cfunction{PyErr_NewException(\var{name}, \var{base}, +\var{dict})} can now accept a tuple of base classes as its \var{base} +argument.
(Contributed by Georg Brandl.) + +\item The \cfunction{PyErr_Warn()} function for issuing warnings +is now deprecated in favour of \cfunction{PyErr_WarnEx(category, +message, stacklevel)} which lets you specify the number of stack +frames separating this function and the caller. A \var{stacklevel} of +1 is the function calling \cfunction{PyErr_WarnEx()}, 2 is the +function above that, and so forth. (Added by Neal Norwitz.) + +\item The CPython interpreter is still written in C, but +the code can now be compiled with a {\Cpp} compiler without errors. +(Implemented by Anthony Baxter, Martin von~L\"owis, Skip Montanaro.) + +\item The \cfunction{PyRange_New()} function was removed. It was +never documented, never used in the core code, and had dangerously lax +error checking. In the unlikely case that your extensions were using +it, you can replace it by something like the following: +\begin{verbatim} +range = PyObject_CallFunction((PyObject*) &PyRange_Type, "lll", + start, stop, step); +\end{verbatim} + +\end{itemize} + + +%====================================================================== +\subsection{Port-Specific Changes\label{ports}} + +\begin{itemize} + +\item MacOS X (10.3 and higher): dynamic loading of modules +now uses the \cfunction{dlopen()} function instead of MacOS-specific +functions. + +\item MacOS X: a \longprogramopt{enable-universalsdk} switch was added +to the \program{configure} script that compiles the interpreter as a +universal binary able to run on both PowerPC and Intel processors. +(Contributed by Ronald Oussoren.) + +\item Windows: \file{.dll} is no longer supported as a filename extension for +extension modules. \file{.pyd} is now the only filename extension that will +be searched for. 
+ +\end{itemize} + + +%====================================================================== +\section{Porting to Python 2.5\label{porting}} + +This section lists previously described changes that may require +changes to your code: + +\begin{itemize} + +\item ASCII is now the default encoding for modules. It's now +a syntax error if a module contains string literals with 8-bit +characters but doesn't have an encoding declaration. In Python 2.4 +this triggered a warning, not a syntax error. + +\item Previously, the \member{gi_frame} attribute of a generator +was always a frame object. Because of the \pep{342} changes +described in section~\ref{pep-342}, it's now possible +for \member{gi_frame} to be \code{None}. + +\item A new warning, \class{UnicodeWarning}, is triggered when +you attempt to compare a Unicode string and an 8-bit string that can't +be converted to Unicode using the default ASCII encoding. Previously +such comparisons would raise a \class{UnicodeDecodeError} exception. + +\item Library: The \module{csv} module is now stricter about multi-line quoted +fields. If your files contain newlines embedded within fields, the +input should be split into lines in a manner that preserves the +newline characters. + +\item Library: The \module{locale} module's +\function{format()} function would previously +accept any string as long as no more than one \%char specifier +appeared. In Python 2.5, the argument must be exactly one \%char +specifier with no surrounding text. + +\item Library: The \module{pickle} and \module{cPickle} modules no +longer accept a return value of \code{None} from the +\method{__reduce__()} method; the method must return a tuple of +arguments instead. The modules also no longer accept the deprecated +\var{bin} keyword parameter.
+ +\item Library: The \module{SimpleXMLRPCServer} and \module{DocXMLRPCServer} +classes now have a \member{rpc_paths} attribute that constrains +XML-RPC operations to a limited set of URL paths; the default is +to allow only \code{'/'} and \code{'/RPC2'}. Setting +\member{rpc_paths} to \code{None} or an empty tuple disables +this path checking. + +\item C API: Many functions now use \ctype{Py_ssize_t} +instead of \ctype{int} to allow processing more data on 64-bit +machines. Extension code may need to make the same change to avoid +warnings and to support 64-bit machines. See the earlier +section~\ref{pep-353} for a discussion of this change. + +\item C API: +The obmalloc changes mean that +you must be careful to not mix usage +of the \cfunction{PyMem_*()} and \cfunction{PyObject_*()} +families of functions. Memory allocated with +one family's \cfunction{*_Malloc()} must be +freed with the corresponding family's \cfunction{*_Free()} function. + +\end{itemize} + + +%====================================================================== +\section{Acknowledgements \label{acks}} + +The author would like to thank the following people for offering +suggestions, corrections and assistance with various drafts of this +article: Georg Brandl, Nick Coghlan, Phillip J. Eby, Lars Gust\"abel, +Raymond Hettinger, Ralf W. Grosse-Kunstleve, Kent Johnson, Iain Lowe, +Martin von~L\"owis, Fredrik Lundh, Andrew McNamara, Skip Montanaro, +Gustavo Niemeyer, Paul Prescod, James Pryor, Mike Rovner, Scott +Weikart, Barry Warsaw, Thomas Wouters. + +\end{document} |