author | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
---|---|---|
committer | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
commit | 458120dd40db6b4df55a4e96b650e16798ef06a0 (patch) | |
tree | 8f82685be24fef97e715c6f5ca4c68d34d5074ee /sys/src/cmd/python/Doc/ref | |
parent | 3a742c699f6806c1145aea5149bf15de15a0afd7 (diff) |
add hg and python
Diffstat (limited to 'sys/src/cmd/python/Doc/ref')
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref.tex | 68 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref1.tex | 136 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref2.tex | 731 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref3.tex | 2225 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref4.tex | 219 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref5.tex | 1325 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref6.tex | 928 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref7.tex | 544 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/ref8.tex | 112 |
-rw-r--r-- | sys/src/cmd/python/Doc/ref/reswords.py | 23 |
10 files changed, 6311 insertions, 0 deletions
diff --git a/sys/src/cmd/python/Doc/ref/ref.tex b/sys/src/cmd/python/Doc/ref/ref.tex new file mode 100644 index 000000000..03c0acf8d --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref.tex @@ -0,0 +1,68 @@ +\documentclass{manual} + +\title{Python Reference Manual} + +\input{boilerplate} + +\makeindex + +\begin{document} + +\maketitle + +\ifhtml +\chapter*{Front Matter\label{front}} +\fi + +\input{copyright} + +\begin{abstract} + +\noindent +Python is an interpreted, object-oriented, high-level programming +language with dynamic semantics. Its high-level built in data +structures, combined with dynamic typing and dynamic binding, make it +very attractive for rapid application development, as well as for use +as a scripting or glue language to connect existing components +together. Python's simple, easy to learn syntax emphasizes +readability and therefore reduces the cost of program +maintenance. Python supports modules and packages, which encourages +program modularity and code reuse. The Python interpreter and the +extensive standard library are available in source or binary form +without charge for all major platforms, and can be freely distributed. + +This reference manual describes the syntax and ``core semantics'' of +the language. It is terse, but attempts to be exact and complete. +The semantics of non-essential built-in object types and of the +built-in functions and modules are described in the +\citetitle[../lib/lib.html]{Python Library Reference}. For an +informal introduction to the language, see the +\citetitle[../tut/tut.html]{Python Tutorial}. For C or +\Cpp{} programmers, two additional manuals exist: +\citetitle[../ext/ext.html]{Extending and Embedding the Python +Interpreter} describes the high-level picture of how to write a Python +extension module, and the \citetitle[../api/api.html]{Python/C API +Reference Manual} describes the interfaces available to +C/\Cpp{} programmers in detail. 
+ +\end{abstract} + +\tableofcontents + +\input{ref1} % Introduction +\input{ref2} % Lexical analysis +\input{ref3} % Data model +\input{ref4} % Execution model +\input{ref5} % Expressions and conditions +\input{ref6} % Simple statements +\input{ref7} % Compound statements +\input{ref8} % Top-level components + +\appendix + +\chapter{History and License} +\input{license} + +\input{ref.ind} + +\end{document} diff --git a/sys/src/cmd/python/Doc/ref/ref1.tex b/sys/src/cmd/python/Doc/ref/ref1.tex new file mode 100644 index 000000000..623471656 --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref1.tex @@ -0,0 +1,136 @@ +\chapter{Introduction\label{introduction}} + +This reference manual describes the Python programming language. +It is not intended as a tutorial. + +While I am trying to be as precise as possible, I chose to use English +rather than formal specifications for everything except syntax and +lexical analysis. This should make the document more understandable +to the average reader, but will leave room for ambiguities. +Consequently, if you were coming from Mars and tried to re-implement +Python from this document alone, you might have to guess things and in +fact you would probably end up implementing quite a different language. +On the other hand, if you are using +Python and wonder what the precise rules about a particular area of +the language are, you should definitely be able to find them here. +If you would like to see a more formal definition of the language, +maybe you could volunteer your time --- or invent a cloning machine +:-). + +It is dangerous to add too many implementation details to a language +reference document --- the implementation may change, and other +implementations of the same language may work differently. 
On the +other hand, there is currently only one Python implementation in +widespread use (although alternate implementations exist), and +its particular quirks are sometimes worth being mentioned, especially +where the implementation imposes additional limitations. Therefore, +you'll find short ``implementation notes'' sprinkled throughout the +text. + +Every Python implementation comes with a number of built-in and +standard modules. These are not documented here, but in the separate +\citetitle[../lib/lib.html]{Python Library Reference} document. A few +built-in modules are mentioned when they interact in a significant way +with the language definition. + + +\section{Alternate Implementations\label{implementations}} + +Though there is one Python implementation which is by far the most +popular, there are some alternate implementations which are of +particular interest to different audiences. + +Known implementations include: + +\begin{itemize} +\item[CPython] +This is the original and most-maintained implementation of Python, +written in C. New language features generally appear here first. + +\item[Jython] +Python implemented in Java. This implementation can be used as a +scripting language for Java applications, or can be used to create +applications using the Java class libraries. It is also often used to +create tests for Java libraries. More information can be found at +\ulink{the Jython website}{http://www.jython.org/}. + +\item[Python for .NET] +This implementation actually uses the CPython implementation, but is a +managed .NET application and makes .NET libraries available. This was +created by Brian Lloyd. For more information, see the \ulink{Python +for .NET home page}{http://www.zope.org/Members/Brian/PythonNet}. + +\item[IronPython] +An alternate Python for\ .NET. Unlike Python.NET, this is a complete +Python implementation that generates IL, and compiles Python code +directly to\ .NET assemblies. 
It was created by Jim Hugunin, the +original creator of Jython. For more information, see \ulink{the +IronPython website}{http://workspaces.gotdotnet.com/ironpython}. + +\item[PyPy] +An implementation of Python written in Python; even the bytecode +interpreter is written in Python. This is executed using CPython as +the underlying interpreter. One of the goals of the project is to +encourage experimentation with the language itself by making it easier +to modify the interpreter (since it is written in Python). Additional +information is available on \ulink{the PyPy project's home +page}{http://codespeak.net/pypy/}. +\end{itemize} + +Each of these implementations varies in some way from the language as +documented in this manual, or introduces specific information beyond +what's covered in the standard Python documentation. Please refer to +the implementation-specific documentation to determine what else you +need to know about the specific implementation you're using. + + +\section{Notation\label{notation}} + +The descriptions of lexical analysis and syntax use a modified BNF +grammar notation. This uses the following style of definition: +\index{BNF} +\index{grammar} +\index{syntax} +\index{notation} + +\begin{productionlist}[*] + \production{name}{\token{lc_letter} (\token{lc_letter} | "_")*} + \production{lc_letter}{"a"..."z"} +\end{productionlist} + +The first line says that a \code{name} is an \code{lc_letter} followed by +a sequence of zero or more \code{lc_letter}s and underscores. An +\code{lc_letter} in turn is any of the single characters \character{a} +through \character{z}. (This rule is actually adhered to for the +names defined in lexical and grammar rules in this document.) + +Each rule begins with a name (which is the name defined by the rule) +and \code{::=}. A vertical bar (\code{|}) is used to separate +alternatives; it is the least binding operator in this notation. 
A +star (\code{*}) means zero or more repetitions of the preceding item; +likewise, a plus (\code{+}) means one or more repetitions, and a +phrase enclosed in square brackets (\code{[ ]}) means zero or one +occurrences (in other words, the enclosed phrase is optional). The +\code{*} and \code{+} operators bind as tightly as possible; +parentheses are used for grouping. Literal strings are enclosed in +quotes. White space is only meaningful to separate tokens. +Rules are normally contained on a single line; rules with many +alternatives may be formatted alternatively with each line after the +first beginning with a vertical bar. + +In lexical definitions (as the example above), two more conventions +are used: Two literal characters separated by three dots mean a choice +of any single character in the given (inclusive) range of \ASCII{} +characters. A phrase between angular brackets (\code{<...>}) gives an +informal description of the symbol defined; e.g., this could be used +to describe the notion of `control character' if needed. +\index{lexical definitions} +\index{ASCII@\ASCII} + +Even though the notation used is almost the same, there is a big +difference between the meaning of lexical and syntactic definitions: +a lexical definition operates on the individual characters of the +input source, while a syntax definition operates on the stream of +tokens generated by the lexical analysis. All uses of BNF in the next +chapter (``Lexical Analysis'') are lexical definitions; uses in +subsequent chapters are syntactic definitions. diff --git a/sys/src/cmd/python/Doc/ref/ref2.tex b/sys/src/cmd/python/Doc/ref/ref2.tex new file mode 100644 index 000000000..bad4609fb --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref2.tex @@ -0,0 +1,731 @@ +\chapter{Lexical analysis\label{lexical}} + +A Python program is read by a \emph{parser}. Input to the parser is a +stream of \emph{tokens}, generated by the \emph{lexical analyzer}. 
This +chapter describes how the lexical analyzer breaks a file into tokens. +\index{lexical analysis} +\index{parser} +\index{token} + +Python uses the 7-bit \ASCII{} character set for program text. +\versionadded[An encoding declaration can be used to indicate that +string literals and comments use an encoding different from ASCII]{2.3} +For compatibility with older versions, Python only warns if it finds +8-bit characters; those warnings should be corrected by either declaring +an explicit encoding, or using escape sequences if those bytes are binary +data, instead of characters. + + +The run-time character set depends on the I/O devices connected to the +program but is generally a superset of \ASCII. + +\strong{Future compatibility note:} It may be tempting to assume that the +character set for 8-bit characters is ISO Latin-1 (an \ASCII{} +superset that covers most western languages that use the Latin +alphabet), but it is possible that in the future Unicode text editors +will become common. These generally use the UTF-8 encoding, which is +also an \ASCII{} superset, but with very different use for the +characters with ordinals 128-255. While there is no consensus on this +subject yet, it is unwise to assume either Latin-1 or UTF-8, even +though the current implementation appears to favor Latin-1. This +applies both to the source character set and the run-time character +set. + + +\section{Line structure\label{line-structure}} + +A Python program is divided into a number of \emph{logical lines}. +\index{line structure} + + +\subsection{Logical lines\label{logical}} + +The end of +a logical line is represented by the token NEWLINE. Statements cannot +cross logical line boundaries except where NEWLINE is allowed by the +syntax (e.g., between statements in compound statements). +A logical line is constructed from one or more \emph{physical lines} +by following the explicit or implicit \emph{line joining} rules. 
+\index{logical line} +\index{physical line} +\index{line joining} +\index{NEWLINE token} + + +\subsection{Physical lines\label{physical}} + +A physical line is a sequence of characters terminated by an end-of-line +sequence. In source files, any of the standard platform line +termination sequences can be used - the \UNIX{} form using \ASCII{} LF +(linefeed), the Windows form using the \ASCII{} sequence CR LF (return +followed by linefeed), or the Macintosh form using the \ASCII{} CR +(return) character. All of these forms can be used equally, regardless +of platform. + +When embedding Python, source code strings should be passed to Python +APIs using the standard C conventions for newline characters (the +\code{\e n} character, representing \ASCII{} LF, is the line +terminator). + + +\subsection{Comments\label{comments}} + +A comment starts with a hash character (\code{\#}) that is not part of +a string literal, and ends at the end of the physical line. A comment +signifies the end of the logical line unless the implicit line joining +rules are invoked. +Comments are ignored by the syntax; they are not tokens. +\index{comment} +\index{hash character} + + +\subsection{Encoding declarations\label{encodings}} +\index{source character set} +\index{encodings} + +If a comment in the first or second line of the Python script matches +the regular expression \regexp{coding[=:]\e s*([-\e w.]+)}, this comment is +processed as an encoding declaration; the first group of this +expression names the encoding of the source code file. The recommended +forms of this expression are + +\begin{verbatim} +# -*- coding: <encoding-name> -*- +\end{verbatim} + +which is recognized also by GNU Emacs, and + +\begin{verbatim} +# vim:fileencoding=<encoding-name> +\end{verbatim} + +which is recognized by Bram Moolenaar's VIM. 
In addition, if the first +bytes of the file are the UTF-8 byte-order mark +(\code{'\e xef\e xbb\e xbf'}), the declared file encoding is UTF-8 +(this is supported, among others, by Microsoft's \program{notepad}). + +If an encoding is declared, the encoding name must be recognized by +Python. % XXX there should be a list of supported encodings. +The encoding is used for all lexical analysis, in particular to find +the end of a string, and to interpret the contents of Unicode literals. +String literals are converted to Unicode for syntactical analysis, +then converted back to their original encoding before interpretation +starts. The encoding declaration must appear on a line of its own. + +\subsection{Explicit line joining\label{explicit-joining}} + +Two or more physical lines may be joined into logical lines using +backslash characters (\code{\e}), as follows: when a physical line ends +in a backslash that is not part of a string literal or comment, it is +joined with the following forming a single logical line, deleting the +backslash and the following end-of-line character. For example: +\index{physical line} +\index{line joining} +\index{line continuation} +\index{backslash character} +% +\begin{verbatim} +if 1900 < year < 2100 and 1 <= month <= 12 \ + and 1 <= day <= 31 and 0 <= hour < 24 \ + and 0 <= minute < 60 and 0 <= second < 60: # Looks like a valid date + return 1 +\end{verbatim} + +A line ending in a backslash cannot carry a comment. A backslash does +not continue a comment. A backslash does not continue a token except +for string literals (i.e., tokens other than string literals cannot be +split across physical lines using a backslash). A backslash is +illegal elsewhere on a line outside a string literal. + + +\subsection{Implicit line joining\label{implicit-joining}} + +Expressions in parentheses, square brackets or curly braces can be +split over more than one physical line without using backslashes. 
+For example: + +\begin{verbatim} +month_names = ['Januari', 'Februari', 'Maart', # These are the + 'April', 'Mei', 'Juni', # Dutch names + 'Juli', 'Augustus', 'September', # for the months + 'Oktober', 'November', 'December'] # of the year +\end{verbatim} + +Implicitly continued lines can carry comments. The indentation of the +continuation lines is not important. Blank continuation lines are +allowed. There is no NEWLINE token between implicit continuation +lines. Implicitly continued lines can also occur within triple-quoted +strings (see below); in that case they cannot carry comments. + + +\subsection{Blank lines \label{blank-lines}} + +\index{blank line} +A logical line that contains only spaces, tabs, formfeeds and possibly +a comment, is ignored (i.e., no NEWLINE token is generated). During +interactive input of statements, handling of a blank line may differ +depending on the implementation of the read-eval-print loop. In the +standard implementation, an entirely blank logical line (i.e.\ one +containing not even whitespace or a comment) terminates a multi-line +statement. + + +\subsection{Indentation\label{indentation}} + +Leading whitespace (spaces and tabs) at the beginning of a logical +line is used to compute the indentation level of the line, which in +turn is used to determine the grouping of statements. +\index{indentation} +\index{whitespace} +\index{leading whitespace} +\index{space} +\index{tab} +\index{grouping} +\index{statement grouping} + +First, tabs are replaced (from left to right) by one to eight spaces +such that the total number of characters up to and including the +replacement is a multiple of +eight (this is intended to be the same rule as used by \UNIX). The +total number of spaces preceding the first non-blank character then +determines the line's indentation. Indentation cannot be split over +multiple physical lines using backslashes; the whitespace up to the +first backslash determines the indentation. 
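As an illustration of the tab rule in the paragraph above, here is a minimal Python sketch. It is not the tokenizer's actual code: the function name `indentation_width` and its `tabsize` parameter are hypothetical, and formfeeds and backslash continuations are deliberately not handled.

```python
def indentation_width(line, tabsize=8):
    """Return the indentation of `line` in columns after tab replacement.

    Hypothetical helper for illustration only; formfeeds are not handled.
    """
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # A tab is replaced by one to eight spaces so that the running
            # total becomes the next multiple of `tabsize`.
            column = (column // tabsize + 1) * tabsize
        else:
            # First non-blank character: indentation is complete.
            break
    return column


if __name__ == "__main__":
    for sample in ["x", "    x", "\tx", "  \tx", "\t  x"]:
        print(repr(sample), indentation_width(sample))
```

Under this rule two spaces followed by a tab give the same indentation as a single tab, which is one reason mixing the two in a single file is fragile.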
+ +\strong{Cross-platform compatibility note:} because of the nature of +text editors on non-UNIX platforms, it is unwise to use a mixture of +spaces and tabs for the indentation in a single source file. It +should also be noted that different platforms may explicitly limit the +maximum indentation level. + +A formfeed character may be present at the start of the line; it will +be ignored for the indentation calculations above. Formfeed +characters occurring elsewhere in the leading whitespace have an +undefined effect (for instance, they may reset the space count to +zero). + +The indentation levels of consecutive lines are used to generate +INDENT and DEDENT tokens, using a stack, as follows. +\index{INDENT token} +\index{DEDENT token} + +Before the first line of the file is read, a single zero is pushed on +the stack; this will never be popped off again. The numbers pushed on +the stack will always be strictly increasing from bottom to top. At +the beginning of each logical line, the line's indentation level is +compared to the top of the stack. If it is equal, nothing happens. +If it is larger, it is pushed on the stack, and one INDENT token is +generated. If it is smaller, it \emph{must} be one of the numbers +occurring on the stack; all numbers on the stack that are larger are +popped off, and for each number popped off a DEDENT token is +generated. At the end of the file, a DEDENT token is generated for +each number remaining on the stack that is larger than zero. 
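The stack procedure just described can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not CPython's tokenizer: `indent_tokens` is a hypothetical helper that takes the already-computed indentation level of each logical line and returns marker strings, with `"LINE"` standing in for the tokens of the line itself.

```python
def indent_tokens(levels):
    """Turn per-line indentation levels into INDENT/DEDENT markers.

    `levels` holds one indentation level per logical line. Hypothetical
    helper for illustration; "LINE" stands in for the line's own tokens.
    """
    stack = [0]  # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            # Deeper than the top of the stack: push and emit one INDENT.
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            # A smaller level must already be on the stack.
            if level not in stack:
                raise IndentationError("inconsistent dedent")
            # Pop every larger level, emitting one DEDENT per pop.
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")
    # At end of file, close every level still open above zero.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


if __name__ == "__main__":
    print(indent_tokens([0, 4, 8, 0]))
```

Because the stack is strictly increasing, a dedent to a level that was never pushed (for example 4 after 0 and 8) cannot be matched and is reported as an error.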
+ +Here is an example of a correctly (though confusingly) indented piece +of Python code: + +\begin{verbatim} +def perm(l): + # Compute the list of all permutations of l + if len(l) <= 1: + return [l] + r = [] + for i in range(len(l)): + s = l[:i] + l[i+1:] + p = perm(s) + for x in p: + r.append(l[i:i+1] + x) + return r +\end{verbatim} + +The following example shows various indentation errors: + +\begin{verbatim} + def perm(l): # error: first line indented +for i in range(len(l)): # error: not indented + s = l[:i] + l[i+1:] + p = perm(l[:i] + l[i+1:]) # error: unexpected indent + for x in p: + r.append(l[i:i+1] + x) + return r # error: inconsistent dedent +\end{verbatim} + +(Actually, the first three errors are detected by the parser; only the +last error is found by the lexical analyzer --- the indentation of +\code{return r} does not match a level popped off the stack.) + + +\subsection{Whitespace between tokens\label{whitespace}} + +Except at the beginning of a logical line or in string literals, the +whitespace characters space, tab and formfeed can be used +interchangeably to separate tokens. Whitespace is needed between two +tokens only if their concatenation could otherwise be interpreted as a +different token (e.g., ab is one token, but a b is two tokens). + + +\section{Other tokens\label{other-tokens}} + +Besides NEWLINE, INDENT and DEDENT, the following categories of tokens +exist: \emph{identifiers}, \emph{keywords}, \emph{literals}, +\emph{operators}, and \emph{delimiters}. +Whitespace characters (other than line terminators, discussed earlier) +are not tokens, but serve to delimit tokens. +Where +ambiguity exists, a token comprises the longest possible string that +forms a legal token, when read from left to right. 
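The longest-match rule and the role of whitespace between tokens can be observed with the standard `tokenize` module. The sketch below assumes a current CPython 3 interpreter rather than the 2.5 sources added by this commit, and `significant_tokens` is a hypothetical helper name.

```python
import io
import tokenize


def significant_tokens(source):
    """Return the name, number and operator strings found in `source`.

    Hypothetical helper; relies on the CPython 3 `tokenize` module.
    """
    wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted]


if __name__ == "__main__":
    print(significant_tokens("ab\n"))       # one identifier
    print(significant_tokens("a b\n"))      # two identifiers
    print(significant_tokens("a<<=b\n"))    # "<<=" is taken as one token
    print(significant_tokens("a< <=b\n"))   # whitespace forces "<" and "<="
```

The tokenizer does not check grammar, so the last input tokenizes even though it is not a valid statement.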
+ + +\section{Identifiers and keywords\label{identifiers}} + +Identifiers (also referred to as \emph{names}) are described by the following +lexical definitions: +\index{identifier} +\index{name} + +\begin{productionlist} + \production{identifier} + {(\token{letter}|"_") (\token{letter} | \token{digit} | "_")*} + \production{letter} + {\token{lowercase} | \token{uppercase}} + \production{lowercase} + {"a"..."z"} + \production{uppercase} + {"A"..."Z"} + \production{digit} + {"0"..."9"} +\end{productionlist} + +Identifiers are unlimited in length. Case is significant. + + +\subsection{Keywords\label{keywords}} + +The following identifiers are used as reserved words, or +\emph{keywords} of the language, and cannot be used as ordinary +identifiers. They must be spelled exactly as written here:% +\index{keyword}% +\index{reserved word} + +\begin{verbatim} +and del from not while +as elif global or with +assert else if pass yield +break except import print +class exec in raise +continue finally is return +def for lambda try +\end{verbatim} + +% When adding keywords, use reswords.py for reformatting + +\versionchanged[\constant{None} became a constant and is now +recognized by the compiler as a name for the built-in object +\constant{None}. Although it is not a keyword, you cannot assign +a different object to it]{2.4} + +\versionchanged[Both \keyword{as} and \keyword{with} are only recognized +when the \code{with_statement} future feature has been enabled. +It will always be enabled in Python 2.6. See section~\ref{with} for +details. Note that using \keyword{as} and \keyword{with} as identifiers +will always issue a warning, even when the \code{with_statement} future +directive is not in effect]{2.5} + + +\subsection{Reserved classes of identifiers\label{id-classes}} + +Certain classes of identifiers (besides keywords) have special +meanings. 
These classes are identified by the patterns of leading and +trailing underscore characters: + +\begin{description} + +\item[\code{_*}] + Not imported by \samp{from \var{module} import *}. The special + identifier \samp{_} is used in the interactive interpreter to store + the result of the last evaluation; it is stored in the + \module{__builtin__} module. When not in interactive mode, \samp{_} + has no special meaning and is not defined. + See section~\ref{import}, ``The \keyword{import} statement.'' + + \note{The name \samp{_} is often used in conjunction with + internationalization; refer to the documentation for the + \ulink{\module{gettext} module}{../lib/module-gettext.html} for more + information on this convention.} + +\item[\code{__*__}] + System-defined names. These names are defined by the interpreter + and its implementation (including the standard library); + applications should not expect to define additional names using this + convention. The set of names of this class defined by Python may be + extended in future versions. + See section~\ref{specialnames}, ``Special method names.'' + +\item[\code{__*}] + Class-private names. Names in this category, when used within the + context of a class definition, are re-written to use a mangled form + to help avoid name clashes between ``private'' attributes of base + and derived classes. + See section~\ref{atom-identifiers}, ``Identifiers (Names).'' + +\end{description} + + +\section{Literals\label{literals}} + +Literals are notations for constant values of some built-in types. 
+\index{literal} +\index{constant} + + +\subsection{String literals\label{strings}} + +String literals are described by the following lexical definitions: +\index{string literal} + +\index{ASCII@\ASCII} +\begin{productionlist} + \production{stringliteral} + {[\token{stringprefix}](\token{shortstring} | \token{longstring})} + \production{stringprefix} + {"r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"} + \production{shortstring} + {"'" \token{shortstringitem}* "'" + | '"' \token{shortstringitem}* '"'} + \production{longstring} + {"'''" \token{longstringitem}* "'''"} + \productioncont{| '"""' \token{longstringitem}* '"""'} + \production{shortstringitem} + {\token{shortstringchar} | \token{escapeseq}} + \production{longstringitem} + {\token{longstringchar} | \token{escapeseq}} + \production{shortstringchar} + {<any source character except "\e" or newline or the quote>} + \production{longstringchar} + {<any source character except "\e">} + \production{escapeseq} + {"\e" <any ASCII character>} +\end{productionlist} + +One syntactic restriction not indicated by these productions is that +whitespace is not allowed between the \grammartoken{stringprefix} and +the rest of the string literal. The source character set is defined +by the encoding declaration; it is \ASCII{} if no encoding declaration +is given in the source file; see section~\ref{encodings}. + +\index{triple-quoted string} +\index{Unicode Consortium} +\index{string!Unicode} +In plain English: String literals can be enclosed in matching single +quotes (\code{'}) or double quotes (\code{"}). They can also be +enclosed in matching groups of three single or double quotes (these +are generally referred to as \emph{triple-quoted strings}). The +backslash (\code{\e}) character is used to escape characters that +otherwise have a special meaning, such as newline, backslash itself, +or the quote character. 
String literals may optionally be prefixed +with a letter \character{r} or \character{R}; such strings are called +\dfn{raw strings}\index{raw string} and use different rules for interpreting +backslash escape sequences. A prefix of \character{u} or \character{U} +makes the string a Unicode string. Unicode strings use the Unicode character +set as defined by the Unicode Consortium and ISO~10646. Some additional +escape sequences, described below, are available in Unicode strings. +The two prefix characters may be combined; in this case, \character{u} must +appear before \character{r}. + +In triple-quoted strings, +unescaped newlines and quotes are allowed (and are retained), except +that three unescaped quotes in a row terminate the string. (A +``quote'' is the character used to open the string, i.e. either +\code{'} or \code{"}.) + +Unless an \character{r} or \character{R} prefix is present, escape +sequences in strings are interpreted according to rules similar +to those used by Standard C. 
The recognized escape sequences are: +\index{physical line} +\index{escape sequence} +\index{Standard C} +\index{C} + +\begin{tableiii}{l|l|c}{code}{Escape Sequence}{Meaning}{Notes} +\lineiii{\e\var{newline}} {Ignored}{} +\lineiii{\e\e} {Backslash (\code{\e})}{} +\lineiii{\e'} {Single quote (\code{'})}{} +\lineiii{\e"} {Double quote (\code{"})}{} +\lineiii{\e a} {\ASCII{} Bell (BEL)}{} +\lineiii{\e b} {\ASCII{} Backspace (BS)}{} +\lineiii{\e f} {\ASCII{} Formfeed (FF)}{} +\lineiii{\e n} {\ASCII{} Linefeed (LF)}{} +\lineiii{\e N\{\var{name}\}} + {Character named \var{name} in the Unicode database (Unicode only)}{} +\lineiii{\e r} {\ASCII{} Carriage Return (CR)}{} +\lineiii{\e t} {\ASCII{} Horizontal Tab (TAB)}{} +\lineiii{\e u\var{xxxx}} + {Character with 16-bit hex value \var{xxxx} (Unicode only)}{(1)} +\lineiii{\e U\var{xxxxxxxx}} + {Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}{(2)} +\lineiii{\e v} {\ASCII{} Vertical Tab (VT)}{} +\lineiii{\e\var{ooo}} {Character with octal value \var{ooo}}{(3,5)} +\lineiii{\e x\var{hh}} {Character with hex value \var{hh}}{(4,5)} +\end{tableiii} +\index{ASCII@\ASCII} + +\noindent +Notes: + +\begin{itemize} +\item[(1)] + Individual code units which form parts of a surrogate pair can be + encoded using this escape sequence. +\item[(2)] + Any Unicode character can be encoded this way, but characters + outside the Basic Multilingual Plane (BMP) will be encoded using a + surrogate pair if Python is compiled to use 16-bit code units (the + default). Individual code units which form parts of a surrogate + pair can be encoded using this escape sequence. +\item[(3)] + As in Standard C, up to three octal digits are accepted. +\item[(4)] + Unlike in Standard C, at most two hex digits are accepted. +\item[(5)] + In a string literal, hexadecimal and octal escapes denote the + byte with the given value; it is not necessary that the byte + encodes a character in the source character set. 
In a Unicode + literal, these escapes denote a Unicode character with the given + value. +\end{itemize} + + +Unlike Standard \index{unrecognized escape sequence}C, +all unrecognized escape sequences are left in the string unchanged, +i.e., \emph{the backslash is left in the string}. (This behavior is +useful when debugging: if an escape sequence is mistyped, the +resulting output is more easily recognized as broken.) It is also +important to note that the escape sequences marked as ``(Unicode +only)'' in the table above fall into the category of unrecognized +escapes for non-Unicode string literals. + +When an \character{r} or \character{R} prefix is present, a character +following a backslash is included in the string without change, and \emph{all +backslashes are left in the string}. For example, the string literal +\code{r"\e n"} consists of two characters: a backslash and a lowercase +\character{n}. String quotes can be escaped with a backslash, but the +backslash remains in the string; for example, \code{r"\e""} is a valid string +literal consisting of two characters: a backslash and a double quote; +\code{r"\e"} is not a valid string literal (even a raw string cannot +end in an odd number of backslashes). Specifically, \emph{a raw +string cannot end in a single backslash} (since the backslash would +escape the following quote character). Note also that a single +backslash followed by a newline is interpreted as those two characters +as part of the string, \emph{not} as a line continuation. + +When an \character{r} or \character{R} prefix is used in conjunction +with a \character{u} or \character{U} prefix, then the \code{\e uXXXX} +and \code{\e UXXXXXXXX} escape sequences are processed while +\emph{all other backslashes are left in the string}. +For example, the string literal +\code{ur"\e{}u0062\e n"} consists of three Unicode characters: `LATIN +SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'. 
+Backslashes can be escaped with a preceding backslash; however, both +remain in the string. As a result, \code{\e uXXXX} escape sequences +are only recognized when there are an odd number of backslashes. + +\subsection{String literal concatenation\label{string-catenation}} + +Multiple adjacent string literals (delimited by whitespace), possibly +using different quoting conventions, are allowed, and their meaning is +the same as their concatenation. Thus, \code{"hello" 'world'} is +equivalent to \code{"helloworld"}. This feature can be used to reduce +the number of backslashes needed, to split long strings conveniently +across long lines, or even to add comments to parts of strings, for +example: + +\begin{verbatim} +re.compile("[A-Za-z_]" # letter or underscore + "[A-Za-z0-9_]*" # letter, digit or underscore + ) +\end{verbatim} + +Note that this feature is defined at the syntactical level, but +implemented at compile time. The `+' operator must be used to +concatenate string expressions at run time. Also note that literal +concatenation can use different quoting styles for each component +(even mixing raw strings and triple quoted strings). + + +\subsection{Numeric literals\label{numbers}} + +There are four types of numeric literals: plain integers, long +integers, floating point numbers, and imaginary numbers. There are no +complex literals (complex numbers can be formed by adding a real +number and an imaginary number). +\index{number} +\index{numeric literal} +\index{integer literal} +\index{plain integer literal} +\index{long integer literal} +\index{floating point literal} +\index{hexadecimal literal} +\index{octal literal} +\index{decimal literal} +\index{imaginary literal} +\index{complex!literal} + +Note that numeric literals do not include a sign; a phrase like +\code{-1} is actually an expression composed of the unary operator +`\code{-}' and the literal \code{1}. 
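The point that a sign is not part of a numeric literal can be checked by inspecting the parse tree. This is a small sketch assuming a current CPython 3 interpreter and its `ast` module, not the 2.5 implementation in this commit; `is_negated_operand` is a hypothetical helper name.

```python
import ast


def is_negated_operand(expr):
    """Return True if `expr` parses as unary minus applied to an operand.

    Hypothetical helper; relies on the CPython 3 `ast` module.
    """
    node = ast.parse(expr, mode="eval").body
    return isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)


if __name__ == "__main__":
    print(is_negated_operand("1"))    # a bare literal
    print(is_negated_operand("-1"))   # unary minus applied to the literal 1
```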
+ + +\subsection{Integer and long integer literals\label{integers}} + +Integer and long integer literals are described by the following +lexical definitions: + +\begin{productionlist} + \production{longinteger} + {\token{integer} ("l" | "L")} + \production{integer} + {\token{decimalinteger} | \token{octinteger} | \token{hexinteger}} + \production{decimalinteger} + {\token{nonzerodigit} \token{digit}* | "0"} + \production{octinteger} + {"0" \token{octdigit}+} + \production{hexinteger} + {"0" ("x" | "X") \token{hexdigit}+} + \production{nonzerodigit} + {"1"..."9"} + \production{octdigit} + {"0"..."7"} + \production{hexdigit} + {\token{digit} | "a"..."f" | "A"..."F"} +\end{productionlist} + +Although both lower case \character{l} and upper case \character{L} are +allowed as a suffix for long integers, it is strongly recommended to always +use \character{L}, since the letter \character{l} looks too much like the +digit \character{1}. + +Plain integer literals that are above the largest representable plain +integer (e.g., 2147483647 when using 32-bit arithmetic) are accepted +as if they were long integers instead.\footnote{In versions of Python +prior to 2.4, octal and hexadecimal literals in the range just above +the largest representable plain integer but no larger than the largest unsigned +32-bit number (on a machine using 32-bit arithmetic), 4294967295, were +taken as the negative plain integer obtained by subtracting 4294967296 +from their unsigned value.} There is no limit for long integer +literals apart from what can be stored in available memory. 
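A small sketch of the integer forms described above, with illustrative names. It deliberately leaves out the octal and \character{L}-suffixed forms so that the snippet also runs on interpreters that do not accept those spellings; that restriction is a choice made for this example, not part of the grammar.

```python
# The hexadecimal prefix and digits may be written in either case.
lower_hex = 0xdeadbeef
upper_hex = 0XDEADBEEF

# A literal above the 32-bit plain integer range is still accepted,
# and its value is exact.
above_plain_range = 0x100000000
big = 79228162514264337593543950336
```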
+ +Some examples of plain integer literals (first row) and long integer +literals (second and third rows): + +\begin{verbatim} +7 2147483647 0177 +3L 79228162514264337593543950336L 0377L 0x100000000L + 79228162514264337593543950336 0xdeadbeef +\end{verbatim} + + +\subsection{Floating point literals\label{floating}} + +Floating point literals are described by the following lexical +definitions: + +\begin{productionlist} + \production{floatnumber} + {\token{pointfloat} | \token{exponentfloat}} + \production{pointfloat} + {[\token{intpart}] \token{fraction} | \token{intpart} "."} + \production{exponentfloat} + {(\token{intpart} | \token{pointfloat}) + \token{exponent}} + \production{intpart} + {\token{digit}+} + \production{fraction} + {"." \token{digit}+} + \production{exponent} + {("e" | "E") ["+" | "-"] \token{digit}+} +\end{productionlist} + +Note that the integer and exponent parts of floating point numbers +can look like octal integers, but are interpreted using radix 10. For +example, \samp{077e010} is legal, and denotes the same number +as \samp{77e10}. +The allowed range of floating point literals is +implementation-dependent. +Some examples of floating point literals: + +\begin{verbatim} +3.14 10. .001 1e100 3.14e-10 0e0 +\end{verbatim} + +Note that numeric literals do not include a sign; a phrase like +\code{-1} is actually an expression composed of the unary operator +\code{-} and the literal \code{1}. + + +\subsection{Imaginary literals\label{imaginary}} + +Imaginary literals are described by the following lexical definitions: + +\begin{productionlist} + \production{imagnumber}{(\token{floatnumber} | \token{intpart}) ("j" | "J")} +\end{productionlist} + +An imaginary literal yields a complex number with a real part of +0.0. Complex numbers are represented as a pair of floating point +numbers and have the same restrictions on their range. To create a +complex number with a nonzero real part, add a floating point number +to it, e.g., \code{(3+4j)}. 
Some examples of imaginary literals: + +\begin{verbatim} +3.14j 10.j 10j .001j 1e100j 3.14e-10j +\end{verbatim} + + +\section{Operators\label{operators}} + +The following tokens are operators: +\index{operators} + +\begin{verbatim} ++ - * ** / // % +<< >> & | ^ ~ +< > <= >= == != <> +\end{verbatim} + +The comparison operators \code{<>} and \code{!=} are alternate +spellings of the same operator. \code{!=} is the preferred spelling; +\code{<>} is obsolescent. + + +\section{Delimiters\label{delimiters}} + +The following tokens serve as delimiters in the grammar: +\index{delimiters} + +\begin{verbatim} +( ) [ ] { } @ +, : . ` = ; ++= -= *= /= //= %= +&= |= ^= >>= <<= **= +\end{verbatim} + +The period can also occur in floating-point and imaginary literals. A +sequence of three periods has a special meaning as an ellipsis in slices. +The second half of the list, the augmented assignment operators, serve +lexically as delimiters, but also perform an operation. + +The following printing \ASCII{} characters have special meaning as part +of other tokens or are otherwise significant to the lexical analyzer: + +\begin{verbatim} +' " # \ +\end{verbatim} + +The following printing \ASCII{} characters are not used in Python. Their +occurrence outside string literals and comments is an unconditional +error: +\index{ASCII@\ASCII} + +\begin{verbatim} +$ ? +\end{verbatim} diff --git a/sys/src/cmd/python/Doc/ref/ref3.tex b/sys/src/cmd/python/Doc/ref/ref3.tex new file mode 100644 index 000000000..c5dbfd22d --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref3.tex @@ -0,0 +1,2225 @@ +\chapter{Data model\label{datamodel}} + + +\section{Objects, values and types\label{objects}} + +\dfn{Objects} are Python's abstraction for data. All data in a Python +program is represented by objects or by relations between objects. +(In a sense, and in conformance to Von Neumann's model of a +``stored program computer,'' code is also represented by objects.) 
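To illustrate the remark that code is also represented by objects, the sketch below compiles an expression into a code object and then evaluates it. The expression text and the \code{<example>} file name are arbitrary choices made for illustration.

```python
# compile() turns source text into a code object, which is ordinary data:
# it can be bound to a name, inspected, and passed to other functions.
code_obj = compile("6 * 7", "<example>", "eval")

# Like any other object, a code object has a type.
code_type_name = type(code_obj).__name__

# Evaluating the code object runs the expression it represents.
result = eval(code_obj)
```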
+\index{object} +\index{data} + +Every object has an identity, a type and a value. An object's +\emph{identity} never changes once it has been created; you may think +of it as the object's address in memory. The `\keyword{is}' operator +compares the identity of two objects; the +\function{id()}\bifuncindex{id} function returns an integer +representing its identity (currently implemented as its address). +An object's \dfn{type} is +also unchangeable.\footnote{Since Python 2.2, a gradual merging of +types and classes has been started that makes this and a few other +assertions made in this manual not 100\% accurate and complete: +for example, it \emph{is} now possible in some cases to change an +object's type, under certain controlled conditions. Until this manual +undergoes extensive revision, it must now be taken as authoritative +only regarding ``classic classes'', that are still the default, for +compatibility purposes, in Python 2.2 and 2.3. For more information, +see \url{http://www.python.org/doc/newstyle.html}.} +An object's type determines the operations that the object +supports (e.g., ``does it have a length?'') and also defines the +possible values for objects of that type. The +\function{type()}\bifuncindex{type} function returns an object's type +(which is an object itself). The \emph{value} of some +objects can change. Objects whose value can change are said to be +\emph{mutable}; objects whose value is unchangeable once they are +created are called \emph{immutable}. +(The value of an immutable container object that contains a reference +to a mutable object can change when the latter's value is changed; +however the container is still considered immutable, because the +collection of objects it contains cannot be changed. So, immutability +is not strictly the same as having an unchangeable value, it is more +subtle.) 
+An object's mutability is determined by its type; for instance, +numbers, strings and tuples are immutable, while dictionaries and +lists are mutable. +\index{identity of an object} +\index{value of an object} +\index{type of an object} +\index{mutable object} +\index{immutable object} + +Objects are never explicitly destroyed; however, when they become +unreachable they may be garbage-collected. An implementation is +allowed to postpone garbage collection or omit it altogether --- it is +a matter of implementation quality how garbage collection is +implemented, as long as no objects are collected that are still +reachable. (Implementation note: the current implementation uses a +reference-counting scheme with (optional) delayed detection of +cyclically linked garbage, which collects most objects as soon as they +become unreachable, but is not guaranteed to collect garbage +containing circular references. See the +\citetitle[../lib/module-gc.html]{Python Library Reference} for +information on controlling the collection of cyclic garbage.) +\index{garbage collection} +\index{reference counting} +\index{unreachable object} + +Note that the use of the implementation's tracing or debugging +facilities may keep objects alive that would normally be collectable. +Also note that catching an exception with a +`\keyword{try}...\keyword{except}' statement may keep objects alive. + +Some objects contain references to ``external'' resources such as open +files or windows. It is understood that these resources are freed +when the object is garbage-collected, but since garbage collection is +not guaranteed to happen, such objects also provide an explicit way to +release the external resource, usually a \method{close()} method. +Programs are strongly recommended to explicitly close such +objects. The `\keyword{try}...\keyword{finally}' statement provides +a convenient way to do this. + +Some objects contain references to other objects; these are called +\emph{containers}. 
Examples of containers are tuples, lists and +dictionaries. The references are part of a container's value. In +most cases, when we talk about the value of a container, we imply the +values, not the identities of the contained objects; however, when we +talk about the mutability of a container, only the identities of +the immediately contained objects are implied. So, if an immutable +container (like a tuple) +contains a reference to a mutable object, its value changes +if that mutable object is changed. +\index{container} + +Types affect almost all aspects of object behavior. Even the importance +of object identity is affected in some sense: for immutable types, +operations that compute new values may actually return a reference to +any existing object with the same type and value, while for mutable +objects this is not allowed. E.g., after +\samp{a = 1; b = 1}, +\code{a} and \code{b} may or may not refer to the same object with the +value one, depending on the implementation, but after +\samp{c = []; d = []}, \code{c} and \code{d} +are guaranteed to refer to two different, unique, newly created empty +lists. +(Note that \samp{c = d = []} assigns the same object to both +\code{c} and \code{d}.) + + +\section{The standard type hierarchy\label{types}} + +Below is a list of the types that are built into Python. Extension +modules (written in C, Java, or other languages, depending on +the implementation) can define additional types. Future versions of +Python may add types to the type hierarchy (e.g., rational +numbers, efficiently stored arrays of integers, etc.). +\index{type} +\indexii{data}{type} +\indexii{type}{hierarchy} +\indexii{extension}{module} +\indexii{C}{language} + +Some of the type descriptions below contain a paragraph listing +`special attributes.' These are attributes that provide access to the +implementation and are not intended for general use. Their definition +may change in the future. 
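The identity and mutability rules from the previous section can be checked with the \keyword{is} operator. This is a sketch with illustrative names; whether two equal immutable values share one object is left to the implementation, so the example only relies on the guaranteed cases.

```python
# Two separate list displays always create two distinct objects.
c = []
d = []
distinct_lists = c is not d

# Chained assignment binds both names to the same object.
e = f = []
e.append(1)
shared_list = (e is f) and (f == [1])

# A tuple is immutable, yet its value changes when a contained mutable
# object changes; the tuple still refers to the very same objects.
inner = []
t = (inner,)
inner.append("x")
tuple_value_changed = (t == (["x"],)) and (t[0] is inner)
```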
+\index{attribute} +\indexii{special}{attribute} +\indexiii{generic}{special}{attribute} + +\begin{description} + +\item[None] +This type has a single value. There is a single object with this value. +This object is accessed through the built-in name \code{None}. +It is used to signify the absence of a value in many situations, e.g., +it is returned from functions that don't explicitly return anything. +Its truth value is false. +\obindex{None} + +\item[NotImplemented] +This type has a single value. There is a single object with this value. +This object is accessed through the built-in name \code{NotImplemented}. +Numeric methods and rich comparison methods may return this value if +they do not implement the operation for the operands provided. (The +interpreter will then try the reflected operation, or some other +fallback, depending on the operator.) Its truth value is true. +\obindex{NotImplemented} + +\item[Ellipsis] +This type has a single value. There is a single object with this value. +This object is accessed through the built-in name \code{Ellipsis}. +It is used to indicate the presence of the \samp{...} syntax in a +slice. Its truth value is true. +\obindex{Ellipsis} + +\item[Numbers] +These are created by numeric literals and returned as results by +arithmetic operators and arithmetic built-in functions. Numeric +objects are immutable; once created their value never changes. Python +numbers are of course strongly related to mathematical numbers, but +subject to the limitations of numerical representation in computers. +\obindex{numeric} + +Python distinguishes between integers, floating point numbers, and +complex numbers: + +\begin{description} +\item[Integers] +These represent elements from the mathematical set of integers +(positive and negative). +\obindex{integer} + +There are three types of integers: + +\begin{description} + +\item[Plain integers] +These represent numbers in the range -2147483648 through 2147483647. 
+(The range may be larger on machines with a larger natural word +size, but not smaller.) +When the result of an operation would fall outside this range, the +result is normally returned as a long integer (in some cases, the +exception \exception{OverflowError} is raised instead). +For the purpose of shift and mask operations, integers are assumed to +have a binary, 2's complement notation using 32 or more bits, and +hiding no bits from the user (i.e., all 4294967296 different bit +patterns correspond to different values). +\obindex{plain integer} +\withsubitem{(built-in exception)}{\ttindex{OverflowError}} + +\item[Long integers] +These represent numbers in an unlimited range, subject to available +(virtual) memory only. For the purpose of shift and mask operations, +a binary representation is assumed, and negative numbers are +represented in a variant of 2's complement which gives the illusion of +an infinite string of sign bits extending to the left. +\obindex{long integer} + +\item[Booleans] +These represent the truth values False and True. The two objects +representing the values False and True are the only Boolean objects. +The Boolean type is a subtype of plain integers, and Boolean values +behave like the values 0 and 1, respectively, in almost all contexts, +the exception being that when converted to a string, the strings +\code{"False"} or \code{"True"} are returned, respectively. +\obindex{Boolean} +\ttindex{False} +\ttindex{True} + +\end{description} % Integers + +The rules for integer representation are intended to give the most +meaningful interpretation of shift and mask operations involving +negative integers and the least surprises when switching between the +plain and long integer domains. Any operation except left shift, +if it yields a result in the plain integer domain without causing +overflow, will yield the same result in the long integer domain or +when using mixed operands. 
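A brief sketch of the integer rules above, using only built-in operations and illustrative names: Booleans behave like 0 and 1 except when converted to a string, and shift and mask operations on negative integers behave as if a run of sign bits extended to the left.

```python
# Booleans are integers in arithmetic contexts...
bool_sum = True + True
bool_is_int = isinstance(True, int)
# ...but convert to their own names as strings.
bool_text = str(True)

# Masking a negative number exposes its two's complement low bits.
low_byte = -1 & 0xFF

# A right shift of a negative number keeps the sign.
shifted = -8 >> 1
```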
+\indexii{integer}{representation} + +\item[Floating point numbers] +These represent machine-level double precision floating point numbers. +You are at the mercy of the underlying machine architecture (and +C or Java implementation) for the accepted range and handling of overflow. +Python does not support single-precision floating point numbers; the +savings in processor and memory usage that are usually the reason for using +these are dwarfed by the overhead of using objects in Python, so there +is no reason to complicate the language with two kinds of floating +point numbers. +\obindex{floating point} +\indexii{floating point}{number} +\indexii{C}{language} +\indexii{Java}{language} + +\item[Complex numbers] +These represent complex numbers as a pair of machine-level double +precision floating point numbers. The same caveats apply as for +floating point numbers. The real and imaginary parts of a complex +number \code{z} can be retrieved through the read-only attributes +\code{z.real} and \code{z.imag}. +\obindex{complex} +\indexii{complex}{number} + +\end{description} % Numbers + + +\item[Sequences] +These represent finite ordered sets indexed by non-negative numbers. +The built-in function \function{len()}\bifuncindex{len} returns the +number of items of a sequence. +When the length of a sequence is \var{n}, the +index set contains the numbers 0, 1, \ldots, \var{n}-1. Item +\var{i} of sequence \var{a} is selected by \code{\var{a}[\var{i}]}. +\obindex{sequence} +\index{index operation} +\index{item selection} +\index{subscription} + +Sequences also support slicing: \code{\var{a}[\var{i}:\var{j}]} +selects all items with index \var{k} such that \var{i} \code{<=} +\var{k} \code{<} \var{j}. When used as an expression, a slice is a +sequence of the same type. This implies that the index set is +renumbered so that it starts at 0. 
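The indexing and slicing rules above, sketched on a list with illustrative names: a slice selects the items whose index \var{k} satisfies \var{i} \code{<=} \var{k} \code{<} \var{j}, and the result is a new sequence of the same type whose indices start again at 0.

```python
a = ["p", "q", "r", "s", "t"]

n = len(a)                 # the index set is 0 .. n-1
last = a[n - 1]

s = a[1:4]                 # items with index k where 1 <= k < 4
first_of_slice = s[0]      # the slice is renumbered from 0
same_type = type(s) is type(a)
```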
+\index{slicing} + +Some sequences also support ``extended slicing'' with a third ``step'' +parameter: \code{\var{a}[\var{i}:\var{j}:\var{k}]} selects all items +of \var{a} with index \var{x} where \code{\var{x} = \var{i} + +\var{n}*\var{k}}, \var{n} \code{>=} \code{0} and \var{i} \code{<=} +\var{x} \code{<} \var{j}. +\index{extended slicing} + +Sequences are distinguished according to their mutability: + +\begin{description} + +\item[Immutable sequences] +An object of an immutable sequence type cannot change once it is +created. (If the object contains references to other objects, +these other objects may be mutable and may be changed; however, +the collection of objects directly referenced by an immutable object +cannot change.) +\obindex{immutable sequence} +\obindex{immutable} + +The following types are immutable sequences: + +\begin{description} + +\item[Strings] +The items of a string are characters. There is no separate +character type; a character is represented by a string of one item. +Characters represent (at least) 8-bit bytes. The built-in +functions \function{chr()}\bifuncindex{chr} and +\function{ord()}\bifuncindex{ord} convert between characters and +nonnegative integers representing the byte values. Bytes with the +values 0-127 usually represent the corresponding \ASCII{} values, but +the interpretation of values is up to the program. The string +data type is also used to represent arrays of bytes, e.g., to hold data +read from a file. +\obindex{string} +\index{character} +\index{byte} +\index{ASCII@\ASCII} + +(On systems whose native character set is not \ASCII, strings may use +EBCDIC in their internal representation, provided the functions +\function{chr()} and \function{ord()} implement a mapping between \ASCII{} and +EBCDIC, and string comparison preserves the \ASCII{} order. +Or perhaps someone can propose a better rule?) 
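A minimal sketch of the points about strings above, with illustrative names: indexing a string yields a string of one item rather than a value of a separate character type, and \function{chr()} and \function{ord()} convert between such one-item strings and integers. Only \ASCII{} values are used, so the result does not depend on how larger values are interpreted.

```python
s = "spam"

item = s[0]                          # a string of length one
item_is_string = type(item) is type(s)

code = ord("A")                      # the ASCII value of "A"
next_char = chr(code + 1)

round_trip = chr(ord(item)) == item
```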
+\index{ASCII@\ASCII} +\index{EBCDIC} +\index{character set} +\indexii{string}{comparison} +\bifuncindex{chr} +\bifuncindex{ord} + +\item[Unicode] +The items of a Unicode object are Unicode code units. A Unicode code +unit is represented by a Unicode object of one item and can hold +either a 16-bit or 32-bit value representing a Unicode ordinal (the +maximum value for the ordinal is given in \code{sys.maxunicode}, and +depends on how Python is configured at compile time). Surrogate pairs +may be present in the Unicode object, and will be reported as two +separate items. The built-in functions +\function{unichr()}\bifuncindex{unichr} and +\function{ord()}\bifuncindex{ord} convert between code units and +nonnegative integers representing the Unicode ordinals as defined in +the Unicode Standard 3.0. Conversion to and from other encodings is +possible through the Unicode method \method{encode()} and the built-in +function \function{unicode()}.\bifuncindex{unicode} +\obindex{unicode} +\index{character} +\index{integer} +\index{Unicode} + +\item[Tuples] +The items of a tuple are arbitrary Python objects. +Tuples of two or more items are formed by comma-separated lists +of expressions. A tuple of one item (a `singleton') can be formed +by affixing a comma to an expression (an expression by itself does +not create a tuple, since parentheses must be usable for grouping of +expressions). An empty tuple can be formed by an empty pair of +parentheses. +\obindex{tuple} +\indexii{singleton}{tuple} +\indexii{empty}{tuple} + +\end{description} % Immutable sequences + +\item[Mutable sequences] +Mutable sequences can be changed after they are created. The +subscription and slicing notations can be used as the target of +assignment and \keyword{del} (delete) statements. 
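Subscriptions and slicings used as targets of assignment and \keyword{del}, sketched on a list with illustrative values. The intermediate copies exist only so that each step can be inspected afterwards.

```python
items = [0, 1, 2, 3, 4]

items[0] = 10                  # subscription as an assignment target
after_item = list(items)

items[1:3] = [20]              # slicing as a target; the length may change
after_slice = list(items)

del items[0]                   # subscription as a del target
del items[1:]                  # slicing as a del target
```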
+\obindex{mutable sequence} +\obindex{mutable} +\indexii{assignment}{statement} +\index{delete} +\stindex{del} +\index{subscription} +\index{slicing} + +There is currently a single intrinsic mutable sequence type: + +\begin{description} + +\item[Lists] +The items of a list are arbitrary Python objects. Lists are formed +by placing a comma-separated list of expressions in square brackets. +(Note that there are no special cases needed to form lists of length 0 +or 1.) +\obindex{list} + +\end{description} % Mutable sequences + +The extension module \module{array}\refstmodindex{array} provides an +additional example of a mutable sequence type. + + +\end{description} % Sequences + + +\item[Set types] +These represent unordered, finite sets of unique, immutable objects. +As such, they cannot be indexed by any subscript. However, they can be +iterated over, and the built-in function \function{len()} returns the +number of items in a set. Common uses for sets are +fast membership testing, removing duplicates from a sequence, and +computing mathematical operations such as intersection, union, difference, +and symmetric difference. +\bifuncindex{len} +\obindex{set type} + +For set elements, the same immutability rules apply as for dictionary +keys. Note that numeric types obey the normal rules for numeric +comparison: if two numbers compare equal (e.g., \code{1} and +\code{1.0}), only one of them can be contained in a set. + +There are currently two intrinsic set types: + +\begin{description} + +\item[Sets] +These\obindex{set} represent a mutable set. They are created by the +built-in \function{set()} constructor and can be modified afterwards +by several methods, such as \method{add()}. + +\item[Frozen sets] +These\obindex{frozenset} represent an immutable set. They are created by +the built-in \function{frozenset()} constructor. As a frozenset is +immutable and hashable, it can be used again as an element of another set, +or as a dictionary key. 
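The set rules above can be sketched as follows, with illustrative names: numbers that compare equal count as one element, a frozenset can serve as a set element or dictionary key, and a mutable set cannot.

```python
# 1 and 1.0 compare equal, so only one of them is kept.
numbers = set([1, 1.0, 2])
count = len(numbers)

# A frozenset is hashable: usable as a set element or a dictionary key.
inner = frozenset([1, 2])
nested = set([inner])
table = {inner: "pair"}
lookup = table[frozenset([2, 1])]

# A mutable set is not hashable and is rejected as an element.
try:
    set([set([1])])
    rejected = False
except TypeError:
    rejected = True
```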
+ +\end{description} % Set types + + +\item[Mappings] +These represent finite sets of objects indexed by arbitrary index sets. +The subscript notation \code{a[k]} selects the item indexed +by \code{k} from the mapping \code{a}; this can be used in +expressions and as the target of assignments or \keyword{del} statements. +The built-in function \function{len()} returns the number of items +in a mapping. +\bifuncindex{len} +\index{subscription} +\obindex{mapping} + +There is currently a single intrinsic mapping type: + +\begin{description} + +\item[Dictionaries] +These\obindex{dictionary} represent finite sets of objects indexed by +nearly arbitrary values. The only types of values not acceptable as +keys are values containing lists or dictionaries or other mutable +types that are compared by value rather than by object identity, the +reason being that the efficient implementation of dictionaries +requires a key's hash value to remain constant. +Numeric types used for keys obey the normal rules for numeric +comparison: if two numbers compare equal (e.g., \code{1} and +\code{1.0}) then they can be used interchangeably to index the same +dictionary entry. + +Dictionaries are mutable; they can be created by the +\code{\{...\}} notation (see section~\ref{dict}, ``Dictionary +Displays''). + +The extension modules \module{dbm}\refstmodindex{dbm}, +\module{gdbm}\refstmodindex{gdbm}, and +\module{bsddb}\refstmodindex{bsddb} provide additional examples of +mapping types. + +\end{description} % Mapping types + +\item[Callable types] +These\obindex{callable} are the types to which the function call +operation (see section~\ref{calls}, ``Calls'') can be applied: +\indexii{function}{call} +\index{invocation} +\indexii{function}{argument} + +\begin{description} + +\item[User-defined functions] +A user-defined function object is created by a function definition +(see section~\ref{function}, ``Function definitions''). 
It should be +called with an argument +list containing the same number of items as the function's formal +parameter list. +\indexii{user-defined}{function} +\obindex{function} +\obindex{user-defined function} + +Special attributes: + +\begin{tableiii}{lll}{member}{Attribute}{Meaning}{} + \lineiii{func_doc}{The function's documentation string, or + \code{None} if unavailable}{Writable} + + \lineiii{__doc__}{Another way of spelling + \member{func_doc}}{Writable} + + \lineiii{func_name}{The function's name}{Writable} + + \lineiii{__name__}{Another way of spelling + \member{func_name}}{Writable} + + \lineiii{__module__}{The name of the module the function was defined + in, or \code{None} if unavailable.}{Writable} + + \lineiii{func_defaults}{A tuple containing default argument values + for those arguments that have defaults, or \code{None} if no + arguments have a default value}{Writable} + + \lineiii{func_code}{The code object representing the compiled + function body.}{Writable} + + \lineiii{func_globals}{A reference to the dictionary that holds the + function's global variables --- the global namespace of the module + in which the function was defined.}{Read-only} + + \lineiii{func_dict}{The namespace supporting arbitrary function + attributes.}{Writable} + + \lineiii{func_closure}{\code{None} or a tuple of cells that contain + bindings for the function's free variables.}{Read-only} +\end{tableiii} + +Most of the attributes labelled ``Writable'' check the type of the +assigned value. + +\versionchanged[\code{func_name} is now writable]{2.4} + +Function objects also support getting and setting arbitrary +attributes, which can be used, for example, to attach metadata to +functions. Regular attribute dot-notation is used to get and set such +attributes. \emph{Note that the current implementation only supports +function attributes on user-defined functions. 
Function attributes on +built-in functions may be supported in the future.} + +Additional information about a function's definition can be retrieved +from its code object; see the description of internal types below. + +\withsubitem{(function attribute)}{ + \ttindex{func_doc} + \ttindex{__doc__} + \ttindex{__name__} + \ttindex{__module__} + \ttindex{__dict__} + \ttindex{func_defaults} + \ttindex{func_closure} + \ttindex{func_code} + \ttindex{func_globals} + \ttindex{func_dict}} +\indexii{global}{namespace} + +\item[User-defined methods] +A user-defined method object combines a class, a class instance (or +\code{None}) and any callable object (normally a user-defined +function). +\obindex{method} +\obindex{user-defined method} +\indexii{user-defined}{method} + +Special read-only attributes: \member{im_self} is the class instance +object, \member{im_func} is the function object; +\member{im_class} is the class of \member{im_self} for bound methods +or the class that asked for the method for unbound methods; +\member{__doc__} is the method's documentation (same as +\code{im_func.__doc__}); \member{__name__} is the method name (same as +\code{im_func.__name__}); \member{__module__} is the name of the +module the method was defined in, or \code{None} if unavailable. +\versionchanged[\member{im_self} used to refer to the class that + defined the method]{2.2} +\withsubitem{(method attribute)}{ + \ttindex{__doc__} + \ttindex{__name__} + \ttindex{__module__} + \ttindex{im_func} + \ttindex{im_self}} + +Methods also support accessing (but not setting) the arbitrary +function attributes on the underlying function object. + +User-defined method objects may be created when getting an attribute +of a class (perhaps via an instance of that class), if that attribute +is a user-defined function object, an unbound user-defined method object, +or a class method object. 
+When the attribute is a user-defined method object, a new +method object is only created if the class from which it is being +retrieved is the same as, or a derived class of, the class stored +in the original method object; otherwise, the original method object +is used as it is. + +When a user-defined method object is created by retrieving +a user-defined function object from a class, its \member{im_self} +attribute is \code{None} and the method object is said to be unbound. +When one is created by retrieving a user-defined function object +from a class via one of its instances, its \member{im_self} attribute +is the instance, and the method object is said to be bound. +In either case, the new method's \member{im_class} attribute +is the class from which the retrieval takes place, and +its \member{im_func} attribute is the original function object. +\withsubitem{(method attribute)}{ + \ttindex{im_class}\ttindex{im_func}\ttindex{im_self}} + +When a user-defined method object is created by retrieving another +method object from a class or instance, the behaviour is the same +as for a function object, except that the \member{im_func} attribute +of the new instance is not the original method object but its +\member{im_func} attribute. +\withsubitem{(method attribute)}{ + \ttindex{im_func}} + +When a user-defined method object is created by retrieving a +class method object from a class or instance, its \member{im_self} +attribute is the class itself (the same as the \member{im_class} +attribute), and its \member{im_func} attribute is the function +object underlying the class method. +\withsubitem{(method attribute)}{ + \ttindex{im_class}\ttindex{im_func}\ttindex{im_self}} + +When an unbound user-defined method object is called, the underlying +function (\member{im_func}) is called, with the restriction that the +first argument must be an instance of the proper class +(\member{im_class}) or of a derived class thereof. 
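A minimal sketch of how function attributes interact with method objects, as described earlier in this item. The class \class{C}, the function \code{f} and the attribute \code{note} are hypothetical names chosen for illustration; the function is given its attribute before it is stored in the class so that the example does not depend on how the class hands the function back.

```python
def f(self, n):
    return n + 1

# Arbitrary attributes can be attached to a user-defined function.
f.note = "adds one"

class C(object):
    pass

C.f = f            # the function becomes an attribute of the class
x = C()

# Retrieving the attribute through an instance gives a method object
# that can be called like the function, without passing the instance.
result = x.f(1)

# The method gives read access to the underlying function's attributes...
note_via_method = x.f.note

# ...but does not allow setting them.
try:
    x.f.note = "changed"
    set_allowed = True
except AttributeError:
    set_allowed = False
```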
+ +When a bound user-defined method object is called, the underlying +function (\member{im_func}) is called, inserting the class instance +(\member{im_self}) in front of the argument list. For instance, when +\class{C} is a class which contains a definition for a function +\method{f()}, and \code{x} is an instance of \class{C}, calling +\code{x.f(1)} is equivalent to calling \code{C.f(x, 1)}. + +When a user-defined method object is derived from a class method object, +the ``class instance'' stored in \member{im_self} will actually be the +class itself, so that calling either \code{x.f(1)} or \code{C.f(1)} is +equivalent to calling \code{f(C,1)} where \code{f} is the underlying +function. + +Note that the transformation from function object to (unbound or +bound) method object happens each time the attribute is retrieved from +the class or instance. In some cases, a fruitful optimization is to +assign the attribute to a local variable and call that local variable. +Also notice that this transformation only happens for user-defined +functions; other callable objects (and all non-callable objects) are +retrieved without transformation. It is also important to note that +user-defined functions which are attributes of a class instance are +not converted to bound methods; this \emph{only} happens when the +function is an attribute of the class. + +\item[Generator functions\index{generator!function}\index{generator!iterator}] +A function or method which uses the \keyword{yield} statement (see +section~\ref{yield}, ``The \keyword{yield} statement'') is called a +\dfn{generator function}. Such a function, when called, always +returns an iterator object which can be used to execute the body of +the function: calling the iterator's \method{next()} method will +cause the function to execute until it provides a value using the +\keyword{yield} statement. 
When the function executes a +\keyword{return} statement or falls off the end, a +\exception{StopIteration} exception is raised and the iterator will +have reached the end of the set of values to be returned. + +\item[Built-in functions] +A built-in function object is a wrapper around a C function. Examples +of built-in functions are \function{len()} and \function{math.sin()} +(\module{math} is a standard built-in module). +The number and type of the arguments are +determined by the C function. +Special read-only attributes: \member{__doc__} is the function's +documentation string, or \code{None} if unavailable; \member{__name__} +is the function's name; \member{__self__} is set to \code{None} (but see +the next item); \member{__module__} is the name of the module the +function was defined in or \code{None} if unavailable. +\obindex{built-in function} +\obindex{function} +\indexii{C}{language} + +\item[Built-in methods] +This is really a different disguise of a built-in function, this time +containing an object passed to the C function as an implicit extra +argument. An example of a built-in method is +\code{\var{alist}.append()}, assuming +\var{alist} is a list object. +In this case, the special read-only attribute \member{__self__} is set +to the object denoted by \var{alist}. +\obindex{built-in method} +\obindex{method} +\indexii{built-in}{method} + +\item[Class Types] +Class types, or ``new-style classes,'' are callable. These objects +normally act as factories for new instances of themselves, but +variations are possible for class types that override +\method{__new__()}. The arguments of the call are passed to +\method{__new__()} and, in the typical case, to \method{__init__()} to +initialize the new instance. + +\item[Classic Classes] +Class objects are described below. When a class object is called, +a new class instance (also described below) is created and +returned. This implies a call to the class's \method{__init__()} method +if it has one. 
Any arguments are passed on to the \method{__init__()} +method. If there is no \method{__init__()} method, the class must be called +without arguments. +\withsubitem{(object method)}{\ttindex{__init__()}} +\obindex{class} +\obindex{class instance} +\obindex{instance} +\indexii{class object}{call} + +\item[Class instances] +Class instances are described below. Class instances are callable +only when the class has a \method{__call__()} method; \code{x(arguments)} +is a shorthand for \code{x.__call__(arguments)}. + +\end{description} + +\item[Modules] +Modules are imported by the \keyword{import} statement (see +section~\ref{import}, ``The \keyword{import} statement'').% +\stindex{import}\obindex{module} +A module object has a namespace implemented by a dictionary object +(this is the dictionary referenced by the \member{func_globals} attribute of +functions defined in the module). Attribute references are translated +to lookups in this dictionary, e.g., \code{m.x} is equivalent to +\code{m.__dict__["x"]}. +A module object does not contain the code object used to +initialize the module (since it isn't needed once the initialization +is done). + +Attribute assignment updates the module's namespace dictionary, +e.g., \samp{m.x = 1} is equivalent to \samp{m.__dict__["x"] = 1}. + +Special read-only attribute: \member{__dict__} is the module's +namespace as a dictionary object. +\withsubitem{(module attribute)}{\ttindex{__dict__}} + +Predefined (writable) attributes: \member{__name__} +is the module's name; \member{__doc__} is the +module's documentation string, or +\code{None} if unavailable; \member{__file__} is the pathname of the +file from which the module was loaded, if it was loaded from a file. +The \member{__file__} attribute is not present for C{} modules that are +statically linked into the interpreter; for extension modules loaded +dynamically from a shared library, it is the pathname of the shared +library file. 
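The equivalence between module attribute access and lookups in the
namespace dictionary can be checked directly. The following is a minimal
sketch, assuming a scratch module built with \code{types.ModuleType}; the
module name \code{demo} and the attributes \code{x} and \code{y} are
illustrative only and not part of any real module.

```python
import types

# "demo" is a throwaway module used only for illustration.
m = types.ModuleType("demo")

# Attribute assignment updates the module's namespace dictionary.
m.x = 1
same_value = (m.__dict__["x"] == m.x)

# A value stored in the dictionary is visible as an attribute.
m.__dict__["y"] = 2
y_value = m.y

# __name__ is predefined; __file__ is absent because this module
# was not loaded from a file.
name = m.__name__
has_file = hasattr(m, "__file__")
```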
+\withsubitem{(module attribute)}{ + \ttindex{__name__} + \ttindex{__doc__} + \ttindex{__file__}} +\indexii{module}{namespace} + +\item[Classes] +Class objects are created by class definitions (see +section~\ref{class}, ``Class definitions''). +A class has a namespace implemented by a dictionary object. +Class attribute references are translated to +lookups in this dictionary, +e.g., \samp{C.x} is translated to \samp{C.__dict__["x"]}. +When the attribute name is not found +there, the attribute search continues in the base classes. The search +is depth-first, left-to-right in the order of occurrence in the +base class list. + +When a class attribute reference (for class \class{C}, say) +would yield a user-defined function object or +an unbound user-defined method object whose associated class is either +\class{C} or one of its base classes, it is transformed into an unbound +user-defined method object whose \member{im_class} attribute is~\class{C}. +When it would yield a class method object, it is transformed into +a bound user-defined method object whose \member{im_class} and +\member{im_self} attributes are both~\class{C}. When it would yield +a static method object, it is transformed into the object wrapped +by the static method object. See section~\ref{descriptors} for another +way in which attributes retrieved from a class may differ from those +actually contained in its \member{__dict__}. +\obindex{class} +\obindex{class instance} +\obindex{instance} +\indexii{class object}{call} +\index{container} +\obindex{dictionary} +\indexii{class}{attribute} + +Class attribute assignments update the class's dictionary, never the +dictionary of a base class. +\indexiii{class}{attribute}{assignment} + +A class object can be called (see above) to yield a class instance (see +below). 
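The class attribute lookup and assignment rules above can be sketched
as follows. The class names \code{Base} and \code{C} are illustrative.
The classes derive from \class{object} so that the sketch behaves the
same on interpreters without classic classes; for single inheritance,
as here, the search order is the same in both class flavors.

```python
class Base(object):
    x = "base"

class C(Base):
    pass

# C has no "x" of its own, so the lookup continues in the base class.
inherited = C.x
in_own_dict_before = "x" in C.__dict__

# Assignment updates C's dictionary, never the base class's.
C.x = "derived"
base_value = Base.x
in_own_dict_after = "x" in C.__dict__
```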
+\indexii{class object}{call} + +Special attributes: \member{__name__} is the class name; +\member{__module__} is the module name in which the class was defined; +\member{__dict__} is the dictionary containing the class's namespace; +\member{__bases__} is a tuple (possibly empty or a singleton) +containing the base classes, in the order of their occurrence in the +base class list; \member{__doc__} is the class's documentation string, +or \code{None} if undefined. +\withsubitem{(class attribute)}{ + \ttindex{__name__} + \ttindex{__module__} + \ttindex{__dict__} + \ttindex{__bases__} + \ttindex{__doc__}} + +\item[Class instances] +A class instance is created by calling a class object (see above). +A class instance has a namespace implemented as a dictionary which +is the first place in which +attribute references are searched. When an attribute is not found +there, and the instance's class has an attribute by that name, +the search continues with the class attributes. If a class attribute +is found that is a user-defined function object or an unbound +user-defined method object whose associated class is the class +(call it~\class{C}) of the instance for which the attribute reference +was initiated or one of its bases, +it is transformed into a bound user-defined method object whose +\member{im_class} attribute is~\class{C} and whose \member{im_self} attribute +is the instance. Static method and class method objects are also +transformed, as if they had been retrieved from class~\class{C}; +see above under ``Classes''. See section~\ref{descriptors} for +another way in which attributes of a class retrieved via its +instances may differ from the objects actually stored in the +class's \member{__dict__}. +If no class attribute is found, and the object's class has a +\method{__getattr__()} method, that is called to satisfy the lookup. 
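The three stages of instance attribute lookup described above (instance
dictionary, then class, then \method{__getattr__()}) can be illustrated
with a short sketch. The class \code{C} and the attribute names are
hypothetical, and the \method{__getattr__()} shown answers every name,
which a real class would rarely want.

```python
class C(object):
    kind = "class attribute"

    def __getattr__(self, name):
        # Called only when the normal lookup has failed.
        return "computed:" + name

x = C()
x.local = "instance attribute"

from_instance = x.local    # found in the instance dictionary
from_class = x.kind        # not in x.__dict__, found in the class
from_hook = x.missing      # found nowhere, so __getattr__() is called

# The instance dictionary is searched first, so an instance
# attribute shadows a class attribute of the same name.
x.kind = "shadowed"
shadowed = x.kind
class_value = C.kind
```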
+\obindex{class instance} +\obindex{instance} +\indexii{class}{instance} +\indexii{class instance}{attribute} + +Attribute assignments and deletions update the instance's dictionary, +never a class's dictionary. If the class has a \method{__setattr__()} or +\method{__delattr__()} method, this is called instead of updating the +instance dictionary directly. +\indexiii{class instance}{attribute}{assignment} + +Class instances can pretend to be numbers, sequences, or mappings if +they have methods with certain special names. See +section~\ref{specialnames}, ``Special method names.'' +\obindex{numeric} +\obindex{sequence} +\obindex{mapping} + +Special attributes: \member{__dict__} is the attribute +dictionary; \member{__class__} is the instance's class. +\withsubitem{(instance attribute)}{ + \ttindex{__dict__} + \ttindex{__class__}} + +\item[Files] +A file\obindex{file} object represents an open file. File objects are +created by the \function{open()}\bifuncindex{open} built-in function, +and also by +\withsubitem{(in module os)}{\ttindex{popen()}}\function{os.popen()}, +\function{os.fdopen()}, and the +\method{makefile()}\withsubitem{(socket method)}{\ttindex{makefile()}} +method of socket objects (and perhaps by other functions or methods +provided by extension modules). The objects +\ttindex{sys.stdin}\code{sys.stdin}, +\ttindex{sys.stdout}\code{sys.stdout} and +\ttindex{sys.stderr}\code{sys.stderr} are initialized to file objects +corresponding to the interpreter's standard\index{stdio} input, output +and error streams. See the \citetitle[../lib/lib.html]{Python Library +Reference} for complete documentation of file objects. +\withsubitem{(in module sys)}{ + \ttindex{stdin} + \ttindex{stdout} + \ttindex{stderr}} + + +\item[Internal types] +A few types used internally by the interpreter are exposed to the user. +Their definitions may change with future versions of the interpreter, +but they are mentioned here for completeness. 
+\index{internal type} +\index{types, internal} + +\begin{description} + +\item[Code objects] +Code objects represent \emph{byte-compiled} executable Python code, or +\emph{bytecode}. +The difference between a code +object and a function object is that the function object contains an +explicit reference to the function's globals (the module in which it +was defined), while a code object contains no context; +also the default argument values are stored in the function object, +not in the code object (because they represent values calculated at +run-time). Unlike function objects, code objects are immutable and +contain no references (directly or indirectly) to mutable objects. +\index{bytecode} +\obindex{code} + +Special read-only attributes: \member{co_name} gives the function +name; \member{co_argcount} is the number of positional arguments +(including arguments with default values); \member{co_nlocals} is the +number of local variables used by the function (including arguments); +\member{co_varnames} is a tuple containing the names of the local +variables (starting with the argument names); \member{co_cellvars} is +a tuple containing the names of local variables that are referenced by +nested functions; \member{co_freevars} is a tuple containing the names +of free variables; \member{co_code} is a string representing the +sequence of bytecode instructions; +\member{co_consts} is a tuple containing the literals used by the +bytecode; \member{co_names} is a tuple containing the names used by +the bytecode; \member{co_filename} is the filename from which the code +was compiled; \member{co_firstlineno} is the first line number of the +function; \member{co_lnotab} is a string encoding the mapping from +byte code offsets to line numbers (for details see the source code of +the interpreter); \member{co_stacksize} is the required stack size +(including local variables); \member{co_flags} is an integer encoding +a number of flags for the interpreter. 
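A few of these attributes can be inspected on an ordinary function. This
is a sketch only: the function \code{f} is hypothetical, and the code
object is reached through \member{func_code} or, on interpreters that
spell it differently, \member{__code__} (an assumption about the
interpreter in use, not something this manual specifies).

```python
def f(a, b=2, *args, **kwargs):
    total = a + b
    return total

# Use __code__ where it exists, otherwise fall back to func_code.
code = getattr(f, "__code__", None) or f.func_code

name = code.co_name            # "f"
argcount = code.co_argcount    # 2: a and b, not *args or **kwargs
varnames = code.co_varnames    # argument names first, then other locals
nlocals = code.co_nlocals      # arguments are included in the count
```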
+ +\withsubitem{(code object attribute)}{ + \ttindex{co_argcount} + \ttindex{co_code} + \ttindex{co_consts} + \ttindex{co_filename} + \ttindex{co_firstlineno} + \ttindex{co_flags} + \ttindex{co_lnotab} + \ttindex{co_name} + \ttindex{co_names} + \ttindex{co_nlocals} + \ttindex{co_stacksize} + \ttindex{co_varnames} + \ttindex{co_cellvars} + \ttindex{co_freevars}} + +The following flag bits are defined for \member{co_flags}: bit +\code{0x04} is set if the function uses the \samp{*arguments} syntax +to accept an arbitrary number of positional arguments; bit +\code{0x08} is set if the function uses the \samp{**keywords} syntax +to accept arbitrary keyword arguments; bit \code{0x20} is set if the +function is a generator. +\obindex{generator} + +Future feature declarations (\samp{from __future__ import division}) +also use bits in \member{co_flags} to indicate whether a code object +was compiled with a particular feature enabled: bit \code{0x2000} is +set if the function was compiled with future division enabled; bits +\code{0x10} and \code{0x1000} were used in earlier versions of Python. + +Other bits in \member{co_flags} are reserved for internal use. + +If\index{documentation string} a code object represents a function, +the first item in +\member{co_consts} is the documentation string of the function, or +\code{None} if undefined. + +\item[Frame objects] +Frame objects represent execution frames. They may occur in traceback +objects (see below). 
+\obindex{frame} + +Special read-only attributes: \member{f_back} points to the previous +stack frame (towards the caller), or \code{None} if this is the bottom +stack frame; \member{f_code} is the code object being executed in this +frame; \member{f_locals} is the dictionary used to look up local +variables; \member{f_globals} is used for global variables; +\member{f_builtins} is used for built-in (intrinsic) names; +\member{f_restricted} is a flag indicating whether the function is +executing in restricted execution mode; \member{f_lasti} gives the +precise instruction (this is an index into the bytecode string of +the code object). +\withsubitem{(frame attribute)}{ + \ttindex{f_back} + \ttindex{f_code} + \ttindex{f_globals} + \ttindex{f_locals} + \ttindex{f_lasti} + \ttindex{f_builtins} + \ttindex{f_restricted}} + +Special writable attributes: \member{f_trace}, if not \code{None}, is +a function called at the start of each source code line (this is used +by the debugger); \member{f_exc_type}, \member{f_exc_value}, +\member{f_exc_traceback} represent the last exception raised in the +parent frame provided another exception was ever raised in the current +frame (in all other cases they are \code{None}); \member{f_lineno} is the +current line number of the frame --- writing to this from within a +trace function jumps to the given line (only for the bottom-most +frame). A debugger can implement a Jump command (aka Set Next +Statement) by writing to \member{f_lineno}. +\withsubitem{(frame attribute)}{ + \ttindex{f_trace} + \ttindex{f_exc_type} + \ttindex{f_exc_value} + \ttindex{f_exc_traceback} + \ttindex{f_lineno}} + +\item[Traceback objects] \label{traceback} +Traceback objects represent a stack trace of an exception. A +traceback object is created when an exception occurs. When the search +for an exception handler unwinds the execution stack, at each unwound +level a traceback object is inserted in front of the current +traceback. 
When an exception handler is entered, the stack trace is +made available to the program. +(See section~\ref{try}, ``The \code{try} statement.'') +It is accessible as \code{sys.exc_traceback}, and also as the third +item of the tuple returned by \code{sys.exc_info()}. The latter is +the preferred interface, since it works correctly when the program is +using multiple threads. +When the program contains no suitable handler, the stack trace is written +(nicely formatted) to the standard error stream; if the interpreter is +interactive, it is also made available to the user as +\code{sys.last_traceback}. +\obindex{traceback} +\indexii{stack}{trace} +\indexii{exception}{handler} +\indexii{execution}{stack} +\withsubitem{(in module sys)}{ + \ttindex{exc_info} + \ttindex{exc_traceback} + \ttindex{last_traceback}} +\ttindex{sys.exc_info} +\ttindex{sys.exc_traceback} +\ttindex{sys.last_traceback} + +Special read-only attributes: \member{tb_next} is the next level in the +stack trace (towards the frame where the exception occurred), or +\code{None} if there is no next level; \member{tb_frame} points to the +execution frame of the current level; \member{tb_lineno} gives the line +number where the exception occurred; \member{tb_lasti} indicates the +precise instruction. The line number and last instruction in the +traceback may differ from the line number of its frame object if the +exception occurred in a \keyword{try} statement with no matching +except clause or with a finally clause. +\withsubitem{(traceback attribute)}{ + \ttindex{tb_next} + \ttindex{tb_frame} + \ttindex{tb_lineno} + \ttindex{tb_lasti}} +\stindex{try} + +\item[Slice objects] +Slice objects are used to represent slices when \emph{extended slice +syntax} is used. This is a slice using two colons, or multiple slices +or ellipses separated by commas, e.g., \code{a[i:j:step]}, \code{a[i:j, +k:l]}, or \code{a[..., i:j]}. They are also created by the built-in +\function{slice()}\bifuncindex{slice} function. 
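The objects produced by extended slice syntax can be observed with a
small probe class. \code{Probe} is a hypothetical helper, not part of
the language or library: its \method{__getitem__()} simply returns the
key it receives.

```python
class Probe(object):
    """Hypothetical helper: returns whatever key it is indexed with."""
    def __getitem__(self, key):
        return key

p = Probe()

stepped = p[1:10:2]           # a single slice object
multi = p[1:2, 3:4]           # a tuple of two slice objects
with_ellipsis = p[..., 5:]    # a tuple of Ellipsis and a slice object

built = slice(1, 10, 2)       # the same kind of object, built explicitly
```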
+ +Special read-only attributes: \member{start} is the lower bound; +\member{stop} is the upper bound; \member{step} is the step value; each is +\code{None} if omitted. These attributes can have any type. +\withsubitem{(slice object attribute)}{ + \ttindex{start} + \ttindex{stop} + \ttindex{step}} + +Slice objects support one method: + +\begin{methoddesc}[slice]{indices}{self, length} +This method takes a single integer argument \var{length} and computes +information about the extended slice that the slice object would +describe if applied to a sequence of \var{length} items. It returns a +tuple of three integers; respectively these are the \var{start} and +\var{stop} indices and the \var{step} or stride length of the slice. +Missing or out-of-bounds indices are handled in a manner consistent +with regular slices. +\versionadded{2.3} +\end{methoddesc} + +\item[Static method objects] +Static method objects provide a way of defeating the transformation +of function objects to method objects described above. A static method +object is a wrapper around any other object, usually a user-defined +method object. When a static method object is retrieved from a class +or a class instance, the object actually returned is the wrapped object, +which is not subject to any further transformation. Static method +objects are not themselves callable, although the objects they +wrap usually are. Static method objects are created by the built-in +\function{staticmethod()} constructor. + +\item[Class method objects] +A class method object, like a static method object, is a wrapper +around another object that alters the way in which that object +is retrieved from classes and class instances. The behaviour of +class method objects upon such retrieval is described above, +under ``User-defined methods''. Class method objects are created +by the built-in \function{classmethod()} constructor. 
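The effect of the two wrappers on attribute retrieval can be sketched as
follows. The class \code{C} and its methods \code{add} and \code{who}
are illustrative names only.

```python
class C(object):
    def add(a, b):
        return a + b
    # Retrieval returns the wrapped function untransformed,
    # so no instance is inserted in front of the arguments.
    add = staticmethod(add)

    def who(cls):
        return cls
    # Retrieval yields a method bound to the class itself.
    who = classmethod(who)

x = C()
via_class = C.add(1, 2)
via_instance = x.add(1, 2)
cls_from_class = C.who()
cls_from_instance = x.who()
```

In both cases the result is the same whether the attribute is retrieved
from the class or from one of its instances.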
+ +\end{description} % Internal types + +\end{description} % Types + +%========================================================================= +\section{New-style and classic classes} + +Classes and instances come in two flavors: old-style or classic, and new-style. + +Up to Python 2.1, old-style classes were the only flavor available to the +user. The concept of (old-style) class is unrelated to the concept of type: if +\var{x} is an instance of an old-style class, then \code{x.__class__} +designates the class of \var{x}, but \code{type(x)} is always \code{<type +'instance'>}. This reflects the fact that all old-style instances, +independently of their class, are implemented with a single built-in type, +called \code{instance}. + +New-style classes were introduced in Python 2.2 to unify classes and types. A +new-style class is neither more nor less than a user-defined type. If \var{x} is +an instance of a new-style class, then \code{type(x)} is the same as +\code{x.__class__}. + +The major motivation for introducing new-style classes is to provide a unified +object model with a full meta-model. It also has a number of immediate +benefits, like the ability to subclass most built-in types, or the introduction +of ``descriptors'', which enable computed properties. + +For compatibility reasons, classes are still old-style by default. New-style +classes are created by specifying another new-style class (i.e.\ a type) as a +parent class, or the ``top-level type'' \class{object} if no other parent is +needed. The behaviour of new-style classes differs from that of old-style +classes in a number of important details in addition to what \function{type()} +returns. Some of these changes are fundamental to the new object model, like +the way special methods are invoked. Others are ``fixes'' that could not be +implemented before for compatibility concerns, like the method resolution order +in case of multiple inheritance. 
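The new-style half of this comparison can be sketched briefly. The
classes \code{New} and \code{Number} are hypothetical. The old-style
case is not shown, since it can only be observed on interpreters that
still provide classic classes.

```python
class New(object):       # new-style: derives from the top-level type
    pass

n = New()
same = type(n) is n.__class__      # holds for new-style instances
is_type = isinstance(New, type)    # a new-style class is itself a type

class Number(int):       # most built-in types can be subclassed
    def double(self):
        return self * 2

doubled = Number(21).double()
```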
+ +This manual is not up-to-date with respect to new-style classes. For now, +please see \url{http://www.python.org/doc/newstyle.html} for more information. + +The plan is to eventually drop old-style classes, leaving only the semantics of +new-style classes. This change will probably only be feasible in Python 3.0. +\index{class}{new-style} +\index{class}{classic} +\index{class}{old-style} + +%========================================================================= +\section{Special method names\label{specialnames}} + +A class can implement certain operations that are invoked by special +syntax (such as arithmetic operations or subscripting and slicing) by +defining methods with special names.\indexii{operator}{overloading} +This is Python's approach to \dfn{operator overloading}, allowing +classes to define their own behavior with respect to language +operators. For instance, if a class defines +a method named \method{__getitem__()}, and \code{x} is an instance of +this class, then \code{x[i]} is equivalent\footnote{This, and other +statements, are only roughly true for instances of new-style +classes.} to +\code{x.__getitem__(i)}. Except where mentioned, attempts to execute +an operation raise an exception when no appropriate method is defined. +\withsubitem{(mapping object method)}{\ttindex{__getitem__()}} + +When implementing a class that emulates any built-in type, it is +important that the emulation only be implemented to the degree that it +makes sense for the object being modelled. For example, some +sequences may work well with retrieval of individual elements, but +extracting a slice may not make sense. (One example of this is the +\class{NodeList} interface in the W3C's Document Object Model.) + + +\subsection{Basic customization\label{customization}} + +\begin{methoddesc}[object]{__new__}{cls\optional{, \moreargs}} +Called to create a new instance of class \var{cls}. 
\method{__new__()} +is a static method (special-cased so you need not declare it as such) +that takes the class of which an instance was requested as its first +argument. The remaining arguments are those passed to the object +constructor expression (the call to the class). The return value of +\method{__new__()} should be the new object instance (usually an +instance of \var{cls}). + +Typical implementations create a new instance of the class by invoking +the superclass's \method{__new__()} method using +\samp{super(\var{currentclass}, \var{cls}).__new__(\var{cls}[, ...])} +with appropriate arguments and then modifying the newly-created instance +as necessary before returning it. + +If \method{__new__()} returns an instance of \var{cls}, then the new +instance's \method{__init__()} method will be invoked like +\samp{__init__(\var{self}[, ...])}, where \var{self} is the new instance +and the remaining arguments are the same as were passed to +\method{__new__()}. + +If \method{__new__()} does not return an instance of \var{cls}, then the +new instance's \method{__init__()} method will not be invoked. + +\method{__new__()} is intended mainly to allow subclasses of +immutable types (like int, str, or tuple) to customize instance +creation. +\end{methoddesc} + +\begin{methoddesc}[object]{__init__}{self\optional{, \moreargs}} +Called\indexii{class}{constructor} when the instance is created. The +arguments are those passed to the class constructor expression. If a +base class has an \method{__init__()} method, the derived class's +\method{__init__()} method, if any, must explicitly call it to ensure proper +initialization of the base class part of the instance; for example: +\samp{BaseClass.__init__(\var{self}, [\var{args}...])}. As a special +constraint on constructors, no value may be returned; doing so will +cause a \exception{TypeError} to be raised at runtime. 
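The division of labour between \method{__new__()} and
\method{__init__()} can be sketched with two small cases. All class
names here (\code{Celsius}, \code{Base}, \code{Derived}) are
hypothetical. The first customizes creation of an immutable type; the
second shows the explicit call to the base class initializer.

```python
class Celsius(float):
    """Hypothetical float subclass that rounds its value on creation."""
    def __new__(cls, value):
        # The value of an immutable object has to be fixed here;
        # by the time __init__() runs it can no longer be changed.
        return super(Celsius, cls).__new__(cls, round(value, 1))

class Base(object):
    def __init__(self, name):
        self.name = name

class Derived(Base):
    def __init__(self, name, extra):
        # The base class initializer is not called automatically.
        Base.__init__(self, name)
        self.extra = extra

t = Celsius(21.349)
d = Derived("n", "e")
```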
+\end{methoddesc} + + +\begin{methoddesc}[object]{__del__}{self} +Called when the instance is about to be destroyed. This is also +called a destructor\index{destructor}. If a base class +has a \method{__del__()} method, the derived class's \method{__del__()} +method, if any, +must explicitly call it to ensure proper deletion of the base class +part of the instance. Note that it is possible (though not recommended!) +for the \method{__del__()} +method to postpone destruction of the instance by creating a new +reference to it. It may then be called at a later time when this new +reference is deleted. It is not guaranteed that +\method{__del__()} methods are called for objects that still exist when +the interpreter exits. +\stindex{del} + +\begin{notice} +\samp{del x} doesn't directly call +\code{x.__del__()} --- the former decrements the reference count for +\code{x} by one, and the latter is only called when \code{x}'s reference +count reaches zero. Some common situations that may prevent the +reference count of an object from going to zero include: circular +references between objects (e.g., a doubly-linked list or a tree data +structure with parent and child pointers); a reference to the object +on the stack frame of a function that caught an exception (the +traceback stored in \code{sys.exc_traceback} keeps the stack frame +alive); or a reference to the object on the stack frame that raised an +unhandled exception in interactive mode (the traceback stored in +\code{sys.last_traceback} keeps the stack frame alive). The first +situation can only be remedied by explicitly breaking the cycles; the +latter two situations can be resolved by storing \code{None} in +\code{sys.exc_traceback} or \code{sys.last_traceback}. Circular +references which are garbage are detected when the optional cycle +detector is enabled (it's on by default), but can only be cleaned up +if there are no Python-level \method{__del__()} methods involved. 
+Refer to the documentation for the \ulink{\module{gc} +module}{../lib/module-gc.html} for more information about how +\method{__del__()} methods are handled by the cycle detector, +particularly the description of the \code{garbage} value. +\end{notice} + +\begin{notice}[warning] +Due to the precarious circumstances under which +\method{__del__()} methods are invoked, exceptions that occur during their +execution are ignored, and a warning is printed to \code{sys.stderr} +instead. Also, when \method{__del__()} is invoked in response to a module +being deleted (e.g., when execution of the program is done), other +globals referenced by the \method{__del__()} method may already have been +deleted. For this reason, \method{__del__()} methods should do the +absolute minimum needed to maintain external invariants. Starting with +version 1.5, Python guarantees that globals whose name begins with a single +underscore are deleted from their module before other globals are deleted; +if no other references to such globals exist, this may help in assuring that +imported modules are still available at the time when the +\method{__del__()} method is called. +\end{notice} +\end{methoddesc} + +\begin{methoddesc}[object]{__repr__}{self} +Called by the \function{repr()}\bifuncindex{repr} built-in function +and by string conversions (reverse quotes) to compute the ``official'' +string representation of an object. If at all possible, this should +look like a valid Python expression that could be used to recreate an +object with the same value (given an appropriate environment). If +this is not possible, a string of the form \samp{<\var{...some useful +description...}>} should be returned. The return value must be a +string object. +If a class defines \method{__repr__()} but not \method{__str__()}, +then \method{__repr__()} is also used when an ``informal'' string +representation of instances of that class is required. 
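A minimal sketch of a \method{__repr__()} that follows this advice is
given below. The class \code{Point} is hypothetical. It defines no
\method{__str__()}, so the informal representation falls back to
\method{__repr__()}.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Looks like an expression that would recreate the object.
        return "Point(%r, %r)" % (self.x, self.y)

p = Point(1, 2)
official = repr(p)             # "Point(1, 2)"
informal = str(p)              # same text: no __str__() is defined
roundtrip = eval(official)     # works because Point is in scope
```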
+ +This is typically used for debugging, so it is important that the +representation is information-rich and unambiguous. +\indexii{string}{conversion} +\indexii{reverse}{quotes} +\indexii{backward}{quotes} +\index{back-quotes} +\end{methoddesc} + +\begin{methoddesc}[object]{__str__}{self} +Called by the \function{str()}\bifuncindex{str} built-in function and +by the \keyword{print}\stindex{print} statement to compute the +``informal'' string representation of an object. This differs from +\method{__repr__()} in that it does not have to be a valid Python +expression: a more convenient or concise representation may be used +instead. The return value must be a string object. +\end{methoddesc} + +\begin{methoddesc}[object]{__lt__}{self, other} +\methodline[object]{__le__}{self, other} +\methodline[object]{__eq__}{self, other} +\methodline[object]{__ne__}{self, other} +\methodline[object]{__gt__}{self, other} +\methodline[object]{__ge__}{self, other} +\versionadded{2.1} +These are the so-called ``rich comparison'' methods, and are called +for comparison operators in preference to \method{__cmp__()} below. +The correspondence between operator symbols and method names is as +follows: +\code{\var{x}<\var{y}} calls \code{\var{x}.__lt__(\var{y})}, +\code{\var{x}<=\var{y}} calls \code{\var{x}.__le__(\var{y})}, +\code{\var{x}==\var{y}} calls \code{\var{x}.__eq__(\var{y})}, +\code{\var{x}!=\var{y}} and \code{\var{x}<>\var{y}} call +\code{\var{x}.__ne__(\var{y})}, +\code{\var{x}>\var{y}} calls \code{\var{x}.__gt__(\var{y})}, and +\code{\var{x}>=\var{y}} calls \code{\var{x}.__ge__(\var{y})}. + +A rich comparison method may return the singleton \code{NotImplemented} if it +does not implement the operation for a given pair of arguments. +By convention, \code{False} and \code{True} are returned for a successful +comparison. 
However, these methods can return any value, so if the +comparison operator is used in a Boolean context (e.g., in the condition +of an \code{if} statement), Python will call \function{bool()} on the +value to determine if the result is true or false. + +There are no implied relationships among the comparison operators. +The truth of \code{\var{x}==\var{y}} does not imply that \code{\var{x}!=\var{y}} +is false. Accordingly, when defining \method{__eq__()}, one should also +define \method{__ne__()} so that the operators will behave as expected. + +There are no reflected (swapped-argument) versions of these methods +(to be used when the left argument does not support the operation but +the right argument does); rather, \method{__lt__()} and +\method{__gt__()} are each other's reflection, \method{__le__()} and +\method{__ge__()} are each other's reflection, and \method{__eq__()} +and \method{__ne__()} are their own reflection. + +Arguments to rich comparison methods are never coerced. +\end{methoddesc} + +\begin{methoddesc}[object]{__cmp__}{self, other} +Called by comparison operations if rich comparison (see above) is not +defined. Should return a negative integer if \code{self < other}, +zero if \code{self == other}, a positive integer if \code{self > +other}. If no \method{__cmp__()}, \method{__eq__()} or +\method{__ne__()} operation is defined, class instances are compared +by object identity (``address''). See also the description of +\method{__hash__()} for some important notes on creating objects which +support custom comparison operations and are usable as dictionary +keys. +(Note: the restriction that exceptions are not propagated by +\method{__cmp__()} has been removed since Python 1.5.) 
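The rich comparison rules given earlier in this section can be
illustrated with a sketch. \code{Version} is a hypothetical class
wrapping a tuple of integers. It defines \method{__ne__()} explicitly
alongside \method{__eq__()}, and it relies on reflection for \code{>}.

```python
class Version(object):
    """Hypothetical wrapper around a tuple of integers."""
    def __init__(self, *parts):
        self.parts = parts

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.parts == other.parts

    def __ne__(self, other):
        # Defined explicitly so that != agrees with ==.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.parts < other.parts

a = Version(1, 2)
b = Version(1, 3)

equal = (a == Version(1, 2))
differ = (a != b)
less = (a < b)
# No __gt__() is defined: b > a is answered by the reflection a.__lt__(b).
greater = (b > a)
```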
+\bifuncindex{cmp} +\index{comparisons} +\end{methoddesc} + +\begin{methoddesc}[object]{__rcmp__}{self, other} + \versionchanged[No longer supported]{2.1} +\end{methoddesc} + +\begin{methoddesc}[object]{__hash__}{self} +Called for the key object for dictionary \obindex{dictionary} +operations, and by the built-in function +\function{hash()}\bifuncindex{hash}. Should return a 32-bit integer +usable as a hash value +for dictionary operations. The only required property is that objects +which compare equal have the same hash value; it is advised to somehow +mix together (e.g., using exclusive or) the hash values for the +components of the object that also play a part in comparison of +objects. If a class does not define a \method{__cmp__()} method it should +not define a \method{__hash__()} operation either; if it defines +\method{__cmp__()} or \method{__eq__()} but not \method{__hash__()}, +its instances will not be usable as dictionary keys. If a class +defines mutable objects and implements a \method{__cmp__()} or +\method{__eq__()} method, it should not implement \method{__hash__()}, +since the dictionary implementation requires that a key's hash value +is immutable (if the object's hash value changes, it will be in the +wrong hash bucket). + +\versionchanged[\method{__hash__()} may now also return a long +integer object; the 32-bit integer is then derived from the hash +of that object]{2.5} + +\withsubitem{(object method)}{\ttindex{__cmp__()}} +\end{methoddesc} + +\begin{methoddesc}[object]{__nonzero__}{self} +Called to implement truth value testing, and the built-in operation +\code{bool()}; should return \code{False} or \code{True}, or their +integer equivalents \code{0} or \code{1}. +When this method is not defined, \method{__len__()} is +called, if it is defined (see below). If a class defines neither +\method{__len__()} nor \method{__nonzero__()}, all its instances are +considered true. 
+\withsubitem{(mapping object method)}{\ttindex{__len__()}} +\end{methoddesc} + +\begin{methoddesc}[object]{__unicode__}{self} +Called to implement \function{unicode()}\bifuncindex{unicode} builtin; +should return a Unicode object. When this method is not defined, string +conversion is attempted, and the result of string conversion is converted +to Unicode using the system default encoding. +\end{methoddesc} + + +\subsection{Customizing attribute access\label{attribute-access}} + +The following methods can be defined to customize the meaning of +attribute access (use of, assignment to, or deletion of \code{x.name}) +for class instances. + +\begin{methoddesc}[object]{__getattr__}{self, name} +Called when an attribute lookup has not found the attribute in the +usual places (i.e. it is not an instance attribute nor is it found in +the class tree for \code{self}). \code{name} is the attribute name. +This method should return the (computed) attribute value or raise an +\exception{AttributeError} exception. + +Note that if the attribute is found through the normal mechanism, +\method{__getattr__()} is not called. (This is an intentional +asymmetry between \method{__getattr__()} and \method{__setattr__()}.) +This is done both for efficiency reasons and because otherwise +\method{__setattr__()} would have no way to access other attributes of +the instance. Note that at least for instance variables, you can fake +total control by not inserting any values in the instance attribute +dictionary (but instead inserting them in another object). See the +\method{__getattribute__()} method below for a way to actually get +total control in new-style classes. +\withsubitem{(object method)}{\ttindex{__setattr__()}} +\end{methoddesc} + +\begin{methoddesc}[object]{__setattr__}{self, name, value} +Called when an attribute assignment is attempted. This is called +instead of the normal mechanism (i.e.\ store the value in the instance +dictionary). 
\var{name} is the attribute name, \var{value} is the +value to be assigned to it. + +If \method{__setattr__()} wants to assign to an instance attribute, it +should not simply execute \samp{self.\var{name} = value} --- this +would cause a recursive call to itself. Instead, it should insert the +value in the dictionary of instance attributes, e.g., +\samp{self.__dict__[\var{name}] = value}. For new-style classes, +rather than accessing the instance dictionary, it should call the base +class method with the same name, for example, +\samp{object.__setattr__(self, name, value)}. +\withsubitem{(instance attribute)}{\ttindex{__dict__}} +\end{methoddesc} + +\begin{methoddesc}[object]{__delattr__}{self, name} +Like \method{__setattr__()} but for attribute deletion instead of +assignment. This should only be implemented if \samp{del +obj.\var{name}} is meaningful for the object. +\end{methoddesc} + +\subsubsection{More attribute access for new-style classes \label{new-style-attribute-access}} + +The following methods only apply to new-style classes. + +\begin{methoddesc}[object]{__getattribute__}{self, name} +Called unconditionally to implement attribute accesses for instances +of the class. If the class also defines \method{__getattr__()}, the latter +will not be called unless \method{__getattribute__()} either calls it +explicitly or raises an \exception{AttributeError}. +This method should return the (computed) attribute +value or raise an \exception{AttributeError} exception. +In order to avoid infinite recursion in this method, its +implementation should always call the base class method with the same +name to access any attributes it needs, for example, +\samp{object.__getattribute__(self, name)}. 
+\end{methoddesc} + +\subsubsection{Implementing Descriptors \label{descriptors}} + +The following methods only apply when an instance of the class +containing the method (a so-called \emph{descriptor} class) appears in +the class dictionary of another new-style class, known as the +\emph{owner} class. In the examples below, ``the attribute'' refers to +the attribute whose name is the key of the property in the owner +class' \code{__dict__}. Descriptors can only be implemented as +new-style classes themselves. + +\begin{methoddesc}[object]{__get__}{self, instance, owner} +Called to get the attribute of the owner class (class attribute access) +or of an instance of that class (instance attribute access). +\var{owner} is always the owner class, while \var{instance} is the +instance that the attribute was accessed through, or \code{None} when +the attribute is accessed through the \var{owner}. This method should +return the (computed) attribute value or raise an +\exception{AttributeError} exception. +\end{methoddesc} + +\begin{methoddesc}[object]{__set__}{self, instance, value} +Called to set the attribute on an instance \var{instance} of the owner +class to a new value, \var{value}. +\end{methoddesc} + +\begin{methoddesc}[object]{__delete__}{self, instance} +Called to delete the attribute on an instance \var{instance} of the +owner class. +\end{methoddesc} + + +\subsubsection{Invoking Descriptors \label{descriptor-invocation}} + +In general, a descriptor is an object attribute with ``binding behavior'', +one whose attribute access has been overridden by methods in the descriptor +protocol: \method{__get__()}, \method{__set__()}, and \method{__delete__()}. +If any of those methods are defined for an object, it is said to be a +descriptor. + +The default behavior for attribute access is to get, set, or delete the +attribute from an object's dictionary. 
For instance, \code{a.x} has a +lookup chain starting with \code{a.__dict__['x']}, then +\code{type(a).__dict__['x']}, and continuing +through the base classes of \code{type(a)} excluding metaclasses. + +However, if the looked-up value is an object defining one of the descriptor +methods, then Python may override the default behavior and invoke the +descriptor method instead. Where this occurs in the precedence chain depends +on which descriptor methods were defined and how they were called. Note that +descriptors are only invoked for new-style objects or classes +(ones that subclass \class{object()} or \class{type()}). + +The starting point for descriptor invocation is a binding, \code{a.x}. +How the arguments are assembled depends on \code{a}: + +\begin{itemize} + + \item[Direct Call] The simplest and least common call is when user code + directly invokes a descriptor method: \code{x.__get__(a)}. + + \item[Instance Binding] If binding to a new-style object instance, + \code{a.x} is transformed into the call: + \code{type(a).__dict__['x'].__get__(a, type(a))}. + + \item[Class Binding] If binding to a new-style class, \code{A.x} + is transformed into the call: \code{A.__dict__['x'].__get__(None, A)}. + + \item[Super Binding] If \code{a} is an instance of \class{super}, + then the binding \code{super(B, obj).m()} searches + \code{obj.__class__.__mro__} for the base class \code{A} immediately + following \code{B} and then invokes the descriptor with the call: + \code{A.__dict__['m'].__get__(obj, A)}. + +\end{itemize} + +For instance bindings, the precedence of descriptor invocation depends +on which descriptor methods are defined. Data descriptors define +both \method{__get__()} and \method{__set__()}. Non-data descriptors have +just the \method{__get__()} method. Data descriptors always override +a redefinition in an instance dictionary. In contrast, non-data +descriptors can be overridden by instances. 
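The precedence difference between data and non-data descriptors can be sketched as follows. The classes \class{NonData}, \class{Data} and \class{Owner}, and the \code{_checked} storage key, are hypothetical names chosen for this example only.

```python
class NonData(object):
    """Defines only __get__, so an instance dictionary entry wins."""

    def __get__(self, instance, owner):
        return "from descriptor"


class Data(object):
    """Defines __get__ and __set__, so it overrides the instance dictionary."""

    def __get__(self, instance, owner):
        if instance is None:
            return self                  # accessed through the owner class
        return instance.__dict__.get("_checked", 0)

    def __set__(self, instance, value):
        instance.__dict__["_checked"] = int(value)


class Owner(object):
    plain = NonData()
    checked = Data()


a = Owner()
a.checked = "42"                         # routed through Data.__set__
a.__dict__["checked"] = "shadow"         # ignored on lookup: data descriptor wins
a.__dict__["plain"] = "from instance"    # shadows the non-data descriptor
```

Here \code{a.checked} still evaluates to \code{42}, while \code{a.plain} now yields the value stored in the instance dictionary; a fresh \code{Owner()} instance continues to see the descriptor result for \code{plain}.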
+ +Python methods (including \function{staticmethod()} and \function{classmethod()}) +are implemented as non-data descriptors. Accordingly, instances can +redefine and override methods. This allows individual instances to acquire +behaviors that differ from other instances of the same class. + +The \function{property()} function is implemented as a data descriptor. +Accordingly, instances cannot override the behavior of a property. + + +\subsubsection{__slots__\label{slots}} + +By default, instances of both old-style and new-style classes have a dictionary +for attribute storage. This wastes space for objects having very few instance +variables. The space consumption can become acute when creating large numbers +of instances. + +The default can be overridden by defining \var{__slots__} in a new-style class +definition. The \var{__slots__} declaration takes a sequence of instance +variables and reserves just enough space in each instance to hold a value +for each variable. Space is saved because \var{__dict__} is not created for +each instance. + +\begin{datadesc}{__slots__} +This class variable can be assigned a string, iterable, or sequence of strings +with variable names used by instances. If defined in a new-style class, +\var{__slots__} reserves space for the declared variables +and prevents the automatic creation of \var{__dict__} and \var{__weakref__} +for each instance. +\versionadded{2.2} +\end{datadesc} + +\noindent +Notes on using \var{__slots__} + +\begin{itemize} + +\item Without a \var{__dict__} variable, instances cannot be assigned new +variables not listed in the \var{__slots__} definition. Attempts to assign +to an unlisted variable name raise \exception{AttributeError}. If dynamic +assignment of new variables is desired, then add \code{'__dict__'} to the +sequence of strings in the \var{__slots__} declaration. 
+\versionchanged[Previously, adding \code{'__dict__'} to the \var{__slots__} +declaration would not enable the assignment of new attributes not +specifically listed in the sequence of instance variable names]{2.3} + +\item Without a \var{__weakref__} variable for each instance, classes +defining \var{__slots__} do not support weak references to their instances. +If weak reference support is needed, then add \code{'__weakref__'} to the +sequence of strings in the \var{__slots__} declaration. +\versionchanged[Previously, adding \code{'__weakref__'} to the \var{__slots__} +declaration would not enable support for weak references]{2.3} + +\item \var{__slots__} are implemented at the class level by creating +descriptors (\ref{descriptors}) for each variable name. As a result, +class attributes cannot be used to set default values for instance +variables defined by \var{__slots__}; otherwise, the class attribute would +overwrite the descriptor assignment. + +\item If a class defines a slot also defined in a base class, the instance +variable defined by the base class slot is inaccessible (except by retrieving +its descriptor directly from the base class). This renders the meaning of the +program undefined. In the future, a check may be added to prevent this. + +\item The action of a \var{__slots__} declaration is limited to the class +where it is defined. As a result, subclasses will have a \var{__dict__} +unless they also define \var{__slots__}. + +\item \var{__slots__} do not work for classes derived from ``variable-length'' +built-in types such as \class{long}, \class{str} and \class{tuple}. + +\item Any non-string iterable may be assigned to \var{__slots__}. +Mappings may also be used; however, in the future, special meaning may +be assigned to the values corresponding to each key. + +\end{itemize} + + +\subsection{Customizing class creation\label{metaclasses}} + +By default, new-style classes are constructed using \function{type()}. 
+A class definition is read into a separate namespace and the +class name is bound to the result of \code{type(name, bases, dict)}. + +When the class definition is read, if \var{__metaclass__} is defined +then the callable assigned to it will be called instead of \function{type()}. +This allows classes or functions to be written which monitor or alter the class +creation process: + +\begin{itemize} +\item Modifying the class dictionary prior to the class being created. +\item Returning an instance of another class -- essentially performing +the role of a factory function. +\end{itemize} + +\begin{datadesc}{__metaclass__} +This variable can be any callable accepting arguments for \code{name}, +\code{bases}, and \code{dict}. Upon class creation, the callable is +used instead of the built-in \function{type()}. +\versionadded{2.2} +\end{datadesc} + +The appropriate metaclass is determined by the following precedence rules: + +\begin{itemize} + +\item If \code{dict['__metaclass__']} exists, it is used. + +\item Otherwise, if there is at least one base class, its metaclass is used +(this looks for a \var{__class__} attribute first and if not found, uses its +type). + +\item Otherwise, if a global variable named \var{__metaclass__} exists, it is used. + +\item Otherwise, the old-style, classic metaclass (\code{types.ClassType}) is used. + +\end{itemize} + +The potential uses for metaclasses are boundless. Some ideas that have +been explored include logging, interface checking, automatic delegation, +automatic property creation, proxies, frameworks, and automatic resource +locking/synchronization. + + +\subsection{Emulating callable objects\label{callable-types}} + +\begin{methoddesc}[object]{__call__}{self\optional{, args...}} +Called when the instance is ``called'' as a function; if this method +is defined, \code{\var{x}(arg1, arg2, ...)} is a shorthand for +\code{\var{x}.__call__(arg1, arg2, ...)}. 
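A minimal sketch of a callable instance follows. The class \class{Multiplier} and the name \code{triple} are hypothetical and serve only to show that calling the instance and calling its \method{__call__()} method are equivalent.

```python
class Multiplier(object):
    """Hypothetical callable object that remembers a factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Invoked for triple(value) as well as triple.__call__(value).
        return self.factor * value


triple = Multiplier(3)
direct = triple(5)
explicit = triple.__call__(5)
```

Both \code{direct} and \code{explicit} hold the same result, since the call syntax is shorthand for the method call.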
+\indexii{call}{instance} +\end{methoddesc} + + +\subsection{Emulating container types\label{sequence-types}} + +The following methods can be defined to implement container +objects. Containers usually are sequences (such as lists or tuples) +or mappings (like dictionaries), but can represent other containers as +well. The first set of methods is used either to emulate a +sequence or to emulate a mapping; the difference is that for a +sequence, the allowable keys should be the integers \var{k} for which +\code{0 <= \var{k} < \var{N}} where \var{N} is the length of the +sequence, or slice objects, which define a range of items. (For backwards +compatibility, the method \method{__getslice__()} (see below) can also be +defined to handle simple, but not extended slices.) It is also recommended +that mappings provide the methods \method{keys()}, \method{values()}, +\method{items()}, \method{has_key()}, \method{get()}, \method{clear()}, +\method{setdefault()}, \method{iterkeys()}, \method{itervalues()}, +\method{iteritems()}, \method{pop()}, \method{popitem()}, +\method{copy()}, and \method{update()} behaving similarly to those for +Python's standard dictionary objects. The \module{UserDict} module +provides a \class{DictMixin} class to help create those methods +from a base set of \method{__getitem__()}, \method{__setitem__()}, +\method{__delitem__()}, and \method{keys()}. +Mutable sequences should provide +methods \method{append()}, \method{count()}, \method{index()}, +\method{extend()}, +\method{insert()}, \method{pop()}, \method{remove()}, \method{reverse()} +and \method{sort()}, like Python's standard list objects. Finally, +sequence types should implement addition (meaning concatenation) and +multiplication (meaning repetition) by defining the methods +\method{__add__()}, \method{__radd__()}, \method{__iadd__()}, +\method{__mul__()}, \method{__rmul__()} and \method{__imul__()} described +below; they should not define \method{__coerce__()} or other numerical +operators. 
It is recommended that both mappings and sequences +implement the \method{__contains__()} method to allow efficient use of +the \code{in} operator; for mappings, \code{in} should be equivalent +to \method{has_key()}; for sequences, it should search through the +values. It is further recommended that both mappings and sequences +implement the \method{__iter__()} method to allow efficient iteration +through the container; for mappings, \method{__iter__()} should be +the same as \method{iterkeys()}; for sequences, it should iterate +through the values. +\withsubitem{(mapping object method)}{ + \ttindex{keys()} + \ttindex{values()} + \ttindex{items()} + \ttindex{iterkeys()} + \ttindex{itervalues()} + \ttindex{iteritems()} + \ttindex{has_key()} + \ttindex{get()} + \ttindex{setdefault()} + \ttindex{pop()} + \ttindex{popitem()} + \ttindex{clear()} + \ttindex{copy()} + \ttindex{update()} + \ttindex{__contains__()}} +\withsubitem{(sequence object method)}{ + \ttindex{append()} + \ttindex{count()} + \ttindex{extend()} + \ttindex{index()} + \ttindex{insert()} + \ttindex{pop()} + \ttindex{remove()} + \ttindex{reverse()} + \ttindex{sort()} + \ttindex{__add__()} + \ttindex{__radd__()} + \ttindex{__iadd__()} + \ttindex{__mul__()} + \ttindex{__rmul__()} + \ttindex{__imul__()} + \ttindex{__contains__()} + \ttindex{__iter__()}} +\withsubitem{(numeric object method)}{\ttindex{__coerce__()}} + +\begin{methoddesc}[container object]{__len__}{self} +Called to implement the built-in function +\function{len()}\bifuncindex{len}. Should return the length of the +object, an integer \code{>=} 0. Also, an object that doesn't define a +\method{__nonzero__()} method and whose \method{__len__()} method +returns zero is considered to be false in a Boolean context. +\withsubitem{(object method)}{\ttindex{__nonzero__()}} +\end{methoddesc} + +\begin{methoddesc}[container object]{__getitem__}{self, key} +Called to implement evaluation of \code{\var{self}[\var{key}]}. 
+For sequence types, the accepted keys should be integers and slice +objects.\obindex{slice} Note that +the special interpretation of negative indexes (if the class wishes to +emulate a sequence type) is up to the \method{__getitem__()} method. +If \var{key} is of an inappropriate type, \exception{TypeError} may be +raised; if of a value outside the set of indexes for the sequence +(after any special interpretation of negative values), +\exception{IndexError} should be raised. +For mapping types, if \var{key} is missing (not in the container), +\exception{KeyError} should be raised. +\note{\keyword{for} loops expect that an +\exception{IndexError} will be raised for illegal indexes to allow +proper detection of the end of the sequence.} +\end{methoddesc} + +\begin{methoddesc}[container object]{__setitem__}{self, key, value} +Called to implement assignment to \code{\var{self}[\var{key}]}. Same +note as for \method{__getitem__()}. This should only be implemented +for mappings if the objects support changes to the values for keys, or +if new keys can be added, or for sequences if elements can be +replaced. The same exceptions should be raised for improper +\var{key} values as for the \method{__getitem__()} method. +\end{methoddesc} + +\begin{methoddesc}[container object]{__delitem__}{self, key} +Called to implement deletion of \code{\var{self}[\var{key}]}. Same +note as for \method{__getitem__()}. This should only be implemented +for mappings if the objects support removal of keys, or for sequences +if elements can be removed from the sequence. The same exceptions +should be raised for improper \var{key} values as for the +\method{__getitem__()} method. +\end{methoddesc} + +\begin{methoddesc}[container object]{__iter__}{self} +This method is called when an iterator is required for a container. +This method should return a new iterator object that can iterate over +all the objects in the container. 
For mappings, it should iterate +over the keys of the container, and should also be made available as +the method \method{iterkeys()}. + +Iterator objects also need to implement this method; they are required +to return themselves. For more information on iterator objects, see +``\ulink{Iterator Types}{../lib/typeiter.html}'' in the +\citetitle[../lib/lib.html]{Python Library Reference}. +\end{methoddesc} + +The membership test operators (\keyword{in} and \keyword{not in}) are +normally implemented as an iteration through a sequence. However, +container objects can supply the following special method with a more +efficient implementation, which also does not require the object be a +sequence. + +\begin{methoddesc}[container object]{__contains__}{self, item} +Called to implement membership test operators. Should return true if +\var{item} is in \var{self}, false otherwise. For mapping objects, +this should consider the keys of the mapping rather than the values or +the key-item pairs. +\end{methoddesc} + + +\subsection{Additional methods for emulation of sequence types + \label{sequence-methods}} + +The following optional methods can be defined to further emulate sequence +objects. Immutable sequences should define at most +\method{__getslice__()}; mutable sequences might define all three +methods. + +\begin{methoddesc}[sequence object]{__getslice__}{self, i, j} +\deprecated{2.0}{Support slice objects as parameters to the +\method{__getitem__()} method.} +Called to implement evaluation of \code{\var{self}[\var{i}:\var{j}]}. +The returned object should be of the same type as \var{self}. Note +that a missing \var{i} or \var{j} in the slice expression is replaced +by zero or \code{sys.maxint}, respectively. If negative indexes are +used in the slice, the length of the sequence is added to that index. +If the instance does not implement the \method{__len__()} method, an +\exception{AttributeError} is raised. 
+No guarantee is made that indexes adjusted this way are not still +negative. Indexes which are greater than the length of the sequence +are not modified. +If no \method{__getslice__()} is found, a slice +object is created and passed to \method{__getitem__()} instead. +\end{methoddesc} + +\begin{methoddesc}[sequence object]{__setslice__}{self, i, j, sequence} +Called to implement assignment to \code{\var{self}[\var{i}:\var{j}]}. +Same notes for \var{i} and \var{j} as for \method{__getslice__()}. + +This method is deprecated. If no \method{__setslice__()} is found, +or for extended slicing of the form +\code{\var{self}[\var{i}:\var{j}:\var{k}]}, a +slice object is created, and passed to \method{__setitem__()}, +instead of \method{__setslice__()} being called. +\end{methoddesc} + +\begin{methoddesc}[sequence object]{__delslice__}{self, i, j} +Called to implement deletion of \code{\var{self}[\var{i}:\var{j}]}. +Same notes for \var{i} and \var{j} as for \method{__getslice__()}. +This method is deprecated. If no \method{__delslice__()} is found, +or for extended slicing of the form +\code{\var{self}[\var{i}:\var{j}:\var{k}]}, a +slice object is created, and passed to \method{__delitem__()}, +instead of \method{__delslice__()} being called. +\end{methoddesc} + +Notice that these methods are only invoked when a single slice with a +single colon is used, and the slice method is available. For slice +operations involving extended slice notation, or in the absence of the +slice methods, \method{__getitem__()}, \method{__setitem__()} or +\method{__delitem__()} is called with a slice object as argument. + +The following example demonstrates how to make your program or module +compatible with earlier versions of Python (assuming that methods +\method{__getitem__()}, \method{__setitem__()} and \method{__delitem__()} +support slice objects as arguments): + +\begin{verbatim} +import sys + +class MyClass: + ... + def __getitem__(self, index): + ... 
+ def __setitem__(self, index, value): + ... + def __delitem__(self, index): + ... + + if sys.version_info < (2, 0): + # They won't be defined if version is at least 2.0 final + + def __getslice__(self, i, j): + return self[max(0, i):max(0, j):] + def __setslice__(self, i, j, seq): + self[max(0, i):max(0, j):] = seq + def __delslice__(self, i, j): + del self[max(0, i):max(0, j):] + ... +\end{verbatim} + +Note the calls to \function{max()}; these are necessary because of +the handling of negative indices before the +\method{__*slice__()} methods are called. When negative indexes are +used, the \method{__*item__()} methods receive them as provided, but +the \method{__*slice__()} methods get a ``cooked'' form of the index +values. For each negative index value, the length of the sequence is +added to the index before calling the method (which may still result +in a negative index); this is the customary handling of negative +indexes by the built-in sequence types, and the \method{__*item__()} +methods are expected to do this as well. However, since they should +already be doing that, negative indexes cannot be passed in; they must +be constrained to the bounds of the sequence before being passed to +the \method{__*item__()} methods. +Calling \code{max(0, i)} conveniently returns the proper value. + + +\subsection{Emulating numeric types\label{numeric-types}} + +The following methods can be defined to emulate numeric objects. +Methods corresponding to operations that are not supported by the +particular kind of number implemented (e.g., bitwise operations for +non-integral numbers) should be left undefined. 
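As a small sketch of leaving unsupported operations undefined, the hypothetical class \class{Half} below models a non-integral quantity counted in halves. It defines addition, negation and conversion to float, but no bitwise methods, so applying \code{\&} to its instances raises \exception{TypeError}. The helper \function{supports_bitwise_and()} is also an assumption of this example and exists only to make that outcome observable.

```python
class Half(object):
    """Hypothetical non-integral number stored as a count of halves."""

    def __init__(self, halves):
        self.halves = halves

    def __add__(self, other):
        if not isinstance(other, Half):
            return NotImplemented
        return Half(self.halves + other.halves)

    def __neg__(self):
        return Half(-self.halves)

    def __float__(self):
        return self.halves / 2.0

    # No __and__, __or__, __xor__, __lshift__ or __rshift__ here:
    # bitwise operations are not meaningful for this kind of number.


def supports_bitwise_and(x, y):
    """Report whether x & y is supported for these operands."""
    try:
        x & y
    except TypeError:
        return False
    return True
```

Plain integers pass the check, whereas two \class{Half} instances do not, because neither operand provides a method for the operator.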
+ +\begin{methoddesc}[numeric object]{__add__}{self, other} +\methodline[numeric object]{__sub__}{self, other} +\methodline[numeric object]{__mul__}{self, other} +\methodline[numeric object]{__floordiv__}{self, other} +\methodline[numeric object]{__mod__}{self, other} +\methodline[numeric object]{__divmod__}{self, other} +\methodline[numeric object]{__pow__}{self, other\optional{, modulo}} +\methodline[numeric object]{__lshift__}{self, other} +\methodline[numeric object]{__rshift__}{self, other} +\methodline[numeric object]{__and__}{self, other} +\methodline[numeric object]{__xor__}{self, other} +\methodline[numeric object]{__or__}{self, other} +These methods are +called to implement the binary arithmetic operations (\code{+}, +\code{-}, \code{*}, \code{//}, \code{\%}, +\function{divmod()}\bifuncindex{divmod}, +\function{pow()}\bifuncindex{pow}, \code{**}, \code{<<}, +\code{>>}, \code{\&}, \code{\^}, \code{|}). For instance, to +evaluate the expression \var{x}\code{+}\var{y}, where \var{x} is an +instance of a class that has an \method{__add__()} method, +\code{\var{x}.__add__(\var{y})} is called. The \method{__divmod__()} +method should be the equivalent to using \method{__floordiv__()} and +\method{__mod__()}; it should not be related to \method{__truediv__()} +(described below). Note that +\method{__pow__()} should be defined to accept an optional third +argument if the ternary version of the built-in +\function{pow()}\bifuncindex{pow} function is to be supported. + +If one of those methods does not support the operation with the +supplied arguments, it should return \code{NotImplemented}. +\end{methoddesc} + +\begin{methoddesc}[numeric object]{__div__}{self, other} +\methodline[numeric object]{__truediv__}{self, other} +The division operator (\code{/}) is implemented by these methods. The +\method{__truediv__()} method is used when \code{__future__.division} +is in effect, otherwise \method{__div__()} is used. 
If only one of +these two methods is defined, the object will not support division in +the alternate context; \exception{TypeError} will be raised instead. +\end{methoddesc} + +\begin{methoddesc}[numeric object]{__radd__}{self, other} +\methodline[numeric object]{__rsub__}{self, other} +\methodline[numeric object]{__rmul__}{self, other} +\methodline[numeric object]{__rdiv__}{self, other} +\methodline[numeric object]{__rtruediv__}{self, other} +\methodline[numeric object]{__rfloordiv__}{self, other} +\methodline[numeric object]{__rmod__}{self, other} +\methodline[numeric object]{__rdivmod__}{self, other} +\methodline[numeric object]{__rpow__}{self, other} +\methodline[numeric object]{__rlshift__}{self, other} +\methodline[numeric object]{__rrshift__}{self, other} +\methodline[numeric object]{__rand__}{self, other} +\methodline[numeric object]{__rxor__}{self, other} +\methodline[numeric object]{__ror__}{self, other} +These methods are +called to implement the binary arithmetic operations (\code{+}, +\code{-}, \code{*}, \code{/}, \code{\%}, +\function{divmod()}\bifuncindex{divmod}, +\function{pow()}\bifuncindex{pow}, \code{**}, \code{<<}, +\code{>>}, \code{\&}, \code{\^}, \code{|}) with reflected +(swapped) operands. These functions are only called if the left +operand does not support the corresponding operation and the +operands are of different types.\footnote{ + For operands of the same type, it is assumed that if the + non-reflected method (such as \method{__add__()}) fails the + operation is not supported, which is why the reflected method + is not called.} +For instance, to evaluate the expression \var{x}\code{-}\var{y}, +where \var{y} is an instance of a class that has an +\method{__rsub__()} method, \code{\var{y}.__rsub__(\var{x})} +is called if \code{\var{x}.__sub__(\var{y})} returns +\var{NotImplemented}. + +Note that ternary +\function{pow()}\bifuncindex{pow} will not try calling +\method{__rpow__()} (the coercion rules would become too +complicated). 
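The dispatch to a reflected method can be sketched with a hypothetical \class{Metres} class. The class name and the decision to accept plain \code{int} and \code{float} operands are assumptions made for this example.

```python
class Metres(object):
    """Hypothetical length type that mixes with plain numbers."""

    def __init__(self, value):
        self.value = value

    def __sub__(self, other):
        if isinstance(other, Metres):
            return Metres(self.value - other.value)
        if isinstance(other, (int, float)):
            return Metres(self.value - other)
        return NotImplemented

    def __rsub__(self, other):
        # Reached for "other - self" when the left operand does not
        # support the operation; note the swapped operand order.
        if isinstance(other, (int, float)):
            return Metres(other - self.value)
        return NotImplemented


left = Metres(10) - 3        # Metres.__sub__ handles this directly
right = 10 - Metres(3)       # the int cannot, so Metres.__rsub__ is used
```

Both results wrap the value \code{7}. If \method{__rsub__()} also returns \code{NotImplemented}, as it does for a string operand, the interpreter raises \exception{TypeError}.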
+ +\note{If the right operand's type is a subclass of the left operand's + type and that subclass provides the reflected method for the + operation, this method will be called before the left operand's + non-reflected method. This behavior allows subclasses to + override their ancestors' operations.} +\end{methoddesc} + +\begin{methoddesc}[numeric object]{__iadd__}{self, other} +\methodline[numeric object]{__isub__}{self, other} +\methodline[numeric object]{__imul__}{self, other} +\methodline[numeric object]{__idiv__}{self, other} +\methodline[numeric object]{__itruediv__}{self, other} +\methodline[numeric object]{__ifloordiv__}{self, other} +\methodline[numeric object]{__imod__}{self, other} +\methodline[numeric object]{__ipow__}{self, other\optional{, modulo}} +\methodline[numeric object]{__ilshift__}{self, other} +\methodline[numeric object]{__irshift__}{self, other} +\methodline[numeric object]{__iand__}{self, other} +\methodline[numeric object]{__ixor__}{self, other} +\methodline[numeric object]{__ior__}{self, other} +These methods are called to implement the augmented arithmetic +operations (\code{+=}, \code{-=}, \code{*=}, \code{/=}, \code{//=}, +\code{\%=}, \code{**=}, \code{<<=}, \code{>>=}, \code{\&=}, +\code{\textasciicircum=}, \code{|=}). These methods should attempt to do the +operation in-place (modifying \var{self}) and return the result (which +could be, but does not have to be, \var{self}). If a specific method +is not defined, the augmented operation falls back to the normal +methods. For instance, to evaluate the expression +\var{x}\code{+=}\var{y}, where \var{x} is an instance of a class that +has an \method{__iadd__()} method, \code{\var{x}.__iadd__(\var{y})} is +called. If \var{x} is an instance of a class that does not define a +\method{__iadd__()} method, \code{\var{x}.__add__(\var{y})} and +\code{\var{y}.__radd__(\var{x})} are considered, as with the +evaluation of \var{x}\code{+}\var{y}. 
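The difference between an in-place method and the fallback to \method{__add__()} can be sketched with two hypothetical classes, \class{Bag} and \class{FrozenBag}. All names here are invented for the example.

```python
class Bag(object):
    """Mutable container: += updates the existing object."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return Bag(self.items + list(other))

    def __iadd__(self, other):
        self.items.extend(other)
        return self              # the result is rebound to the left-hand name


class FrozenBag(object):
    """No __iadd__: += falls back to __add__ and rebinds the name."""

    def __init__(self, items):
        self.items = tuple(items)

    def __add__(self, other):
        return FrozenBag(self.items + tuple(other))


bag = Bag([1])
bag_alias = bag
bag += [2]                       # calls Bag.__iadd__, same object

frozen = FrozenBag([1])
frozen_alias = frozen
frozen += [2]                    # calls FrozenBag.__add__, new object
```

After these statements \code{bag_alias} sees the appended item because the object was modified in place, while \code{frozen_alias} still refers to the original, unchanged \class{FrozenBag}.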
+\end{methoddesc} + +\begin{methoddesc}[numeric object]{__neg__}{self} +\methodline[numeric object]{__pos__}{self} +\methodline[numeric object]{__abs__}{self} +\methodline[numeric object]{__invert__}{self} +Called to implement the unary arithmetic operations (\code{-}, +\code{+}, \function{abs()}\bifuncindex{abs} and \code{\~{}}). +\end{methoddesc} + +\begin{methoddesc}[numeric object]{__complex__}{self} +\methodline[numeric object]{__int__}{self} +\methodline[numeric object]{__long__}{self} +\methodline[numeric object]{__float__}{self} +Called to implement the built-in functions +\function{complex()}\bifuncindex{complex}, +\function{int()}\bifuncindex{int}, \function{long()}\bifuncindex{long}, +and \function{float()}\bifuncindex{float}. Should return a value of +the appropriate type. +\end{methoddesc} + +\begin{methoddesc}[numeric object]{__oct__}{self} +\methodline[numeric object]{__hex__}{self} +Called to implement the built-in functions +\function{oct()}\bifuncindex{oct} and +\function{hex()}\bifuncindex{hex}. Should return a string value. +\end{methoddesc} + +\begin{methoddesc}[numeric object]{__index__}{self} +Called to implement \function{operator.index()}. Also called whenever +Python needs an integer object (such as in slicing). Must return an +integer (int or long). +\versionadded{2.5} +\end{methoddesc} + +\begin{methoddesc}[numeric object]{__coerce__}{self, other} +Called to implement ``mixed-mode'' numeric arithmetic. Should either +return a 2-tuple containing \var{self} and \var{other} converted to +a common numeric type, or \code{None} if conversion is impossible. When +the common type would be the type of \code{other}, it is sufficient to +return \code{None}, since the interpreter will also ask the other +object to attempt a coercion (but sometimes, if the implementation of +the other type cannot be changed, it is useful to do the conversion to +the other type here). A return value of \code{NotImplemented} is +equivalent to returning \code{None}. 
+\end{methoddesc} + +\subsection{Coercion rules\label{coercion-rules}} + +This section used to document the rules for coercion. As the language +has evolved, the coercion rules have become hard to document +precisely; documenting what one version of one particular +implementation does is undesirable. Instead, here are some informal +guidelines regarding coercion. In Python 3.0, coercion will not be +supported. + +\begin{itemize} + +\item + +If the left operand of a \% operator is a string or Unicode object, no +coercion takes place and the string formatting operation is invoked +instead. + +\item + +It is no longer recommended to define a coercion operation. +Mixed-mode operations on types that don't define coercion pass the +original arguments to the operation. + +\item + +New-style classes (those derived from \class{object}) never invoke the +\method{__coerce__()} method in response to a binary operator; the only +time \method{__coerce__()} is invoked is when the built-in function +\function{coerce()} is called. + +\item + +For most intents and purposes, an operator that returns +\code{NotImplemented} is treated the same as one that is not +implemented at all. + +\item + +Below, \method{__op__()} and \method{__rop__()} are used to signify +the generic method names corresponding to an operator; +\method{__iop__()} is used for the corresponding in-place operator. For +example, for the operator `\code{+}', \method{__add__()} and +\method{__radd__()} are used for the left and right variant of the +binary operator, and \method{__iadd__()} for the in-place variant. + +\item + +For objects \var{x} and \var{y}, first \code{\var{x}.__op__(\var{y})} +is tried. If this is not implemented or returns \code{NotImplemented}, +\code{\var{y}.__rop__(\var{x})} is tried. If this is also not +implemented or returns \code{NotImplemented}, a \exception{TypeError} +exception is raised. 
But see the following exception: + +\item + +Exception to the previous item: if the left operand is an instance of +a built-in type or a new-style class, and the right operand is an instance +of a proper subclass of that type or class and overrides the base's +\method{__rop__()} method, the right operand's \method{__rop__()} method +is tried \emph{before} the left operand's \method{__op__()} method. + +This is done so that a subclass can completely override binary operators. +Otherwise, the left operand's \method{__op__()} method would always +accept the right operand: when an instance of a given class is expected, +an instance of a subclass of that class is always acceptable. + +\item + +When either operand type defines a coercion, this coercion is called +before that type's \method{__op__()} or \method{__rop__()} method is +called, but no sooner. If the coercion returns an object of a +different type for the operand whose coercion is invoked, part of the +process is redone using the new object. + +\item + +When an in-place operator (like `\code{+=}') is used, if the left +operand implements \method{__iop__()}, it is invoked without any +coercion. When the operation falls back to \method{__op__()} and/or +\method{__rop__()}, the normal coercion rules apply. + +\item + +In \var{x}\code{+}\var{y}, if \var{x} is a sequence that implements +sequence concatenation, sequence concatenation is invoked. + +\item + +In \var{x}\code{*}\var{y}, if one operand is a sequence that +implements sequence repetition, and the other is an integer +(\class{int} or \class{long}), sequence repetition is invoked. + +\item + +Rich comparisons (implemented by methods \method{__eq__()} and so on) +never use coercion. Three-way comparison (implemented by +\method{__cmp__()}) does use coercion under the same conditions as +other binary operations use it.
+ +\item + +In the current implementation, the built-in numeric types \class{int}, +\class{long} and \class{float} do not use coercion; the type +\class{complex} however does use it. The difference can become +apparent when subclassing these types. Over time, the type +\class{complex} may be fixed to avoid coercion. All these types +implement a \method{__coerce__()} method, for use by the built-in +\function{coerce()} function. + +\end{itemize} + +\subsection{With Statement Context Managers\label{context-managers}} + +\versionadded{2.5} + +A \dfn{context manager} is an object that defines the runtime +context to be established when executing a \keyword{with} +statement. The context manager handles the entry into, +and the exit from, the desired runtime context for the execution +of the block of code. Context managers are normally invoked using +the \keyword{with} statement (described in section~\ref{with}), but +can also be used by directly invoking their methods. + +\stindex{with} +\index{context manager} + +Typical uses of context managers include saving and +restoring various kinds of global state, locking and unlocking +resources, closing opened files, etc. + +For more information on context managers, see +``\ulink{Context Types}{../lib/typecontextmanager.html}'' in the +\citetitle[../lib/lib.html]{Python Library Reference}. + +\begin{methoddesc}[context manager]{__enter__}{self} +Enter the runtime context related to this object. The \keyword{with} +statement will bind this method's return value to the target(s) +specified in the \keyword{as} clause of the statement, if any. +\end{methoddesc} + +\begin{methoddesc}[context manager]{__exit__} +{self, exc_type, exc_value, traceback} +Exit the runtime context related to this object. The parameters +describe the exception that caused the context to be exited. If +the context was exited without an exception, all three arguments +will be \constant{None}. 
+ +If an exception is supplied, and the method wishes to suppress the +exception (i.e., prevent it from being propagated), it should return a +true value. Otherwise, the exception will be processed normally upon +exit from this method. + +Note that \method{__exit__} methods should not reraise the passed-in +exception; this is the caller's responsibility. +\end{methoddesc} + +\begin{seealso} + \seepep{0343}{The "with" statement} + {The specification, background, and examples for the + Python \keyword{with} statement.} +\end{seealso} + diff --git a/sys/src/cmd/python/Doc/ref/ref4.tex b/sys/src/cmd/python/Doc/ref/ref4.tex new file mode 100644 index 000000000..12a2b92e1 --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref4.tex @@ -0,0 +1,219 @@ +\chapter{Execution model \label{execmodel}} +\index{execution model} + + +\section{Naming and binding \label{naming}} +\indexii{code}{block} +\index{namespace} +\index{scope} + +\dfn{Names}\index{name} refer to objects. Names are introduced by +name binding operations. Each occurrence of a name in the program +text refers to the \dfn{binding}\indexii{binding}{name} of that name +established in the innermost function block containing the use. + +A \dfn{block}\index{block} is a piece of Python program text that is +executed as a unit. The following are blocks: a module, a function +body, and a class definition. Each command typed interactively is a +block. A script file (a file given as standard input to the +interpreter or specified on the interpreter command line as the first +argument) is a code block. A script command (a command specified on +the interpreter command line with the `\strong{-c}' option) is a code +block. The file read by the built-in function \function{execfile()} +is a code block. The string argument passed to the built-in function +\function{eval()} and to the \keyword{exec} statement is a code block. +The expression read and evaluated by the built-in function +\function{input()} is a code block.
+ +A code block is executed in an \dfn{execution +frame}\indexii{execution}{frame}. A frame contains some +administrative information (used for debugging) and determines where +and how execution continues after the code block's execution has +completed. + +A \dfn{scope}\index{scope} defines the visibility of a name within a +block. If a local variable is defined in a block, its scope includes +that block. If the definition occurs in a function block, the scope +extends to any blocks contained within the defining one, unless a +contained block introduces a different binding for the name. The +scope of names defined in a class block is limited to the class block; +it does not extend to the code blocks of methods. + +When a name is used in a code block, it is resolved using the nearest +enclosing scope. The set of all such scopes visible to a code block +is called the block's \dfn{environment}\index{environment}. + +If a name is bound in a block, it is a local variable of that block. +If a name is bound at the module level, it is a global variable. (The +variables of the module code block are local and global.) If a +variable is used in a code block but not defined there, it is a +\dfn{free variable}\indexii{free}{variable}. + +When a name is not found at all, a +\exception{NameError}\withsubitem{(built-in +exception)}{\ttindex{NameError}} exception is raised. If the name +refers to a local variable that has not been bound, a +\exception{UnboundLocalError}\ttindex{UnboundLocalError} exception is +raised. \exception{UnboundLocalError} is a subclass of +\exception{NameError}. + +The following constructs bind names: formal parameters to functions, +\keyword{import} statements, class and function definitions (these +bind the class or function name in the defining block), and targets +that are identifiers if occurring in an assignment, \keyword{for} loop +header, or in the second position of an \keyword{except} clause +header. 
The \keyword{import} statement of the form ``\samp{from +\ldots import *}''\stindex{from} binds all names defined in the +imported module, except those beginning with an underscore. This form +may only be used at the module level. + +A target occurring in a \keyword{del} statement is also considered bound +for this purpose (though the actual semantics are to unbind the +name). It is illegal to unbind a name that is referenced by an +enclosing scope; the compiler will report a \exception{SyntaxError}. + +Each assignment or import statement occurs within a block defined by a +class or function definition or at the module level (the top-level +code block). + +If a name binding operation occurs anywhere within a code block, all +uses of the name within the block are treated as references to the +current block. This can lead to errors when a name is used within a +block before it is bound. +This rule is subtle. Python lacks declarations and allows +name binding operations to occur anywhere within a code block. The +local variables of a code block can be determined by scanning the +entire text of the block for name binding operations. + +If the global statement occurs within a block, all uses of the name +specified in the statement refer to the binding of that name in the +top-level namespace. Names are resolved in the top-level namespace by +searching the global namespace, i.e. the namespace of the module +containing the code block, and the builtin namespace, the namespace of +the module \module{__builtin__}. The global namespace is searched +first. If the name is not found there, the builtin namespace is +searched. The global statement must precede all uses of the name. + +The built-in namespace associated with the execution of a code block +is actually found by looking up the name \code{__builtins__} in its +global namespace; this should be a dictionary or a module (in the +latter case the module's dictionary is used). 
By default, when in the +\module{__main__} module, \code{__builtins__} is the built-in module +\module{__builtin__} (note: no `s'); when in any other module, +\code{__builtins__} is an alias for the dictionary of the +\module{__builtin__} module itself. \code{__builtins__} can be set +to a user-created dictionary to create a weak form of restricted +execution\indexii{restricted}{execution}. + +\begin{notice} + Users should not touch \code{__builtins__}; it is strictly an + implementation detail. Users wanting to override values in the + built-in namespace should \keyword{import} the \module{__builtin__} + (no `s') module and modify its attributes appropriately. +\end{notice} + +The namespace for a module is automatically created the first time a +module is imported. The main module for a script is always called +\module{__main__}\refbimodindex{__main__}. + +The global statement has the same scope as a name binding operation +in the same block. If the nearest enclosing scope for a free variable +contains a global statement, the free variable is treated as a global. + +A class definition is an executable statement that may use and define +names. These references follow the normal rules for name resolution. +The namespace of the class definition becomes the attribute dictionary +of the class. Names defined at the class scope are not visible in +methods. + +\subsection{Interaction with dynamic features \label{dynamic-features}} + +There are several cases where Python statements are illegal when +used in conjunction with nested scopes that contain free +variables. + +If a variable is referenced in an enclosing scope, it is illegal +to delete the name. An error will be reported at compile time. + +If the wild card form of import --- \samp{import *} --- is used in a +function and the function contains or is a nested block with free +variables, the compiler will raise a \exception{SyntaxError}. 
+ +If \keyword{exec} is used in a function and the function contains or +is a nested block with free variables, the compiler will raise a +\exception{SyntaxError} unless the exec explicitly specifies the local +namespace for the \keyword{exec}. (In other words, \samp{exec obj} +would be illegal, but \samp{exec obj in ns} would be legal.) + +The \function{eval()}, \function{execfile()}, and \function{input()} +functions and the \keyword{exec} statement do not have access to the +full environment for resolving names. Names may be resolved in the +local and global namespaces of the caller. Free variables are not +resolved in the nearest enclosing namespace, but in the global +namespace.\footnote{This limitation occurs because the code that is + executed by these operations is not available at the time the + module is compiled.} +The \keyword{exec} statement and the \function{eval()} and +\function{execfile()} functions have optional arguments to override +the global and local namespace. If only one namespace is specified, +it is used for both. + +\section{Exceptions \label{exceptions}} +\index{exception} + +Exceptions are a means of breaking out of the normal flow of control +of a code block in order to handle errors or other exceptional +conditions. An exception is +\emph{raised}\index{raise an exception} at the point where the error +is detected; it may be \emph{handled}\index{handle an exception} by +the surrounding code block or by any code block that directly or +indirectly invoked the code block where the error occurred. +\index{exception handler} +\index{errors} +\index{error handling} + +The Python interpreter raises an exception when it detects a run-time +error (such as division by zero). A Python program can also +explicitly raise an exception with the \keyword{raise} statement. +Exception handlers are specified with the \keyword{try} ... \keyword{except} +statement. The \keyword{try} ... 
\keyword{finally} statement +specifies cleanup code which does not handle the exception, but is +executed whether an exception occurred or not in the preceding code. + +Python uses the ``termination''\index{termination model} model of +error handling: an exception handler can find out what happened and +continue execution at an outer level, but it cannot repair the cause +of the error and retry the failing operation (except by re-entering +the offending piece of code from the top). + +When an exception is not handled at all, the interpreter terminates +execution of the program, or returns to its interactive main loop. In +either case, it prints a stack backtrace, except when the exception is +\exception{SystemExit}\withsubitem{(built-in +exception)}{\ttindex{SystemExit}}. + +Exceptions are identified by class instances. The \keyword{except} +clause is selected depending on the class of the instance: it must +reference the class of the instance or a base class thereof. The +instance can be received by the handler and can carry additional +information about the exceptional condition. + +Exceptions can also be identified by strings, in which case the +\keyword{except} clause is selected by object identity. An arbitrary +value can be raised along with the identifying string which can be +passed to the handler. + +\deprecated{2.5}{String exceptions should not be used in new code. +They will not be supported in a future version of Python. Old code +should be rewritten to use class exceptions instead.} + +\begin{notice}[warning] +Messages to exceptions are not part of the Python API. Their contents may +change from one version of Python to the next without warning and should not +be relied on by code which will run under multiple versions of the +interpreter. +\end{notice} + +See also the description of the \keyword{try} statement in +section~\ref{try} and \keyword{raise} statement in +section~\ref{raise}. 
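The selection of an \keyword{except} clause by class, and the behavior of \keyword{finally}, can be sketched as follows. All names here (\class{ParseError}, \class{MissingField}, \function{parse()}, \function{safe_parse()}, \code{events}) are hypothetical and exist only for this illustration. The sketch uses the \code{except ... as ...} spelling accepted by Python 2.6 and later; under Python 2.5 the same clause is written \code{except ParseError, exc}.

```python
class ParseError(Exception):
    """Hypothetical base class for this sketch."""


class MissingField(ParseError):
    """Hypothetical subclass; handlers for ParseError also match it."""


events = []


def parse(record):
    try:
        if "name" not in record:
            raise MissingField("name")
        return record["name"]
    finally:
        # Runs whether or not an exception occurred; it does not
        # handle the exception, which keeps propagating.
        events.append("cleanup")


def safe_parse(record):
    try:
        return parse(record)
    except ParseError as exc:
        # Selected because MissingField is a subclass of ParseError.
        # The instance carries the extra information (the field name).
        events.append("handled %s" % type(exc).__name__)
        return None
```

Calling \code{safe_parse(\{\})} records the cleanup step first and the handler second, because the \keyword{finally} clause in the inner block runs before control reaches the handler in the caller.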
diff --git a/sys/src/cmd/python/Doc/ref/ref5.tex b/sys/src/cmd/python/Doc/ref/ref5.tex new file mode 100644 index 000000000..17c57d43f --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref5.tex @@ -0,0 +1,1325 @@ +\chapter{Expressions\label{expressions}} +\index{expression} + +This chapter explains the meaning of the elements of expressions in +Python. + +\strong{Syntax Notes:} In this and the following chapters, extended +BNF\index{BNF} notation will be used to describe syntax, not lexical +analysis. When (one alternative of) a syntax rule has the form + +\begin{productionlist}[*] + \production{name}{\token{othername}} +\end{productionlist} + +and no semantics are given, the semantics of this form of \code{name} +are the same as for \code{othername}. +\index{syntax} + + +\section{Arithmetic conversions\label{conversions}} +\indexii{arithmetic}{conversion} + +When a description of an arithmetic operator below uses the phrase +``the numeric arguments are converted to a common type,'' the +arguments are coerced using the coercion rules listed at +~\ref{coercion-rules}. If both arguments are standard numeric types, +the following coercions are applied: + +\begin{itemize} +\item If either argument is a complex number, the other is converted + to complex; +\item otherwise, if either argument is a floating point number, + the other is converted to floating point; +\item otherwise, if either argument is a long integer, + the other is converted to long integer; +\item otherwise, both must be plain integers and no conversion + is necessary. +\end{itemize} + +Some additional rules apply for certain operators (e.g., a string left +argument to the `\%' operator). Extensions can define their own +coercions. + + +\section{Atoms\label{atoms}} +\index{atom} + +Atoms are the most basic elements of expressions. The simplest atoms +are identifiers or literals. Forms enclosed in +reverse quotes or in parentheses, brackets or braces are also +categorized syntactically as atoms. 
The syntax for atoms is: + +\begin{productionlist} + \production{atom} + {\token{identifier} | \token{literal} | \token{enclosure}} + \production{enclosure} + {\token{parenth_form} | \token{list_display}} + \productioncont{| \token{generator_expression} | \token{dict_display}} + \productioncont{| \token{string_conversion} | \token{yield_atom}} +\end{productionlist} + + +\subsection{Identifiers (Names)\label{atom-identifiers}} +\index{name} +\index{identifier} + +An identifier occurring as an atom is a name. See +section \ref{identifiers} for lexical definition and +section~\ref{naming} for documentation of naming and binding. + +When the name is bound to an object, evaluation of the atom yields +that object. When a name is not bound, an attempt to evaluate it +raises a \exception{NameError} exception. +\exindex{NameError} + +\strong{Private name mangling:} +\indexii{name}{mangling}% +\indexii{private}{names}% +When an identifier that textually occurs in a class definition begins +with two or more underscore characters and does not end in two or more +underscores, it is considered a \dfn{private name} of that class. +Private names are transformed to a longer form before code is +generated for them. The transformation inserts the class name in +front of the name, with leading underscores removed, and a single +underscore inserted in front of the class name. For example, the +identifier \code{__spam} occurring in a class named \code{Ham} will be +transformed to \code{_Ham__spam}. This transformation is independent +of the syntactical context in which the identifier is used. If the +transformed name is extremely long (longer than 255 characters), +implementation defined truncation may happen. If the class name +consists only of underscores, no transformation is done. 
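The mangling rule can be observed directly by inspecting an instance's attribute names. The sketch below reuses the class name \code{Ham} and the identifier \code{__spam} from the text; the attribute \code{__eggs__} and the method \method{spam()} are additions made only for this illustration.

```python
class Ham(object):
    def __init__(self):
        self.__spam = 1      # private name: stored as _Ham__spam
        self.__eggs__ = 2    # ends in two underscores: left unchanged

    def spam(self):
        # Inside the class body the same transformation applies,
        # so this finds the attribute stored above.
        return self.__spam


ham = Ham()
names = sorted(vars(ham))    # attribute names as actually stored
```

Here \code{names} holds \code{_Ham__spam} and \code{__eggs__}; there is no attribute literally called \code{__spam} on the instance, so code outside the class has to spell out \code{ham._Ham__spam} to reach it.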
+ + +\subsection{Literals\label{atom-literals}} +\index{literal} + +Python supports string literals and various numeric literals: + +\begin{productionlist} + \production{literal} + {\token{stringliteral} | \token{integer} | \token{longinteger}} + \productioncont{| \token{floatnumber} | \token{imagnumber}} +\end{productionlist} + +Evaluation of a literal yields an object of the given type (string, +integer, long integer, floating point number, complex number) with the +given value. The value may be approximated in the case of floating +point and imaginary (complex) literals. See section \ref{literals} +for details. + +All literals correspond to immutable data types, and hence the +object's identity is less important than its value. Multiple +evaluations of literals with the same value (either the same +occurrence in the program text or a different occurrence) may obtain +the same object or a different object with the same value. +\indexiii{immutable}{data}{type} +\indexii{immutable}{object} + + +\subsection{Parenthesized forms\label{parenthesized}} +\index{parenthesized form} + +A parenthesized form is an optional expression list enclosed in +parentheses: + +\begin{productionlist} + \production{parenth_form} + {"(" [\token{expression_list}] ")"} +\end{productionlist} + +A parenthesized expression list yields whatever that expression list +yields: if the list contains at least one comma, it yields a tuple; +otherwise, it yields the single expression that makes up the +expression list. + +An empty pair of parentheses yields an empty tuple object. Since +tuples are immutable, the rules for literals apply (i.e., two +occurrences of the empty tuple may or may not yield the same object). +\indexii{empty}{tuple} + +Note that tuples are not formed by the parentheses, but rather by use +of the comma operator. 
The exception is the empty tuple, for which +parentheses \emph{are} required --- allowing unparenthesized ``nothing'' +in expressions would cause ambiguities and allow common typos to +pass uncaught. +\index{comma} +\indexii{tuple}{display} + + +\subsection{List displays\label{lists}} +\indexii{list}{display} +\indexii{list}{comprehensions} + +A list display is a possibly empty series of expressions enclosed in +square brackets: + +\begin{productionlist} + \production{list_display} + {"[" [\token{expression_list} | \token{list_comprehension}] "]"} + \production{list_comprehension} + {\token{expression} \token{list_for}} + \production{list_for} + {"for" \token{target_list} "in" \token{old_expression_list} + [\token{list_iter}]} + \production{old_expression_list} + {\token{old_expression} + [("," \token{old_expression})+ [","]]} + \production{list_iter} + {\token{list_for} | \token{list_if}} + \production{list_if} + {"if" \token{old_expression} [\token{list_iter}]} +\end{productionlist} + +A list display yields a new list object. Its contents are specified +by providing either a list of expressions or a list comprehension. +\indexii{list}{comprehensions} +When a comma-separated list of expressions is supplied, its elements are +evaluated from left to right and placed into the list object in that +order. When a list comprehension is supplied, it consists of a +single expression followed by at least one \keyword{for} clause and zero or +more \keyword{for} or \keyword{if} clauses. In this +case, the elements of the new list are those that would be produced +by considering each of the \keyword{for} or \keyword{if} clauses a block, +nesting from +left to right, and evaluating the expression to produce a list element +each time the innermost block is reached\footnote{In Python 2.3, a +list comprehension "leaks" the control variables of each +\samp{for} it contains into the containing scope. 
However, this +behavior is deprecated, and relying on it will not work once this +bug is fixed in a future release}. +\obindex{list} +\indexii{empty}{list} + + +\subsection{Generator expressions\label{genexpr}} +\indexii{generator}{expression} + +A generator expression is a compact generator notation in parentheses: + +\begin{productionlist} + \production{generator_expression} + {"(" \token{expression} \token{genexpr_for} ")"} + \production{genexpr_for} + {"for" \token{target_list} "in" \token{or_test} + [\token{genexpr_iter}]} + \production{genexpr_iter} + {\token{genexpr_for} | \token{genexpr_if}} + \production{genexpr_if} + {"if" \token{old_expression} [\token{genexpr_iter}]} +\end{productionlist} + +A generator expression yields a new generator object. +\obindex{generator} +It consists of a single expression followed by at least one +\keyword{for} clause and zero or more \keyword{for} or \keyword{if} +clauses. The iterating values of the new generator are those that +would be produced by considering each of the \keyword{for} or +\keyword{if} clauses a block, nesting from left to right, and +evaluating the expression to yield a value each time the +innermost block is reached. + +Variables used in the generator expression are evaluated lazily +when the \method{next()} method is called for the generator object +(in the same fashion as normal generators). However, the leftmost +\keyword{for} clause is immediately evaluated so that an error produced +by it can be seen before any other possible error in the code that +handles the generator expression. +Subsequent \keyword{for} clauses cannot be evaluated immediately since +they may depend on the previous \keyword{for} loop. +For example: \samp{(x*y for x in range(10) for y in bar(x))}. + +The parentheses can be omitted on calls with only one argument. +See section \ref{calls} for details.
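The difference between the leftmost \keyword{for} clause and the later ones can be made visible by recording when each iterable is computed. The helper functions \function{outer()} and \function{inner()} and the list \code{calls} are hypothetical names used only in this sketch.

```python
calls = []


def outer():
    calls.append("outer")
    return range(2)


def inner(x):
    calls.append("inner(%d)" % x)
    return range(x + 1)


# Creating the generator evaluates outer() at once, but not inner().
gen = (x * y for x in outer() for y in inner(x))
calls_before = list(calls)

# Consuming the generator evaluates inner(x) once per value of x.
values = list(gen)
```

At the point where \code{calls_before} is taken, only \code{outer} has been recorded; the calls to \function{inner()} appear only while the generator is being consumed.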
+ + +\subsection{Dictionary displays\label{dict}} +\indexii{dictionary}{display} + +A dictionary display is a possibly empty series of key/datum pairs +enclosed in curly braces: +\index{key} +\index{datum} +\index{key/datum pair} + +\begin{productionlist} + \production{dict_display} + {"\{" [\token{key_datum_list}] "\}"} + \production{key_datum_list} + {\token{key_datum} ("," \token{key_datum})* [","]} + \production{key_datum} + {\token{expression} ":" \token{expression}} +\end{productionlist} + +A dictionary display yields a new dictionary object. +\obindex{dictionary} + +The key/datum pairs are evaluated from left to right to define the +entries of the dictionary: each key object is used as a key into the +dictionary to store the corresponding datum. + +Restrictions on the types of the key values are listed earlier in +section \ref{types}. (To summarize, the key type should be hashable, +which excludes all mutable objects.) Clashes between duplicate keys +are not detected; the last datum (textually rightmost in the display) +stored for a given key value prevails. +\indexii{immutable}{object} + + +\subsection{String conversions\label{string-conversions}} +\indexii{string}{conversion} +\indexii{reverse}{quotes} +\indexii{backward}{quotes} +\index{back-quotes} + +A string conversion is an expression list enclosed in reverse (a.k.a. +backward) quotes: + +\begin{productionlist} + \production{string_conversion} + {"`" \token{expression_list} "`"} +\end{productionlist} + +A string conversion evaluates the contained expression list and +converts the resulting object into a string according to rules +specific to its type. 
+ +If the object is a string, a number, \code{None}, or a tuple, list or +dictionary containing only objects whose type is one of these, the +resulting string is a valid Python expression which can be passed to +the built-in function \function{eval()} to yield an object with the +same value (or an approximation, if floating point numbers are +involved). + +(In particular, converting a string adds quotes around it and converts +``funny'' characters to escape sequences that are safe to print.) + +Recursive objects (for example, lists or dictionaries that contain a +reference to themselves, directly or indirectly) use \samp{...} to +indicate a recursive reference, and the result cannot be passed to +\function{eval()} to get an equal value (\exception{SyntaxError} will +be raised instead). +\obindex{recursive} + +The built-in function \function{repr()} performs exactly the same +conversion on its argument as enclosing it in parentheses and reverse +quotes does. The built-in function \function{str()} performs a +similar but more user-friendly conversion. +\bifuncindex{repr} +\bifuncindex{str} + + +\subsection{Yield expressions\label{yieldexpr}} +\kwindex{yield} +\indexii{yield}{expression} +\indexii{generator}{function} + +\begin{productionlist} + \production{yield_atom} + {"(" \token{yield_expression} ")"} + \production{yield_expression} + {"yield" [\token{expression_list}]} +\end{productionlist} + +\versionadded{2.5} + +The \keyword{yield} expression is only used when defining a generator +function, and can only be used in the body of a function definition. +Using a \keyword{yield} expression in a function definition is +sufficient to cause that definition to create a generator function +instead of a normal function. + +When a generator function is called, it returns an iterator known as a +generator. That generator then controls the execution of a generator +function. The execution starts when one of the generator's methods is +called.
At that time, the execution proceeds to the first +\keyword{yield} expression, where it is suspended again, returning the +value of \grammartoken{expression_list} to the generator's caller. By +suspended we mean that all local state is retained, including the +current bindings of local variables, the instruction pointer, and the +internal evaluation stack. When the execution is resumed by calling +one of the generator's methods, the function can proceed exactly as +if the \keyword{yield} expression was just another external call. +The value of the \keyword{yield} expression after resuming depends on +the method which resumed the execution. + +\index{coroutine} + +All of this makes generator functions quite similar to coroutines; they +yield multiple times, they have more than one entry point and their +execution can be suspended. The only difference is that a generator +function cannot control where the execution should continue after it +yields; the control is always transferred to the generator's caller. + +\obindex{generator} + +The following generator methods can be used to control the execution +of a generator function: + +\exindex{StopIteration} + +\begin{methoddesc}[generator]{next}{} + Starts the execution of a generator function or resumes it at the + last executed \keyword{yield} expression. When a generator function + is resumed with the \method{next()} method, the current \keyword{yield} + expression always evaluates to \constant{None}. The execution then + continues to the next \keyword{yield} expression, where the generator + is suspended again, and the value of the + \grammartoken{expression_list} is returned to \method{next()}'s + caller. If the generator exits without yielding another value, a + \exception{StopIteration} exception is raised. +\end{methoddesc} + +\begin{methoddesc}[generator]{send}{value} + Resumes the execution and ``sends'' a value into the generator + function. 
The \code{value} argument becomes the result of the + current \keyword{yield} expression. The \method{send()} method + returns the next value yielded by the generator, or raises + \exception{StopIteration} if the generator exits without yielding + another value. + When \method{send()} is called to start the generator, it must be + called with \constant{None} as the argument, because there is no + \keyword{yield} expression that could receive the value. +\end{methoddesc} + +\begin{methoddesc}[generator]{throw} + {type\optional{, value\optional{, traceback}}} + Raises an exception of type \code{type} at the point where the generator + was paused, and returns the next value yielded by the generator + function. If the generator exits without yielding another value, a + \exception{StopIteration} exception is raised. If the generator + function does not catch the passed-in exception, or raises a + different exception, then that exception propagates to the caller. +\end{methoddesc} + +\exindex{GeneratorExit} + +\begin{methoddesc}[generator]{close}{} + Raises a \exception{GeneratorExit} at the point where the generator + function was paused. If the generator function then raises + \exception{StopIteration} (by exiting normally, or due to already + being closed) or \exception{GeneratorExit} (by not catching the + exception), \method{close()} returns to its caller. If the generator yields a + value, a \exception{RuntimeError} is raised. If the generator raises + any other exception, it is propagated to the caller. \method{close()} + does nothing if the generator has already exited due to an exception + or normal exit. +\end{methoddesc} + +Here is a simple example that demonstrates the behavior of generators +and generator functions: + +\begin{verbatim} +>>> def echo(value=None): +... print "Execution starts when 'next()' is called for the first time." +... try: +... while True: +... try: +... value = (yield value) +... except GeneratorExit: +... # never catch GeneratorExit +... raise +... 
except Exception, e: +... value = e +... finally: +... print "Don't forget to clean up when 'close()' is called." +... +>>> generator = echo(1) +>>> print generator.next() +Execution starts when 'next()' is called for the first time. +1 +>>> print generator.next() +None +>>> print generator.send(2) +2 +>>> generator.throw(TypeError, "spam") +TypeError('spam',) +>>> generator.close() +Don't forget to clean up when 'close()' is called. +\end{verbatim} + +\begin{seealso} + \seepep{0342}{Coroutines via Enhanced Generators} + {The proposal to enhance the API and syntax of generators, + making them usable as simple coroutines.} +\end{seealso} + + +\section{Primaries\label{primaries}} +\index{primary} + +Primaries represent the most tightly bound operations of the language. +Their syntax is: + +\begin{productionlist} + \production{primary} + {\token{atom} | \token{attributeref} + | \token{subscription} | \token{slicing} | \token{call}} +\end{productionlist} + + +\subsection{Attribute references\label{attribute-references}} +\indexii{attribute}{reference} + +An attribute reference is a primary followed by a period and a name: + +\begin{productionlist} + \production{attributeref} + {\token{primary} "." \token{identifier}} +\end{productionlist} + +The primary must evaluate to an object of a type that supports +attribute references, e.g., a module, list, or an instance. This +object is then asked to produce the attribute whose name is the +identifier. If this attribute is not available, the exception +\exception{AttributeError}\exindex{AttributeError} is raised. +Otherwise, the type and value of the object produced is determined by +the object. Multiple evaluations of the same attribute reference may +yield different objects. 
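The two outcomes described for attribute references can be checked with a small sketch. The class \code{Point} and its members are hypothetical names chosen for illustration, and the observation that two evaluations of a method attribute give distinct objects reflects CPython behavior rather than a guarantee of the language.

```python
class Point(object):
    """Hypothetical class used only for this illustration."""
    def __init__(self, x):
        self.x = x

    def norm(self):
        return abs(self.x)

p = Point(-3)
value = p.x                      # attribute is available

try:
    p.y                          # attribute is not available
    missing_raised = False
except AttributeError:
    missing_raised = True

# Evaluating the same attribute reference twice may yield different
# objects: each evaluation of p.norm builds a fresh bound method.
first = p.norm
second = p.norm
same_result = first() == second()
distinct_objects = first is not second
```

The two bound methods compare equal in behavior but are separate objects, which is one concrete case of the last sentence above.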
+\obindex{module} +\obindex{list} + + +\subsection{Subscriptions\label{subscriptions}} +\index{subscription} + +A subscription selects an item of a sequence (string, tuple or list) +or mapping (dictionary) object: +\obindex{sequence} +\obindex{mapping} +\obindex{string} +\obindex{tuple} +\obindex{list} +\obindex{dictionary} +\indexii{sequence}{item} + +\begin{productionlist} + \production{subscription} + {\token{primary} "[" \token{expression_list} "]"} +\end{productionlist} + +The primary must evaluate to an object of a sequence or mapping type. + +If the primary is a mapping, the expression list must evaluate to an +object whose value is one of the keys of the mapping, and the +subscription selects the value in the mapping that corresponds to that +key. (The expression list is a tuple except if it has exactly one +item.) + +If the primary is a sequence, the expression (list) must evaluate to a +plain integer. If this value is negative, the length of the sequence +is added to it (so that, e.g., \code{x[-1]} selects the last item of +\code{x}.) The resulting value must be a nonnegative integer less +than the number of items in the sequence, and the subscription selects +the item whose index is that value (counting from zero). + +A string's items are characters. A character is not a separate data +type but a string of exactly one character. +\index{character} +\indexii{string}{item} + + +\subsection{Slicings\label{slicings}} +\index{slicing} +\index{slice} + +A slicing selects a range of items in a sequence object (e.g., a +string, tuple or list). Slicings may be used as expressions or as +targets in assignment or \keyword{del} statements. 
The syntax for a +slicing: +\obindex{sequence} +\obindex{string} +\obindex{tuple} +\obindex{list} + +\begin{productionlist} + \production{slicing} + {\token{simple_slicing} | \token{extended_slicing}} + \production{simple_slicing} + {\token{primary} "[" \token{short_slice} "]"} + \production{extended_slicing} + {\token{primary} "[" \token{slice_list} "]" } + \production{slice_list} + {\token{slice_item} ("," \token{slice_item})* [","]} + \production{slice_item} + {\token{expression} | \token{proper_slice} | \token{ellipsis}} + \production{proper_slice} + {\token{short_slice} | \token{long_slice}} + \production{short_slice} + {[\token{lower_bound}] ":" [\token{upper_bound}]} + \production{long_slice} + {\token{short_slice} ":" [\token{stride}]} + \production{lower_bound} + {\token{expression}} + \production{upper_bound} + {\token{expression}} + \production{stride} + {\token{expression}} + \production{ellipsis} + {"..."} +\end{productionlist} + +There is ambiguity in the formal syntax here: anything that looks like +an expression list also looks like a slice list, so any subscription +can be interpreted as a slicing. Rather than further complicating the +syntax, this is disambiguated by defining that in this case the +interpretation as a subscription takes priority over the +interpretation as a slicing (this is the case if the slice list +contains no proper slice nor ellipses). Similarly, when the slice +list has exactly one short slice and no trailing comma, the +interpretation as a simple slicing takes priority over that as an +extended slicing.\indexii{extended}{slicing} + +The semantics for a simple slicing are as follows. The primary must +evaluate to a sequence object. The lower and upper bound expressions, +if present, must evaluate to plain integers; defaults are zero and the +\code{sys.maxint}, respectively. If either bound is negative, the +sequence's length is added to it. 
The slicing now selects all items +with index \var{k} such that +\code{\var{i} <= \var{k} < \var{j}} where \var{i} +and \var{j} are the specified lower and upper bounds. This may be an +empty sequence. It is not an error if \var{i} or \var{j} lie outside the +range of valid indexes (such items don't exist so they aren't +selected). + +The semantics for an extended slicing are as follows. The primary +must evaluate to a mapping object, and it is indexed with a key that +is constructed from the slice list, as follows. If the slice list +contains at least one comma, the key is a tuple containing the +conversion of the slice items; otherwise, the conversion of the lone +slice item is the key. The conversion of a slice item that is an +expression is that expression. The conversion of an ellipsis slice +item is the built-in \code{Ellipsis} object. The conversion of a +proper slice is a slice object (see section \ref{types}) whose +\member{start}, \member{stop} and \member{step} attributes are the +values of the expressions given as lower bound, upper bound and +stride, respectively, substituting \code{None} for missing +expressions. 
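The key construction for extended slicings can be made visible with a mapping-like object that simply returns whatever key it is given. This is a minimal sketch; \code{ShowKey} is a hypothetical class, not part of the language or the library. The last two lines illustrate the simple-slicing rule that out-of-range bounds are not an error.

```python
class ShowKey(object):
    """Hypothetical mapping-like class that returns the key it receives."""
    def __getitem__(self, key):
        return key

m = ShowKey()

# A lone proper slice is converted to a slice object; missing
# expressions become None.
single = m[1:10:2]
partial = m[::3]

# With at least one comma, the key is a tuple of the converted items.
multi = m[1:2, ..., 5]

# Simple slicing on a sequence: bounds outside the valid range
# select nothing extra and raise no exception.
tail = "abcdef"[2:100]
last_two = "abcdef"[-2:]
```

Here \code{single} has \member{start}, \member{stop} and \member{step} equal to 1, 10 and 2, and \code{multi} is a three-item tuple whose middle item is the \code{Ellipsis} object.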
+\withsubitem{(slice object attribute)}{\ttindex{start} + \ttindex{stop}\ttindex{step}} + + +\subsection{Calls\label{calls}} +\index{call} + +A call calls a callable object (e.g., a function) with a possibly empty +series of arguments: +\obindex{callable} + +\begin{productionlist} + \production{call} + {\token{primary} "(" [\token{argument_list} [","]} + \productioncont{ | \token{expression} \token{genexpr_for}] ")"} + \production{argument_list} + {\token{positional_arguments} ["," \token{keyword_arguments}]} + \productioncont{ ["," "*" \token{expression}]} + \productioncont{ ["," "**" \token{expression}]} + \productioncont{| \token{keyword_arguments} ["," "*" \token{expression}]} + \productioncont{ ["," "**" \token{expression}]} + \productioncont{| "*" \token{expression} ["," "**" \token{expression}]} + \productioncont{| "**" \token{expression}} + \production{positional_arguments} + {\token{expression} ("," \token{expression})*} + \production{keyword_arguments} + {\token{keyword_item} ("," \token{keyword_item})*} + \production{keyword_item} + {\token{identifier} "=" \token{expression}} +\end{productionlist} + +A trailing comma may be present after the positional and keyword +arguments but does not affect the semantics. + +The primary must evaluate to a callable object (user-defined +functions, built-in functions, methods of built-in objects, class +objects, methods of class instances, and certain class instances +themselves are callable; extensions may define additional callable +object types). All argument expressions are evaluated before the call +is attempted. Please refer to section \ref{function} for the syntax +of formal parameter lists. + +If keyword arguments are present, they are first converted to +positional arguments, as follows. First, a list of unfilled slots is +created for the formal parameters. If there are N positional +arguments, they are placed in the first N slots. 
Next, for each +keyword argument, the identifier is used to determine the +corresponding slot (if the identifier is the same as the first formal +parameter name, the first slot is used, and so on). If the slot is +already filled, a \exception{TypeError} exception is raised. +Otherwise, the value of the argument is placed in the slot, filling it +(even if the expression is \code{None}, it fills the slot). When all +arguments have been processed, the slots that are still unfilled are +filled with the corresponding default value from the function +definition. (Default values are calculated, once, when the function +is defined; thus, a mutable object such as a list or dictionary used +as default value will be shared by all calls that don't specify an +argument value for the corresponding slot; this should usually be +avoided.) If there are any unfilled slots for which no default value +is specified, a \exception{TypeError} exception is raised. Otherwise, +the list of filled slots is used as the argument list for the call. + +If there are more positional arguments than there are formal parameter +slots, a \exception{TypeError} exception is raised, unless a formal +parameter using the syntax \samp{*identifier} is present; in this +case, that formal parameter receives a tuple containing the excess +positional arguments (or an empty tuple if there were no excess +positional arguments). + +If any keyword argument does not correspond to a formal parameter +name, a \exception{TypeError} exception is raised, unless a formal +parameter using the syntax \samp{**identifier} is present; in this +case, that formal parameter receives a dictionary containing the +excess keyword arguments (using the keywords as keys and the argument +values as corresponding values), or a (new) empty dictionary if there +were no excess keyword arguments. + +If the syntax \samp{*expression} appears in the function call, +\samp{expression} must evaluate to a sequence. 
Elements from this +sequence are treated as if they were additional positional arguments; +if there are positional arguments \var{x1},...,\var{xN}, and +\samp{expression} evaluates to a sequence \var{y1},...,\var{yM}, this +is equivalent to a call with M+N positional arguments +\var{x1},...,\var{xN},\var{y1},...,\var{yM}. + +A consequence of this is that although the \samp{*expression} syntax +appears \emph{after} any keyword arguments, it is processed +\emph{before} the keyword arguments (and the +\samp{**expression} argument, if any -- see below). So: + +\begin{verbatim} +>>> def f(a, b): +... print a, b +... +>>> f(b=1, *(2,)) +2 1 +>>> f(a=1, *(2,)) +Traceback (most recent call last): + File "<stdin>", line 1, in ? +TypeError: f() got multiple values for keyword argument 'a' +>>> f(1, *(2,)) +1 2 +\end{verbatim} + +It is unusual for both keyword arguments and the +\samp{*expression} syntax to be used in the same call, so in practice +this confusion does not arise. + +If the syntax \samp{**expression} appears in the function call, +\samp{expression} must evaluate to a (subclass of) dictionary, the +contents of which are treated as additional keyword arguments. In the +case of a keyword appearing in both \samp{expression} and as an +explicit keyword argument, a \exception{TypeError} exception is +raised. + +Formal parameters using the syntax \samp{*identifier} or +\samp{**identifier} cannot be used as positional argument slots or +as keyword argument names. Formal parameters using the syntax +\samp{(sublist)} cannot be used as keyword argument names; the +outermost sublist corresponds to a single unnamed argument slot, and +the argument value is assigned to the sublist using the usual tuple +assignment rules after all other parameter processing is done. + +A call always returns some value, possibly \code{None}, unless it +raises an exception. How this value is computed depends on the type +of the callable object. 
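The slot-filling rules for arguments described in this section can be sketched as follows. The functions \code{collect} and \code{append_to} are hypothetical and exist only to expose how arguments are bound, including the shared mutable default that the text advises against.

```python
def collect(a, b=2, *rest, **extra):
    """Return the bound parameters so the slot filling is visible."""
    return a, b, rest, extra

def append_to(item, bucket=[]):
    """The default list is created once, when the function is defined."""
    bucket.append(item)
    return bucket

defaults_used = collect(1)                 # b takes its default value
excess = collect(1, 3, 4, 5, z=6)          # extras go to rest and extra
star_first = collect(b=7, *(8,))           # *expression fills slot a first

try:
    collect(1, a=9)                        # slot a is already filled
    clash_raised = False
except TypeError:
    clash_raised = True

first_call = append_to(1)
second_call = append_to(2)                 # same list object as before
```

Because both calls to \code{append\_to} omit the second argument, they return the same list, which by then holds both items.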
+ +If it is--- + +\begin{description} + +\item[a user-defined function:] The code block for the function is +executed, passing it the argument list. The first thing the code +block will do is bind the formal parameters to the arguments; this is +described in section \ref{function}. When the code block executes a +\keyword{return} statement, this specifies the return value of the +function call. +\indexii{function}{call} +\indexiii{user-defined}{function}{call} +\obindex{user-defined function} +\obindex{function} + +\item[a built-in function or method:] The result is up to the +interpreter; see the \citetitle[../lib/built-in-funcs.html]{Python +Library Reference} for the descriptions of built-in functions and +methods. +\indexii{function}{call} +\indexii{built-in function}{call} +\indexii{method}{call} +\indexii{built-in method}{call} +\obindex{built-in method} +\obindex{built-in function} +\obindex{method} +\obindex{function} + +\item[a class object:] A new instance of that class is returned. +\obindex{class} +\indexii{class object}{call} + +\item[a class instance method:] The corresponding user-defined +function is called, with an argument list that is one longer than the +argument list of the call: the instance becomes the first argument. +\obindex{class instance} +\obindex{instance} +\indexii{class instance}{call} + +\item[a class instance:] The class must define a \method{__call__()} +method; the effect is then the same as if that method was called. +\indexii{instance}{call} +\withsubitem{(object method)}{\ttindex{__call__()}} + +\end{description} + + +\section{The power operator\label{power}} + +The power operator binds more tightly than unary operators on its +left; it binds less tightly than unary operators on its right. 
The +syntax is: + +\begin{productionlist} + \production{power} + {\token{primary} ["**" \token{u_expr}]} +\end{productionlist} + +Thus, in an unparenthesized sequence of power and unary operators, the +operators are evaluated from right to left (this does not constrain +the evaluation order for the operands). + +The power operator has the same semantics as the built-in +\function{pow()} function, when called with two arguments: it yields +its left argument raised to the power of its right argument. The +numeric arguments are first converted to a common type. The result +type is that of the arguments after coercion. + +With mixed operand types, the coercion rules for binary arithmetic +operators apply. For int and long int operands, the result has the +same type as the operands (after coercion) unless the second argument +is negative; in that case, all arguments are converted to float and a +float result is delivered. For example, \code{10**2} returns \code{100}, +but \code{10**-2} returns \code{0.01}. (This last feature was added in +Python 2.2. In Python 2.1 and before, if both arguments were of integer +types and the second argument was negative, an exception was raised). + +Raising \code{0.0} to a negative power results in a +\exception{ZeroDivisionError}. Raising a negative number to a +fractional power results in a \exception{ValueError}. + + +\section{Unary arithmetic operations \label{unary}} +\indexiii{unary}{arithmetic}{operation} +\indexiii{unary}{bit-wise}{operation} + +All unary arithmetic (and bit-wise) operations have the same priority: + +\begin{productionlist} + \production{u_expr} + {\token{power} | "-" \token{u_expr} + | "+" \token{u_expr} | "{\~}" \token{u_expr}} +\end{productionlist} + +The unary \code{-} (minus) operator yields the negation of its +numeric argument. +\index{negation} +\index{minus} + +The unary \code{+} (plus) operator yields its numeric argument +unchanged. 
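The binding of the power operator relative to the unary operators, as described in the preceding section, can be checked directly. This is a small sketch using only exactly representable values; the variable names are arbitrary.

```python
# Unary minus on the left binds less tightly than **:
neg_then_pow = -2**2        # parsed as -(2**2)

# Unary minus on the right binds more tightly than **:
pow_then_neg = 2**-1        # parsed as 2**(-1), a float result

# A chain of power operators is evaluated from right to left:
right_to_left = 2**3**2     # parsed as 2**(3**2)

# Raising 0.0 to a negative power is an error:
try:
    0.0 ** -1
    zero_raised = False
except ZeroDivisionError:
    zero_raised = True
```

Parenthesizing the operand, as in \code{(-2)**2}, gives 4 instead of -4.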
+\index{plus} + +The unary \code{\~} (invert) operator yields the bit-wise inversion +of its plain or long integer argument. The bit-wise inversion of +\code{x} is defined as \code{-(x+1)}. It only applies to integral +numbers. +\index{inversion} + +In all three cases, if the argument does not have the proper type, +a \exception{TypeError} exception is raised. +\exindex{TypeError} + + +\section{Binary arithmetic operations\label{binary}} +\indexiii{binary}{arithmetic}{operation} + +The binary arithmetic operations have the conventional priority +levels. Note that some of these operations also apply to certain +non-numeric types. Apart from the power operator, there are only two +levels, one for multiplicative operators and one for additive +operators: + +\begin{productionlist} + \production{m_expr} + {\token{u_expr} | \token{m_expr} "*" \token{u_expr} + | \token{m_expr} "//" \token{u_expr} + | \token{m_expr} "/" \token{u_expr}} + \productioncont{| \token{m_expr} "\%" \token{u_expr}} + \production{a_expr} + {\token{m_expr} | \token{a_expr} "+" \token{m_expr} + | \token{a_expr} "-" \token{m_expr}} +\end{productionlist} + +The \code{*} (multiplication) operator yields the product of its +arguments. The arguments must either both be numbers, or one argument +must be an integer (plain or long) and the other must be a sequence. +In the former case, the numbers are converted to a common type and +then multiplied together. In the latter case, sequence repetition is +performed; a negative repetition factor yields an empty sequence. +\index{multiplication} + +The \code{/} (division) and \code{//} (floor division) operators yield +the quotient of their arguments. The numeric arguments are first +converted to a common type. Plain or long integer division yields an +integer of the same type; the result is that of mathematical division +with the `floor' function applied to the result. Division by zero +raises the +\exception{ZeroDivisionError} exception. 
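A short sketch of the multiplication and floor-division rules above. It uses only \code{//}, because \code{/} applied to two integers behaves differently in later Python versions, while \code{//} applies the floor function in all of them.

```python
# Floor division rounds toward negative infinity, not toward zero.
floor_pos = 7 // 2
floor_neg = -7 // 2
floor_float = 7.0 // 2          # operands converted to a common type

# Sequence repetition; a negative factor yields an empty sequence.
repeated = [0] * 3
emptied = "ab" * -1

# Division by zero raises ZeroDivisionError.
try:
    1 // 0
    zero_raised = False
except ZeroDivisionError:
    zero_raised = True
```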
+\exindex{ZeroDivisionError} +\index{division} + +The \code{\%} (modulo) operator yields the remainder from the +division of the first argument by the second. The numeric arguments +are first converted to a common type. A zero right argument raises +the \exception{ZeroDivisionError} exception. The arguments may be floating +point numbers, e.g., \code{3.14\%0.7} equals \code{0.34} (since +\code{3.14} equals \code{4*0.7 + 0.34}.) The modulo operator always +yields a result with the same sign as its second operand (or zero); +the absolute value of the result is strictly smaller than the absolute +value of the second operand\footnote{ + While \code{abs(x\%y) < abs(y)} is true mathematically, for + floats it may not be true numerically due to roundoff. For + example, and assuming a platform on which a Python float is an + IEEE 754 double-precision number, in order that \code{-1e-100 \% 1e100} + have the same sign as \code{1e100}, the computed result is + \code{-1e-100 + 1e100}, which is numerically exactly equal + to \code{1e100}. Function \function{fmod()} in the \module{math} + module returns a result whose sign matches the sign of the + first argument instead, and so returns \code{-1e-100} in this case. + Which approach is more appropriate depends on the application. +}. +\index{modulo} + +The integer division and modulo operators are connected by the +following identity: \code{x == (x/y)*y + (x\%y)}. Integer division and +modulo are also connected with the built-in function \function{divmod()}: +\code{divmod(x, y) == (x/y, x\%y)}. These identities don't hold for +floating point numbers; there similar identities hold +approximately where \code{x/y} is replaced by \code{floor(x/y)} or +\code{floor(x/y) - 1}\footnote{ + If x is very close to an exact integer multiple of y, it's + possible for \code{floor(x/y)} to be one larger than + \code{(x-x\%y)/y} due to rounding. 
In such cases, Python returns + the latter result, in order to preserve that \code{divmod(x,y)[0] + * y + x \%{} y} be very close to \code{x}. +}. + +In addition to performing the modulo operation on numbers, the \code{\%} +operator is also overloaded by string and unicode objects to perform +string formatting (also known as interpolation). The syntax for string +formatting is described in the +\citetitle[../lib/typesseq-strings.html]{Python Library Reference}, +section ``Sequence Types''. + +\deprecated{2.3}{The floor division operator, the modulo operator, +and the \function{divmod()} function are no longer defined for complex +numbers. Instead, convert to a floating point number using the +\function{abs()} function if appropriate.} + +The \code{+} (addition) operator yields the sum of its arguments. +The arguments must either both be numbers or both sequences of the +same type. In the former case, the numbers are converted to a common +type and then added together. In the latter case, the sequences are +concatenated. +\index{addition} + +The \code{-} (subtraction) operator yields the difference of its +arguments. The numeric arguments are first converted to a common +type. +\index{subtraction} + + +\section{Shifting operations\label{shifting}} +\indexii{shifting}{operation} + +The shifting operations have lower priority than the arithmetic +operations: + +\begin{productionlist} + \production{shift_expr} + {\token{a_expr} + | \token{shift_expr} ( "<<" | ">>" ) \token{a_expr}} +\end{productionlist} + +These operators accept plain or long integers as arguments. The +arguments are converted to a common type. They shift the first +argument to the left or right by the number of bits given by the +second argument. + +A right shift by \var{n} bits is defined as division by +\code{pow(2,\var{n})}. 
A left shift by \var{n} bits is defined as +multiplication with \code{pow(2,\var{n})}; for plain integers there is +no overflow check so in that case the operation drops bits and flips +the sign if the result is not less than \code{pow(2,31)} in absolute +value. Negative shift counts raise a \exception{ValueError} +exception. +\exindex{ValueError} + + +\section{Binary bit-wise operations\label{bitwise}} +\indexiii{binary}{bit-wise}{operation} + +Each of the three bitwise operations has a different priority level: + +\begin{productionlist} + \production{and_expr} + {\token{shift_expr} | \token{and_expr} "\&" \token{shift_expr}} + \production{xor_expr} + {\token{and_expr} | \token{xor_expr} "\textasciicircum" \token{and_expr}} + \production{or_expr} + {\token{xor_expr} | \token{or_expr} "|" \token{xor_expr}} +\end{productionlist} + +The \code{\&} operator yields the bitwise AND of its arguments, which +must be plain or long integers. The arguments are converted to a +common type. +\indexii{bit-wise}{and} + +The \code{\^} operator yields the bitwise XOR (exclusive OR) of its +arguments, which must be plain or long integers. The arguments are +converted to a common type. +\indexii{bit-wise}{xor} +\indexii{exclusive}{or} + +The \code{|} operator yields the bitwise (inclusive) OR of its +arguments, which must be plain or long integers. The arguments are +converted to a common type. +\indexii{bit-wise}{or} +\indexii{inclusive}{or} + + +\section{Comparisons\label{comparisons}} +\index{comparison} + +Unlike C, all comparison operations in Python have the same priority, +which is lower than that of any arithmetic, shifting or bitwise +operation. 
Also unlike C, expressions like \code{a < b < c} have the +interpretation that is conventional in mathematics: +\indexii{C}{language} + +\begin{productionlist} + \production{comparison} + {\token{or_expr} ( \token{comp_operator} \token{or_expr} )*} + \production{comp_operator} + {"<" | ">" | "==" | ">=" | "<=" | "<>" | "!="} + \productioncont{| "is" ["not"] | ["not"] "in"} +\end{productionlist} + +Comparisons yield boolean values: \code{True} or \code{False}. + +Comparisons can be chained arbitrarily, e.g., \code{x < y <= z} is +equivalent to \code{x < y and y <= z}, except that \code{y} is +evaluated only once (but in both cases \code{z} is not evaluated at all +when \code{x < y} is found to be false). +\indexii{chaining}{comparisons} + +Formally, if \var{a}, \var{b}, \var{c}, \ldots, \var{y}, \var{z} are +expressions and \var{opa}, \var{opb}, \ldots, \var{opy} are comparison +operators, then \var{a opa b opb c} \ldots \var{y opy z} is equivalent +to \var{a opa b} \keyword{and} \var{b opb c} \keyword{and} \ldots +\var{y opy z}, except that each expression is evaluated at most once. + +Note that \var{a opa b opb c} doesn't imply any kind of comparison +between \var{a} and \var{c}, so that, e.g., \code{x < y > z} is +perfectly legal (though perhaps not pretty). + +The forms \code{<>} and \code{!=} are equivalent; for consistency with +C, \code{!=} is preferred; where \code{!=} is mentioned below +\code{<>} is also accepted. The \code{<>} spelling is considered +obsolescent. + +The operators \code{<}, \code{>}, \code{==}, \code{>=}, \code{<=}, and +\code{!=} compare +the values of two objects. The objects need not have the same type. +If both are numbers, they are converted to a common type. Otherwise, +objects of different types \emph{always} compare unequal, and are +ordered consistently but arbitrarily. 
You can control comparison +behavior of objects of non-builtin types by defining a \code{__cmp__} +method or rich comparison methods like \code{__gt__}, described in +section~\ref{specialnames}. + +(This unusual definition of comparison was used to simplify the +definition of operations like sorting and the \keyword{in} and +\keyword{not in} operators. In the future, the comparison rules for +objects of different types are likely to change.) + +Comparison of objects of the same type depends on the type: + +\begin{itemize} + +\item +Numbers are compared arithmetically. + +\item +Strings are compared lexicographically using the numeric equivalents +(the result of the built-in function \function{ord()}) of their +characters. Unicode and 8-bit strings are fully interoperable in this +behavior. + +\item +Tuples and lists are compared lexicographically using comparison of +corresponding elements. This means that to compare equal, each +element must compare equal and the two sequences must be of the same +type and have the same length. + +If not equal, the sequences are ordered the same as their first +differing elements. For example, \code{cmp([1,2,x], [1,2,y])} returns +the same as \code{cmp(x,y)}. If the corresponding element does not +exist, the shorter sequence is ordered first (for example, +\code{[1,2] < [1,2,3]}). + +\item +Mappings (dictionaries) compare equal if and only if their sorted +(key, value) lists compare equal.\footnote{The implementation computes + this efficiently, without constructing lists or sorting.} +Outcomes other than equality are resolved consistently, but are not +otherwise defined.\footnote{Earlier versions of Python used + lexicographic comparison of the sorted (key, value) lists, but this + was very expensive for the common case of comparing for equality. 
An + even earlier version of Python compared dictionaries by identity only, + but this caused surprises because people expected to be able to test + a dictionary for emptiness by comparing it to \code{\{\}}.} + +\item +Most other objects of builtin types compare unequal unless they are +the same object; +the choice whether one object is considered smaller or larger than +another one is made arbitrarily but consistently within one +execution of a program. + +\end{itemize} + +The operators \keyword{in} and \keyword{not in} test for set +membership. \code{\var{x} in \var{s}} evaluates to true if \var{x} +is a member of the set \var{s}, and false otherwise. \code{\var{x} +not in \var{s}} returns the negation of \code{\var{x} in \var{s}}. +The set membership test has traditionally been bound to sequences; an +object is a member of a set if the set is a sequence and contains an +element equal to that object. However, it is possible for an object +to support membership tests without being a sequence. In particular, +dictionaries support membership testing: \code{\var{key} in \var{dict}} +is a nicer way of spelling \code{\var{dict}.has_key(\var{key})}; +other mapping types may follow suit. + +For the list and tuple types, \code{\var{x} in \var{y}} is true if and +only if there exists an index \var{i} such that +\code{\var{x} == \var{y}[\var{i}]} is true. + +For the Unicode and string types, \code{\var{x} in \var{y}} is true if +and only if \var{x} is a substring of \var{y}. An equivalent test is +\code{y.find(x) != -1}. Note, \var{x} and \var{y} need not be the +same type; consequently, \code{u'ab' in 'abc'} will return \code{True}. +Empty strings are always considered to be a substring of any other string, +so \code{"" in "abc"} will return \code{True}. 
+\versionchanged[Previously, \var{x} was required to be a string of +length \code{1}]{2.3} + +For user-defined classes which define the \method{__contains__()} method, +\code{\var{x} in \var{y}} is true if and only if +\code{\var{y}.__contains__(\var{x})} is true. + +For user-defined classes which do not define \method{__contains__()} and +do define \method{__getitem__()}, \code{\var{x} in \var{y}} is true if +and only if there is a non-negative integer index \var{i} such that +\code{\var{x} == \var{y}[\var{i}]}, and all lower integer indices +do not raise an \exception{IndexError} exception. (If any other exception +is raised, it is as if \keyword{in} raised that exception.) + +The operator \keyword{not in} is defined to have the inverse truth value +of \keyword{in}. +\opindex{in} +\opindex{not in} +\indexii{membership}{test} +\obindex{sequence} + +The operators \keyword{is} and \keyword{is not} test for object identity: +\code{\var{x} is \var{y}} is true if and only if \var{x} and \var{y} +are the same object. \code{\var{x} is not \var{y}} yields the inverse +truth value. 
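The membership and identity rules above can be sketched as follows. The
classes \class{Evens} and \class{Squares} are hypothetical and exist only
for this illustration; the first defines \method{__contains__()}, the
second relies on the \method{__getitem__()} fallback.

```python
class Evens(object):
    """Hypothetical container that defines __contains__ directly."""
    def __contains__(self, x):
        return x % 2 == 0


class Squares(object):
    """Hypothetical sequence-like class that only defines __getitem__."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        if i >= self.n:
            raise IndexError(i)  # ends the search started by `in`
        return i * i


has_four = 4 in Evens()      # calls Evens.__contains__(4)
has_nine = 9 in Squares(5)   # indices 0, 1, 2, 3 are tried; 3*3 matches
has_seven = 7 in Squares(5)  # IndexError at index 5 ends the search

a = [1, 2]
b = [1, 2]
equal_but_distinct = (a == b) and (a is not b)  # equal values, two objects
```

Note that \code{a == b} compares element by element, while
\code{a is b} only asks whether both names refer to one object.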
+\opindex{is} +\opindex{is not} +\indexii{identity}{test} + + +\section{Boolean operations\label{Booleans}} +\indexii{Conditional}{expression} +\indexii{Boolean}{operation} + +Boolean operations have the lowest priority of all Python operations: + +\begin{productionlist} + \production{expression} + {\token{conditional_expression} | \token{lambda_form}} + \production{old_expression} + {\token{or_test} | \token{old_lambda_form}} + \production{conditional_expression} + {\token{or_test} ["if" \token{or_test} "else" \token{expression}]} + \production{or_test} + {\token{and_test} | \token{or_test} "or" \token{and_test}} + \production{and_test} + {\token{not_test} | \token{and_test} "and" \token{not_test}} + \production{not_test} + {\token{comparison} | "not" \token{not_test}} +\end{productionlist} + +In the context of Boolean operations, and also when expressions are +used by control flow statements, the following values are interpreted +as false: \code{False}, \code{None}, numeric zero of all types, and empty +strings and containers (including strings, tuples, lists, dictionaries, +sets and frozensets). All other values are interpreted as true. + +The operator \keyword{not} yields \code{True} if its argument is false, +\code{False} otherwise. +\opindex{not} + +The expression \code{\var{x} if \var{C} else \var{y}} first evaluates +\var{C} (\emph{not} \var{x}); if \var{C} is true, \var{x} is evaluated and +its value is returned; otherwise, \var{y} is evaluated and its value is +returned. \versionadded{2.5} + +The expression \code{\var{x} and \var{y}} first evaluates \var{x}; if +\var{x} is false, its value is returned; otherwise, \var{y} is +evaluated and the resulting value is returned. +\opindex{and} + +The expression \code{\var{x} or \var{y}} first evaluates \var{x}; if +\var{x} is true, its value is returned; otherwise, \var{y} is +evaluated and the resulting value is returned. 
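A small sketch of the evaluation rules for \keyword{and}, \keyword{or}
and the conditional expression. The helper \function{record()} is
hypothetical; it only notes that it was called, so that skipped operands
become visible in the list \code{calls}.

```python
calls = []


def record(tag, value):
    # Hypothetical helper: remembers that it ran, then returns value.
    calls.append(tag)
    return value


r_and = record('a', 0) and record('b', 1)    # 0 is false, so 'b' is skipped
r_or = record('c', '') or record('d', 'x')   # '' is false, so 'd' is evaluated
r_if = record('e', 1) if calls else record('f', 2)  # calls is non-empty
```

After these statements \code{calls} holds only the tags of the operands
that were actually evaluated, and each result is one of the operand
values rather than a Boolean.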
+\opindex{or} + +(Note that neither \keyword{and} nor \keyword{or} restricts the value +and type it returns to \code{False} and \code{True}, but rather returns the +last evaluated argument. +This is sometimes useful; for example, if \code{s} is a string that should be +replaced by a default value if it is empty, the expression +\code{s or 'foo'} yields the desired value. Because \keyword{not} has to +invent a value anyway, it does not bother to return a value of the +same type as its argument, so e.g., \code{not 'foo'} yields \code{False}, +not \code{''}.) + +\section{Lambdas\label{lambdas}} +\indexii{lambda}{expression} +\indexii{lambda}{form} +\indexii{anonymous}{function} + +\begin{productionlist} + \production{lambda_form} + {"lambda" [\token{parameter_list}]: \token{expression}} + \production{old_lambda_form} + {"lambda" [\token{parameter_list}]: \token{old_expression}} +\end{productionlist} + +Lambda forms (lambda expressions) have the same syntactic position as +expressions. They are a shorthand to create anonymous functions; the +expression \code{lambda \var{arguments}: \var{expression}} +yields a function object. The unnamed object behaves like a function +object defined with + +\begin{verbatim} +def name(arguments): + return expression +\end{verbatim} + +See section \ref{function} for the syntax of parameter lists. Note +that functions created with lambda forms cannot contain statements. +\label{lambda} + +\section{Expression lists\label{exprlists}} +\indexii{expression}{list} + +\begin{productionlist} + \production{expression_list} + {\token{expression} ( "," \token{expression} )* [","]} +\end{productionlist} + +An expression list containing at least one comma yields a +tuple. The length of the tuple is the number of expressions in the +list. The expressions are evaluated from left to right. +\obindex{tuple} + +The trailing comma is required only to create a single-element tuple (a.k.a. a +\emph{singleton}); it is optional in all other cases. 
A single +expression without a trailing comma doesn't create a +tuple, but rather yields the value of that expression. +(To create an empty tuple, use an empty pair of parentheses: +\code{()}.) +\indexii{trailing}{comma} + +\section{Evaluation order\label{evalorder}} +\indexii{evaluation}{order} + +Python evaluates expressions from left to right. Notice that while +evaluating an assignment, the right-hand side is evaluated before +the left-hand side. + +In the following lines, expressions will be evaluated in the +arithmetic order of their suffixes: + +\begin{verbatim} +expr1, expr2, expr3, expr4 +(expr1, expr2, expr3, expr4) +{expr1: expr2, expr3: expr4} +expr1 + expr2 * (expr3 - expr4) +func(expr1, expr2, *expr3, **expr4) +expr3, expr4 = expr1, expr2 +\end{verbatim} + +\section{Summary\label{summary}} + +The following table summarizes the operator +precedences\indexii{operator}{precedence} in Python, from lowest +precedence (least binding) to highest precedence (most binding). +Operators in the same box have the same precedence. Unless the syntax +is explicitly given, operators are binary. Operators in the same box +group left to right (except for comparisons, including tests, which all +have the same precedence and chain from left to right --- see section +\ref{comparisons} -- and exponentiation, which groups from right to left). 
+ +\begin{tableii}{c|l}{textrm}{Operator}{Description} + \lineii{\keyword{lambda}} {Lambda expression} + \hline + \lineii{\keyword{or}} {Boolean OR} + \hline + \lineii{\keyword{and}} {Boolean AND} + \hline + \lineii{\keyword{not} \var{x}} {Boolean NOT} + \hline + \lineii{\keyword{in}, \keyword{not} \keyword{in}}{Membership tests} + \lineii{\keyword{is}, \keyword{is not}}{Identity tests} + \lineii{\code{<}, \code{<=}, \code{>}, \code{>=}, + \code{<>}, \code{!=}, \code{==}} + {Comparisons} + \hline + \lineii{\code{|}} {Bitwise OR} + \hline + \lineii{\code{\^}} {Bitwise XOR} + \hline + \lineii{\code{\&}} {Bitwise AND} + \hline + \lineii{\code{<<}, \code{>>}} {Shifts} + \hline + \lineii{\code{+}, \code{-}}{Addition and subtraction} + \hline + \lineii{\code{*}, \code{/}, \code{\%}} + {Multiplication, division, remainder} + \hline + \lineii{\code{+\var{x}}, \code{-\var{x}}} {Positive, negative} + \lineii{\code{\~\var{x}}} {Bitwise not} + \hline + \lineii{\code{**}} {Exponentiation} + \hline + \lineii{\code{\var{x}.\var{attribute}}} {Attribute reference} + \lineii{\code{\var{x}[\var{index}]}} {Subscription} + \lineii{\code{\var{x}[\var{index}:\var{index}]}} {Slicing} + \lineii{\code{\var{f}(\var{arguments}...)}} {Function call} + \hline + \lineii{\code{(\var{expressions}\ldots)}} {Binding or tuple display} + \lineii{\code{[\var{expressions}\ldots]}} {List display} + \lineii{\code{\{\var{key}:\var{datum}\ldots\}}}{Dictionary display} + \lineii{\code{`\var{expressions}\ldots`}} {String conversion} +\end{tableii} diff --git a/sys/src/cmd/python/Doc/ref/ref6.tex b/sys/src/cmd/python/Doc/ref/ref6.tex new file mode 100644 index 000000000..1fc885ed1 --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref6.tex @@ -0,0 +1,928 @@ +\chapter{Simple statements \label{simple}} +\indexii{simple}{statement} + +Simple statements are comprised within a single logical line. +Several simple statements may occur on a single line separated +by semicolons. 
The syntax for simple statements is: + +\begin{productionlist} + \production{simple_stmt}{\token{expression_stmt}} + \productioncont{| \token{assert_stmt}} + \productioncont{| \token{assignment_stmt}} + \productioncont{| \token{augmented_assignment_stmt}} + \productioncont{| \token{pass_stmt}} + \productioncont{| \token{del_stmt}} + \productioncont{| \token{print_stmt}} + \productioncont{| \token{return_stmt}} + \productioncont{| \token{yield_stmt}} + \productioncont{| \token{raise_stmt}} + \productioncont{| \token{break_stmt}} + \productioncont{| \token{continue_stmt}} + \productioncont{| \token{import_stmt}} + \productioncont{| \token{global_stmt}} + \productioncont{| \token{exec_stmt}} +\end{productionlist} + + +\section{Expression statements \label{exprstmts}} +\indexii{expression}{statement} + +Expression statements are used (mostly interactively) to compute and +write a value, or (usually) to call a procedure (a function that +returns no meaningful result; in Python, procedures return the value +\code{None}). Other uses of expression statements are allowed and +occasionally useful. The syntax for an expression statement is: + +\begin{productionlist} + \production{expression_stmt} + {\token{expression_list}} +\end{productionlist} + +An expression statement evaluates the expression list (which may be a +single expression). +\indexii{expression}{list} + +In interactive mode, if the value is not \code{None}, it is converted +to a string using the built-in \function{repr()}\bifuncindex{repr} +function and the resulting string is written to standard output (see +section~\ref{print}) on a line by itself. (Expression statements +yielding \code{None} are not written, so that procedure calls do not +cause any output.) 
+\obindex{None} +\indexii{string}{conversion} +\index{output} +\indexii{standard}{output} +\indexii{writing}{values} +\indexii{procedure}{call} + + +\section{Assert statements \label{assert}} + +Assert statements\stindex{assert} are a convenient way to insert +debugging assertions\indexii{debugging}{assertions} into a program: + +\begin{productionlist} + \production{assert_stmt} + {"assert" \token{expression} ["," \token{expression}]} +\end{productionlist} + +The simple form, \samp{assert expression}, is equivalent to + +\begin{verbatim} +if __debug__: + if not expression: raise AssertionError +\end{verbatim} + +The extended form, \samp{assert expression1, expression2}, is +equivalent to + +\begin{verbatim} +if __debug__: + if not expression1: raise AssertionError, expression2 +\end{verbatim} + +These equivalences assume that \code{__debug__}\ttindex{__debug__} and +\exception{AssertionError}\exindex{AssertionError} refer to the built-in +variables with those names. In the current implementation, the +built-in variable \code{__debug__} is \code{True} under normal +circumstances, \code{False} when optimization is requested (command line +option -O). The current code generator emits no code for an assert +statement when optimization is requested at compile time. Note that it +is unnecessary to include the source code for the expression that failed +in the error message; +it will be displayed as part of the stack trace. + +Assignments to \code{__debug__} are illegal. The value for the +built-in variable is determined when the interpreter starts. 
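As a minimal sketch of the extended form, the hypothetical function
\function{set_ratio()} below guards its argument with an assertion. The
message is read back through \function{sys.exc_info()} so that the
example does not depend on a particular \keyword{except} syntax.

```python
import sys


def set_ratio(r):
    # Hypothetical function; the assertion documents a precondition.
    assert 0.0 <= r <= 1.0, "ratio out of range: %r" % (r,)
    return r


try:
    set_ratio(1.5)
except AssertionError:
    # The second expression of the assert became the exception argument.
    message = str(sys.exc_info()[1])
else:
    # Reached only when assertions are compiled out (option -O).
    message = None
```

Because the check disappears under \programopt{-O}, an assertion should
not be the only thing protecting against invalid input.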
+ + +\section{Assignment statements \label{assignment}} + +Assignment statements\indexii{assignment}{statement} are used to +(re)bind names to values and to modify attributes or items of mutable +objects: +\indexii{binding}{name} +\indexii{rebinding}{name} +\obindex{mutable} +\indexii{attribute}{assignment} + +\begin{productionlist} + \production{assignment_stmt} + {(\token{target_list} "=")+ + (\token{expression_list} | \token{yield_expression})} + \production{target_list} + {\token{target} ("," \token{target})* [","]} + \production{target} + {\token{identifier}} + \productioncont{| "(" \token{target_list} ")"} + \productioncont{| "[" \token{target_list} "]"} + \productioncont{| \token{attributeref}} + \productioncont{| \token{subscription}} + \productioncont{| \token{slicing}} +\end{productionlist} + +(See section~\ref{primaries} for the syntax definitions for the last +three symbols.) + +An assignment statement evaluates the expression list (remember that +this can be a single expression or a comma-separated list, the latter +yielding a tuple) and assigns the single resulting object to each of +the target lists, from left to right. +\indexii{expression}{list} + +Assignment is defined recursively depending on the form of the target +(list). When a target is part of a mutable object (an attribute +reference, subscription or slicing), the mutable object must +ultimately perform the assignment and decide about its validity, and +may raise an exception if the assignment is unacceptable. The rules +observed by various types and the exceptions raised are given with the +definition of the object types (see section~\ref{types}). +\index{target} +\indexii{target}{list} + +Assignment of an object to a target list is recursively defined as +follows. +\indexiii{target}{list}{assignment} + +\begin{itemize} +\item +If the target list is a single target: The object is assigned to that +target. 
+ +\item +If the target list is a comma-separated list of targets: The object +must be a sequence with the same number of items as there are +targets in the target list, and the items are assigned, from left to +right, to the corresponding targets. (This rule is relaxed as of +Python 1.5; in earlier versions, the object had to be a tuple. Since +strings are sequences, an assignment like \samp{a, b = "xy"} is +now legal as long as the string has the right length.) + +\end{itemize} + +Assignment of an object to a single target is recursively defined as +follows. + +\begin{itemize} % nested + +\item +If the target is an identifier (name): + +\begin{itemize} + +\item +If the name does not occur in a \keyword{global} statement in the current +code block: the name is bound to the object in the current local +namespace. +\stindex{global} + +\item +Otherwise: the name is bound to the object in the current global +namespace. + +\end{itemize} % nested + +The name is rebound if it was already bound. This may cause the +reference count for the object previously bound to the name to reach +zero, causing the object to be deallocated and its +destructor\index{destructor} (if it has one) to be called. + +\item +If the target is a target list enclosed in parentheses or in square +brackets: The object must be a sequence with the same number of items +as there are targets in the target list, and its items are assigned, +from left to right, to the corresponding targets. + +\item +If the target is an attribute reference: The primary expression in the +reference is evaluated. It should yield an object with assignable +attributes; if this is not the case, \exception{TypeError} is raised. That +object is then asked to assign the assigned object to the given +attribute; if it cannot perform the assignment, it raises an exception +(usually but not necessarily \exception{AttributeError}). 
+\indexii{attribute}{assignment} + +\item +If the target is a subscription: The primary expression in the +reference is evaluated. It should yield either a mutable sequence +object (such as a list) or a mapping object (such as a dictionary). Next, +the subscript expression is evaluated. +\indexii{subscription}{assignment} +\obindex{mutable} + +If the primary is a mutable sequence object (such as a list), the subscript +must yield a plain integer. If it is negative, the sequence's length +is added to it. The resulting value must be a nonnegative integer +less than the sequence's length, and the sequence is asked to assign +the assigned object to its item with that index. If the index is out +of range, \exception{IndexError} is raised (assignment to a subscripted +sequence cannot add new items to a list). +\obindex{sequence} +\obindex{list} + +If the primary is a mapping object (such as a dictionary), the subscript must +have a type compatible with the mapping's key type, and the mapping is +then asked to create a key/datum pair which maps the subscript to +the assigned object. This can either replace an existing key/value +pair with the same key value, or insert a new key/value pair (if no +key with the same value existed). +\obindex{mapping} +\obindex{dictionary} + +\item +If the target is a slicing: The primary expression in the reference is +evaluated. It should yield a mutable sequence object (such as a list). The +assigned object should be a sequence object of the same type. Next, +the lower and upper bound expressions are evaluated, insofar as they are +present; defaults are zero and the sequence's length. The bounds +should evaluate to (small) integers. If either bound is negative, the +sequence's length is added to it. The resulting bounds are clipped to +lie between zero and the sequence's length, inclusive. Finally, the +sequence object is asked to replace the slice with the items of the +assigned sequence. 
The length of the slice may be different from the +length of the assigned sequence, thus changing the length of the +target sequence, if the object allows it. +\indexii{slicing}{assignment} + +\end{itemize} + +(In the current implementation, the syntax for targets is taken +to be the same as for expressions, and invalid syntax is rejected +during the code generation phase, causing less detailed error +messages.) + +WARNING: Although the definition of assignment implies that overlaps +between the left-hand side and the right-hand side are `safe' (for example +\samp{a, b = b, a} swaps two variables), overlaps \emph{within} the +collection of assigned-to variables are not safe! For instance, the +following program prints \samp{[0, 2]}: + +\begin{verbatim} +x = [0, 1] +i = 0 +i, x[i] = 1, 2 +print x +\end{verbatim} + + +\subsection{Augmented assignment statements \label{augassign}} + +Augmented assignment is the combination, in a single statement, of a binary +operation and an assignment statement: +\indexii{augmented}{assignment} +\index{statement!assignment, augmented} + +\begin{productionlist} + \production{augmented_assignment_stmt} + {\token{target} \token{augop} + (\token{expression_list} | \token{yield_expression})} + \production{augop} + {"+=" | "-=" | "*=" | "/=" | "\%=" | "**="} + \productioncont{| ">>=" | "<<=" | "\&=" | "\textasciicircum=" | "|="} +\end{productionlist} + +(See section~\ref{primaries} for the syntax definitions for the last +three symbols.) + +An augmented assignment evaluates the target (which, unlike normal +assignment statements, cannot be an unpacking) and the expression +list, performs the binary operation specific to the type of assignment +on the two operands, and assigns the result to the original +target. The target is only evaluated once. + +An augmented assignment expression like \code{x += 1} can be rewritten as +\code{x = x + 1} to achieve a similar, but not exactly equal effect. 
In the +augmented version, \code{x} is only evaluated once. Also, when possible, the +actual operation is performed \emph{in-place}, meaning that rather than +creating a new object and assigning that to the target, the old object is +modified instead. + +With the exception of assigning to tuples and multiple targets in a single +statement, the assignment done by augmented assignment statements is handled +the same way as normal assignments. Similarly, with the exception of the +possible \emph{in-place} behavior, the binary operation performed by +augmented assignment is the same as the normal binary operations. + +For targets which are attribute references, the initial value is +retrieved with a \method{getattr()} and the result is assigned with a +\method{setattr()}. Notice that the two methods do not necessarily +refer to the same variable. When \method{getattr()} refers to a class +variable, \method{setattr()} still writes to an instance variable. +For example: + +\begin{verbatim} +class A: + x = 3 # class variable +a = A() +a.x += 1 # writes a.x as 4 leaving A.x as 3 +\end{verbatim} + + +\section{The \keyword{pass} statement \label{pass}} +\stindex{pass} + +\begin{productionlist} + \production{pass_stmt} + {"pass"} +\end{productionlist} + +\keyword{pass} is a null operation --- when it is executed, nothing +happens. It is useful as a placeholder when a statement is +required syntactically, but no code needs to be executed, for example: +\indexii{null}{operation} + +\begin{verbatim} +def f(arg): pass # a function that does nothing (yet) + +class C: pass # a class with no methods (yet) +\end{verbatim} + + +\section{The \keyword{del} statement \label{del}} +\stindex{del} + +\begin{productionlist} + \production{del_stmt} + {"del" \token{target_list}} +\end{productionlist} + +Deletion is defined recursively, in much the same way as assignment is +defined. Rather than spelling it out in full detail, here are some +hints. 
+\indexii{deletion}{target} +\indexiii{deletion}{target}{list} + +Deletion of a target list recursively deletes each target, from left +to right. + +Deletion of a name removes the binding of that name +from the local or global namespace, depending on whether the name +occurs in a \keyword{global} statement in the same code block. If the +name is unbound, a \exception{NameError} exception will be raised. +\stindex{global} +\indexii{unbinding}{name} + +It is illegal to delete a name from the local namespace if it occurs +as a free variable\indexii{free}{variable} in a nested block. + +Deletion of attribute references, subscriptions and slicings +is passed to the primary object involved; deletion of a slicing +is in general equivalent to assignment of an empty slice of the +right type (but even this is determined by the sliced object). +\indexii{attribute}{deletion} + + +\section{The \keyword{print} statement \label{print}} +\stindex{print} + +\begin{productionlist} + \production{print_stmt} + {"print" ([\token{expression} ("," \token{expression})* [","]]} + \productioncont{| ">>" \token{expression} + [("," \token{expression})+ [","]])} +\end{productionlist} + +\keyword{print} evaluates each expression in turn and writes the +resulting object to standard output (see below). If an object is not +a string, it is first converted to a string using the rules for string +conversions. The (resulting or original) string is then written. A +space is written before each object is (converted and) written, unless +the output system believes it is positioned at the beginning of a +line. This is the case (1) when no characters have yet been written +to standard output, (2) when the last character written to standard +output is \character{\e n}, or (3) when the last write operation on +standard output was not a \keyword{print} statement. (In some cases +it may be useful to write an empty string to standard output for +this reason.) 
\note{Objects which act like file objects but which are +not the built-in file objects often do not properly emulate this +aspect of the file object's behavior, so it is best not to rely on +this.} +\index{output} +\indexii{writing}{values} + +A \character{\e n} character is written at the end, unless the +\keyword{print} statement ends with a comma. This is the only action +if the statement contains just the keyword \keyword{print}. +\indexii{trailing}{comma} +\indexii{newline}{suppression} + +Standard output is defined as the file object named \code{stdout} +in the built-in module \module{sys}. If no such object exists, or if +it does not have a \method{write()} method, a \exception{RuntimeError} +exception is raised. +\indexii{standard}{output} +\refbimodindex{sys} +\withsubitem{(in module sys)}{\ttindex{stdout}} +\exindex{RuntimeError} + +\keyword{print} also has an extended\index{extended print statement} +form, defined by the second portion of the syntax described above. +This form is sometimes referred to as ``\keyword{print} chevron.'' +In this form, the first expression after the \code{>>} must +evaluate to a ``file-like'' object, specifically an object that has a +\method{write()} method as described above. With this extended form, +the subsequent expressions are printed to this file object. If the +first expression evaluates to \code{None}, then \code{sys.stdout} is +used as the file for output. + + +\section{The \keyword{return} statement \label{return}} +\stindex{return} + +\begin{productionlist} + \production{return_stmt} + {"return" [\token{expression_list}]} +\end{productionlist} + +\keyword{return} may only occur syntactically nested in a function +definition, not within a nested class definition. +\indexii{function}{definition} +\indexii{class}{definition} + +If an expression list is present, it is evaluated, else \code{None} +is substituted. 
+ +\keyword{return} leaves the current function call with the expression +list (or \code{None}) as return value. + +When \keyword{return} passes control out of a \keyword{try} statement +with a \keyword{finally} clause, that \keyword{finally} clause is executed +before really leaving the function. +\kwindex{finally} + +In a generator function, the \keyword{return} statement is not allowed +to include an \grammartoken{expression_list}. In that context, a bare +\keyword{return} indicates that the generator is done and will cause +\exception{StopIteration} to be raised. + + +\section{The \keyword{yield} statement \label{yield}} +\stindex{yield} + +\begin{productionlist} + \production{yield_stmt} + {\token{yield_expression}} +\end{productionlist} + +\index{generator!function} +\index{generator!iterator} +\index{function!generator} +\exindex{StopIteration} + +The \keyword{yield} statement is only used when defining a generator +function, and is only used in the body of the generator function. +Using a \keyword{yield} statement in a function definition is +sufficient to cause that definition to create a generator function +instead of a normal function. + +When a generator function is called, it returns an iterator known as a +generator iterator, or more commonly, a generator. The body of the +generator function is executed by calling the generator's +\method{next()} method repeatedly until it raises an exception. + +When a \keyword{yield} statement is executed, the state of the +generator is frozen and the value of \grammartoken{expression_list} is +returned to \method{next()}'s caller. By ``frozen'' we mean that all +local state is retained, including the current bindings of local +variables, the instruction pointer, and the internal evaluation stack: +enough information is saved so that the next time \method{next()} is +invoked, the function can proceed exactly as if the \keyword{yield} +statement were just another external call. 
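The ``frozen'' state described above can be observed with a small
generator. This is only a sketch: \function{countdown()} is a
hypothetical name, and the example uses the built-in \function{next()}
function of later Python versions; with the version described here,
call \code{gen.next()} instead.

```python
def countdown(n):
    # The local variable n survives between resumptions.
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield, with n still bound
rest = list(gen)     # drains the generator until StopIteration
```

Each resumption continues inside the \keyword{while} loop exactly where
the previous \keyword{yield} left off.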
+ +As of Python version 2.5, the \keyword{yield} statement is now +allowed in the \keyword{try} clause of a \keyword{try} ...\ +\keyword{finally} construct. If the generator is not resumed before +it is finalized (by reaching a zero reference count or by being garbage +collected), the generator-iterator's \method{close()} method will be +called, allowing any pending \keyword{finally} clauses to execute. + +\begin{notice} +In Python 2.2, the \keyword{yield} statement is only allowed +when the \code{generators} feature has been enabled. It will always +be enabled in Python 2.3. This \code{__future__} import statement can +be used to enable the feature: + +\begin{verbatim} +from __future__ import generators +\end{verbatim} +\end{notice} + + +\begin{seealso} + \seepep{0255}{Simple Generators} + {The proposal for adding generators and the \keyword{yield} + statement to Python.} + + \seepep{0342}{Coroutines via Enhanced Generators} + {The proposal that, among other generator enhancements, + proposed allowing \keyword{yield} to appear inside a + \keyword{try} ... \keyword{finally} block.} +\end{seealso} + + +\section{The \keyword{raise} statement \label{raise}} +\stindex{raise} + +\begin{productionlist} + \production{raise_stmt} + {"raise" [\token{expression} ["," \token{expression} + ["," \token{expression}]]]} +\end{productionlist} + +If no expressions are present, \keyword{raise} re-raises the last +exception that was active in the current scope. If no exception is +active in the current scope, a \exception{TypeError} exception is +raised indicating that this is an error (if running under IDLE, a +\exception{Queue.Empty} exception is raised instead). +\index{exception} +\indexii{raising}{exception} + +Otherwise, \keyword{raise} evaluates the expressions to get three +objects, using \code{None} as the value of omitted expressions. The +first two objects are used to determine the \emph{type} and +\emph{value} of the exception. 
+ +If the first object is an instance, the type of the exception is the +class of the instance, the instance itself is the value, and the +second object must be \code{None}. + +If the first object is a class, it becomes the type of the exception. +The second object is used to determine the exception value: If it is +an instance of the class, the instance becomes the exception value. +If the second object is a tuple, it is used as the argument list for +the class constructor; if it is \code{None}, an empty argument list is +used, and any other object is treated as a single argument to the +constructor. The instance so created by calling the constructor is +used as the exception value. + +If a third object is present and not \code{None}, it must be a +traceback\obindex{traceback} object (see section~\ref{traceback}), and +it is substituted instead of the current location as the place where +the exception occurred. If the third object is present and not a +traceback object or \code{None}, a \exception{TypeError} exception is +raised. The three-expression form of \keyword{raise} is useful to +re-raise an exception transparently in an except clause, but +\keyword{raise} with no expressions should be preferred if the +exception to be re-raised was the most recently active exception in +the current scope. + +Additional information on exceptions can be found in +section~\ref{exceptions}, and information about handling exceptions is +in section~\ref{try}. + + +\section{The \keyword{break} statement \label{break}} +\stindex{break} + +\begin{productionlist} + \production{break_stmt} + {"break"} +\end{productionlist} + +\keyword{break} may only occur syntactically nested in a \keyword{for} +or \keyword{while} loop, but not nested in a function or class definition +within that loop. +\stindex{for} +\stindex{while} +\indexii{loop}{statement} + +It terminates the nearest enclosing loop, skipping the optional +\keyword{else} clause if the loop has one. 
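The interaction between \keyword{break} and the loop's \keyword{else}
clause can be sketched as follows; \function{find_first_negative()} is a
hypothetical function used only for illustration.

```python
def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            break        # leaves the loop and skips the else clause
    else:
        return None      # runs only when the loop was not broken out of
    return n             # n still holds the item that triggered the break
```

The \keyword{else} clause also runs when \code{numbers} is empty, since
the loop then ends without executing \keyword{break}.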
+\kwindex{else} + +If a \keyword{for} loop is terminated by \keyword{break}, the loop control +target keeps its current value. +\indexii{loop control}{target} + +When \keyword{break} passes control out of a \keyword{try} statement +with a \keyword{finally} clause, that \keyword{finally} clause is executed +before really leaving the loop. +\kwindex{finally} + + +\section{The \keyword{continue} statement \label{continue}} +\stindex{continue} + +\begin{productionlist} + \production{continue_stmt} + {"continue"} +\end{productionlist} + +\keyword{continue} may only occur syntactically nested in a \keyword{for} or +\keyword{while} loop, but not nested in a function or class definition or +\keyword{finally} statement within that loop.\footnote{It may +occur within an \keyword{except} or \keyword{else} clause. The +restriction on occurring in the \keyword{try} clause is implementor's +laziness and will eventually be lifted.} +It continues with the next cycle of the nearest enclosing loop. +\stindex{for} +\stindex{while} +\indexii{loop}{statement} +\kwindex{finally} + + +\section{The \keyword{import} statement \label{import}} +\stindex{import} +\index{module!importing} +\indexii{name}{binding} +\kwindex{from} + +\begin{productionlist} + \production{import_stmt} + {"import" \token{module} ["as" \token{name}] + ( "," \token{module} ["as" \token{name}] )*} + \productioncont{| "from" \token{relative_module} "import" \token{identifier} + ["as" \token{name}]} + \productioncont{ ( "," \token{identifier} ["as" \token{name}] )*} + \productioncont{| "from" \token{relative_module} "import" "(" + \token{identifier} ["as" \token{name}]} + \productioncont{ ( "," \token{identifier} ["as" \token{name}] )* [","] ")"} + \productioncont{| "from" \token{module} "import" "*"} + \production{module} + {(\token{identifier} ".")* \token{identifier}} + \production{relative_module} + {"."* \token{module} | "."+} + \production{name} + {\token{identifier}} +\end{productionlist} + +Import statements are 
executed in two steps: (1) find a module, and +initialize it if necessary; (2) define a name or names in the local +namespace (of the scope where the \keyword{import} statement occurs). +The first form (without \keyword{from}) repeats these steps for each +identifier in the list. The form with \keyword{from} performs step +(1) once, and then performs step (2) repeatedly. + +In this context, to ``initialize'' a built-in or extension module means to +call an initialization function that the module must provide for the purpose +(in the reference implementation, the function's name is obtained by +prepending string ``init'' to the module's name); to ``initialize'' a +Python-coded module means to execute the module's body. + +The system maintains a table of modules that have been or are being +initialized, +indexed by module name. This table is +accessible as \code{sys.modules}. When a module name is found in +this table, step (1) is finished. If not, a search for a module +definition is started. When a module is found, it is loaded. Details +of the module searching and loading process are implementation and +platform specific. It generally involves searching for a ``built-in'' +module with the given name and then searching a list of locations +given as \code{sys.path}. +\withsubitem{(in module sys)}{\ttindex{modules}} +\ttindex{sys.modules} +\indexii{module}{name} +\indexii{built-in}{module} +\indexii{user-defined}{module} +\refbimodindex{sys} +\indexii{filename}{extension} +\indexiii{module}{search}{path} + +If a built-in module is found,\indexii{module}{initialization} its +built-in initialization code is executed and step (1) is finished. If +no matching file is found, +\exception{ImportError}\exindex{ImportError} is raised. +\index{code block}If a file is found, it is parsed, +yielding an executable code block. If a syntax error occurs, +\exception{SyntaxError}\exindex{SyntaxError} is raised. 
Otherwise, an +empty module of the given name is created and inserted in the module +table, and then the code block is executed in the context of this +module. Exceptions during this execution terminate step (1). + +When step (1) finishes without raising an exception, step (2) can +begin. + +The first form of \keyword{import} statement binds the module name in +the local namespace to the module object, and then goes on to import +the next identifier, if any. If the module name is followed by +\keyword{as}, the name following \keyword{as} is used as the local +name for the module. + +The \keyword{from} form does not bind the module name: it goes through the +list of identifiers, looks each one of them up in the module found in step +(1), and binds the name in the local namespace to the object thus found. +As with the first form of \keyword{import}, an alternate local name can be +supplied by specifying "\keyword{as} localname". If a name is not found, +\exception{ImportError} is raised. If the list of identifiers is replaced +by a star (\character{*}), all public names defined in the module are +bound in the local namespace of the \keyword{import} statement. +\indexii{name}{binding} +\exindex{ImportError} + +The \emph{public names} defined by a module are determined by checking +the module's namespace for a variable named \code{__all__}; if +defined, it must be a sequence of strings which are names defined or +imported by that module. The names given in \code{__all__} are all +considered public and are required to exist. If \code{__all__} is not +defined, the set of public names includes all names found in the +module's namespace which do not begin with an underscore character +(\character{_}). \code{__all__} should contain the entire public API. +It is intended to avoid accidentally exporting items that are not part +of the API (such as library modules which were imported and used within +the module). 
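The public-name rule above can be sketched with a throwaway in-memory module. This is only an illustration: the module name demo_mod and its attributes are hypothetical, and the sketch uses the function-call spelling of exec so that it also runs on current Python 3 interpreters.

```python
import sys
import types

# Build a throwaway module in memory; "demo_mod" is a hypothetical name.
demo = types.ModuleType("demo_mod")
demo.public = 1
demo._private = 2
demo.helper = 3

# Registering it in the module table means step (1) finds it immediately.
sys.modules["demo_mod"] = demo


def star_import_names():
    """Return the names bound by 'from demo_mod import *' in a fresh namespace."""
    namespace = {}
    exec("from demo_mod import *", namespace)
    # exec may add a '__builtins__' key as a side effect; ignore it here.
    return sorted(name for name in namespace if name != "__builtins__")


# No __all__ defined: every name not starting with an underscore is public.
without_all = star_import_names()

# With __all__ defined: exactly the listed names are public,
# even one that starts with an underscore.
demo.__all__ = ["public", "_private"]
with_all = star_import_names()
```

Here without_all holds helper and public, while with_all holds _private and public, which shows that an explicit __all__ overrides the underscore convention.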
+\withsubitem{(optional module attribute)}{\ttindex{__all__}} + +The \keyword{from} form with \samp{*} may only occur in a module +scope. If the wild card form of import --- \samp{import *} --- is +used in a function and the function contains or is a nested block with +free variables, the compiler will raise a \exception{SyntaxError}. + +\kwindex{from} +\stindex{from} + +\strong{Hierarchical module names:}\indexiii{hierarchical}{module}{names} +when the module name contains one or more dots, the module search +is carried out differently. The sequence of identifiers up to +the last dot is used to find a ``package''\index{packages}; the final +identifier is then searched for inside the package. A package is +generally a subdirectory of a directory on \code{sys.path} that has a +file \file{__init__.py}.\ttindex{__init__.py} +% +[XXX Can't be bothered to spell this out right now; see the URL +\url{http://www.python.org/doc/essays/packages.html} for more details, also +about how the module search works from inside a package.] + +The built-in function \function{__import__()} is provided to support +applications that determine which modules need to be loaded +dynamically; refer to \ulink{Built-in +Functions}{../lib/built-in-funcs.html} in the +\citetitle[../lib/lib.html]{Python Library Reference} for additional +information. +\bifuncindex{__import__} + +\subsection{Future statements \label{future}} + +A \dfn{future statement}\indexii{future}{statement} is a directive to +the compiler that a particular module should be compiled using syntax +or semantics that will be available in a specified future release of +Python. The future statement is intended to ease migration to future +versions of Python that introduce incompatible changes to the +language. It allows use of the new features on a per-module basis +before the release in which the feature becomes standard. 
+ +\begin{productionlist}[*] + \production{future_statement} + {"from" "__future__" "import" feature ["as" name]} + \productioncont{ ("," feature ["as" name])*} + \productioncont{| "from" "__future__" "import" "(" feature ["as" name]} + \productioncont{ ("," feature ["as" name])* [","] ")"} + \production{feature}{identifier} + \production{name}{identifier} +\end{productionlist} + +A future statement must appear near the top of the module. The only +lines that can appear before a future statement are: + +\begin{itemize} + +\item the module docstring (if any), +\item comments, +\item blank lines, and +\item other future statements. + +\end{itemize} + +The features recognized by Python 2.5 are \samp{absolute_import}, +\samp{division}, \samp{generators}, \samp{nested_scopes} and +\samp{with_statement}. \samp{generators} and \samp{nested_scopes} +are redundant in Python version 2.3 and above because they are always +enabled. + +A future statement is recognized and treated specially at compile +time: Changes to the semantics of core constructs are often +implemented by generating different code. It may even be the case +that a new feature introduces new incompatible syntax (such as a new +reserved word), in which case the compiler may need to parse the +module differently. Such decisions cannot be pushed off until +runtime. + +For any given release, the compiler knows which feature names have been +defined, and raises a compile-time error if a future statement contains +a feature not known to it. + +The direct runtime semantics are the same as for any import statement: +there is a standard module \module{__future__}, described later, and +it will be imported in the usual way at the time the future statement +is executed. + +The interesting runtime semantics depend on the specific feature +enabled by the future statement. 
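The placement rule and the compile-time check for unknown features can be observed by compiling small module sources held in strings. This is a rough sketch rather than part of the manual: the helper compiles() and the three source strings are made up for illustration, and no_such_feature is a deliberately invalid feature name.

```python
def compiles(source):
    """Return True if source compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Only a docstring, comments, blank lines and other future statements
# come before the future statements here, so this is accepted.
ALLOWED_PREAMBLE = (
    '"""Module docstring."""\n'
    "# a comment\n"
    "\n"
    "from __future__ import division\n"
    "from __future__ import generators\n"
    "x = 1\n"
)

# An ordinary statement precedes the future statement: rejected.
TOO_LATE = "x = 1\nfrom __future__ import division\n"

# A feature name the compiler does not know: rejected at compile time.
UNKNOWN_FEATURE = "from __future__ import no_such_feature\n"

results = [compiles(ALLOWED_PREAMBLE), compiles(TOO_LATE), compiles(UNKNOWN_FEATURE)]
```

Both rejections happen inside compile(), before any of the module's code runs, which matches the point that these decisions cannot be deferred to runtime.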
+ +Note that there is nothing special about the statement: + +\begin{verbatim} +import __future__ [as name] +\end{verbatim} + +That is not a future statement; it's an ordinary import statement with +no special semantics or syntax restrictions. + +Code compiled by an \keyword{exec} statement or calls to the built-in functions +\function{compile()} and \function{execfile()} that occur in a module +\module{M} containing a future statement will, by default, use the new +syntax or semantics associated with the future statement. This can, +starting with Python 2.2, be controlled by optional arguments to +\function{compile()} --- see the documentation of that function in the +\citetitle[../lib/built-in-funcs.html]{Python Library Reference} for +details. + +A future statement typed at an interactive interpreter prompt will +take effect for the rest of the interpreter session. If an +interpreter is started with the \programopt{-i} option, is passed a +script name to execute, and the script includes a future statement, it +will be in effect in the interactive session started after the script +is executed. + +\section{The \keyword{global} statement \label{global}} +\stindex{global} + +\begin{productionlist} + \production{global_stmt} + {"global" \token{identifier} ("," \token{identifier})*} +\end{productionlist} + +The \keyword{global} statement is a declaration which holds for the +entire current code block. It means that the listed identifiers are to be +interpreted as globals. It would be impossible to assign to a global +variable without \keyword{global}, although free variables may refer +to globals without being declared global. +\indexiii{global}{name}{binding} + +Names listed in a \keyword{global} statement must not be used in the same +code block textually preceding that \keyword{global} statement. 
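A minimal sketch of the difference the declaration makes, using a hypothetical module-level name counter and three made-up functions. Assignment without the declaration binds a new local name, assignment with it rebinds the global, and merely reading the name needs no declaration.

```python
counter = 0


def bump_without_global():
    # Assignment without a declaration binds a new local name;
    # the module-level counter is left untouched.
    counter = 100
    return counter


def bump_with_global():
    # The declaration holds for the whole function body, so this
    # augmented assignment rebinds the module-level name.
    global counter
    counter += 1
    return counter


def read_only():
    # A free variable may refer to a global without being declared global.
    return counter
```

Calling bump_without_global() returns 100 but read_only() still reports 0; after bump_with_global() the module-level value is 1.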
+ +Names listed in a \keyword{global} statement must not be defined as formal +parameters or in a \keyword{for} loop control target, \keyword{class} +definition, function definition, or \keyword{import} statement. + +(The current implementation does not enforce the latter two +restrictions, but programs should not abuse this freedom, as future +implementations may enforce them or silently change the meaning of the +program.) + +\strong{Programmer's note:} +the \keyword{global} statement is a directive to the parser. It +applies only to code parsed at the same time as the \keyword{global} +statement. In particular, a \keyword{global} statement contained in an +\keyword{exec} statement does not affect the code block \emph{containing} +the \keyword{exec} statement, and code contained in an \keyword{exec} +statement is unaffected by \keyword{global} statements in the code +containing the \keyword{exec} statement. The same applies to the +\function{eval()}, \function{execfile()} and \function{compile()} functions. +\stindex{exec} +\bifuncindex{eval} +\bifuncindex{execfile} +\bifuncindex{compile} + + +\section{The \keyword{exec} statement \label{exec}} +\stindex{exec} + +\begin{productionlist} + \production{exec_stmt} + {"exec" \token{or_expr} + ["in" \token{expression} ["," \token{expression}]]} +\end{productionlist} + +This statement supports dynamic execution of Python code. The first +expression should evaluate to either a string, an open file object, or +a code object. If it is a string, the string is parsed as a suite of +Python statements which is then executed (unless a syntax error +occurs). If it is an open file, the file is parsed until \EOF{} and +executed. If it is a code object, it is simply executed. In all +cases, the code that's executed is expected to be valid as file +input (see section~\ref{file-input}, ``File input''). 
Be aware that +the \keyword{return} and \keyword{yield} statements may not be used +outside of function definitions even within the context of code passed +to the \keyword{exec} statement. + +In all cases, if the optional parts are omitted, the code is executed +in the current scope. If only the first expression after \keyword{in} +is specified, it should be a dictionary, which will be used for both +the global and the local variables. If two expressions are given, +they are used for the global and local variables, respectively. +If provided, \var{locals} can be any mapping object. +\versionchanged[formerly \var{locals} was required to be a dictionary]{2.4} + +As a side effect, an implementation may insert additional keys into +the dictionaries given besides those corresponding to variable names +set by the executed code. For example, the current implementation +may add a reference to the dictionary of the built-in module +\module{__builtin__} under the key \code{__builtins__} (!). +\ttindex{__builtins__} +\refbimodindex{__builtin__} + +\strong{Programmer's hints:} +dynamic evaluation of expressions is supported by the built-in +function \function{eval()}. The built-in functions +\function{globals()} and \function{locals()} return the current global +and local dictionary, respectively, which may be useful to pass around +for use by \keyword{exec}. +\bifuncindex{eval} +\bifuncindex{globals} +\bifuncindex{locals} + + + + + diff --git a/sys/src/cmd/python/Doc/ref/ref7.tex b/sys/src/cmd/python/Doc/ref/ref7.tex new file mode 100644 index 000000000..c9e07fb7e --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref7.tex @@ -0,0 +1,544 @@ +\chapter{Compound statements\label{compound}} +\indexii{compound}{statement} + +Compound statements contain (groups of) other statements; they affect +or control the execution of those other statements in some way. 
In +general, compound statements span multiple lines, although in simple +incarnations a whole compound statement may be contained in one line. + +The \keyword{if}, \keyword{while} and \keyword{for} statements implement +traditional control flow constructs. \keyword{try} specifies exception +handlers and/or cleanup code for a group of statements. Function and +class definitions are also syntactically compound statements. + +Compound statements consist of one or more `clauses.' A clause +consists of a header and a `suite.' The clause headers of a +particular compound statement are all at the same indentation level. +Each clause header begins with a uniquely identifying keyword and ends +with a colon. A suite is a group of statements controlled by a +clause. A suite can be one or more semicolon-separated simple +statements on the same line as the header, following the header's +colon, or it can be one or more indented statements on subsequent +lines. Only the latter form of suite can contain nested compound +statements; the following is illegal, mostly because it wouldn't be +clear to which \keyword{if} clause a following \keyword{else} clause would +belong: +\index{clause} +\index{suite} + +\begin{verbatim} +if test1: if test2: print x +\end{verbatim} + +Also note that the semicolon binds tighter than the colon in this +context, so that in the following example, either all or none of the +\keyword{print} statements are executed: + +\begin{verbatim} +if x < y < z: print x; print y; print z +\end{verbatim} + +Summarizing: + +\begin{productionlist} + \production{compound_stmt} + {\token{if_stmt}} + \productioncont{| \token{while_stmt}} + \productioncont{| \token{for_stmt}} + \productioncont{| \token{try_stmt}} + \productioncont{| \token{with_stmt}} + \productioncont{| \token{funcdef}} + \productioncont{| \token{classdef}} + \production{suite} + {\token{stmt_list} NEWLINE + | NEWLINE INDENT \token{statement}+ DEDENT} + \production{statement} + {\token{stmt_list} NEWLINE 
| \token{compound_stmt}} + \production{stmt_list} + {\token{simple_stmt} (";" \token{simple_stmt})* [";"]} +\end{productionlist} + +Note that statements always end in a +\code{NEWLINE}\index{NEWLINE token} possibly followed by a +\code{DEDENT}.\index{DEDENT token} Also note that optional +continuation clauses always begin with a keyword that cannot start a +statement, thus there are no ambiguities (the `dangling +\keyword{else}' problem is solved in Python by requiring nested +\keyword{if} statements to be indented). +\indexii{dangling}{else} + +The formatting of the grammar rules in the following sections places +each clause on a separate line for clarity. + + +\section{The \keyword{if} statement\label{if}} +\stindex{if} + +The \keyword{if} statement is used for conditional execution: + +\begin{productionlist} + \production{if_stmt} + {"if" \token{expression} ":" \token{suite}} + \productioncont{( "elif" \token{expression} ":" \token{suite} )*} + \productioncont{["else" ":" \token{suite}]} +\end{productionlist} + +It selects exactly one of the suites by evaluating the expressions one +by one until one is found to be true (see section~\ref{Booleans} for +the definition of true and false); then that suite is executed (and no +other part of the \keyword{if} statement is executed or evaluated). If +all expressions are false, the suite of the \keyword{else} clause, if +present, is executed. 
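The selection rule for the if statement can be made visible by recording which conditions are actually evaluated. The function pick() and its helper check() are hypothetical names used only for this sketch.

```python
def pick():
    """Return the chosen suite's label and the list of conditions evaluated."""
    evaluated = []

    def check(label, value):
        # Record that this condition was evaluated, then return its truth value.
        evaluated.append(label)
        return value

    if check("first", False):
        chosen = "if suite"
    elif check("second", True):
        chosen = "first elif suite"
    elif check("third", True):
        # Never reached: an earlier expression was already true.
        chosen = "second elif suite"
    else:
        chosen = "else suite"
    return chosen, evaluated
```

Only the first two conditions are evaluated. The third is skipped even though it would be true, because no other part of the statement is evaluated once a suite has been selected.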
+\kwindex{elif} +\kwindex{else} + + +\section{The \keyword{while} statement\label{while}} +\stindex{while} +\indexii{loop}{statement} + +The \keyword{while} statement is used for repeated execution as long +as an expression is true: + +\begin{productionlist} + \production{while_stmt} + {"while" \token{expression} ":" \token{suite}} + \productioncont{["else" ":" \token{suite}]} +\end{productionlist} + +This repeatedly tests the expression and, if it is true, executes the +first suite; if the expression is false (which may be the first time it +is tested) the suite of the \keyword{else} clause, if present, is +executed and the loop terminates. +\kwindex{else} + +A \keyword{break} statement executed in the first suite terminates the +loop without executing the \keyword{else} clause's suite. A +\keyword{continue} statement executed in the first suite skips the rest +of the suite and goes back to testing the expression. +\stindex{break} +\stindex{continue} + + +\section{The \keyword{for} statement\label{for}} +\stindex{for} +\indexii{loop}{statement} + +The \keyword{for} statement is used to iterate over the elements of a +sequence (such as a string, tuple or list) or other iterable object: +\obindex{sequence} + +\begin{productionlist} + \production{for_stmt} + {"for" \token{target_list} "in" \token{expression_list} + ":" \token{suite}} + \productioncont{["else" ":" \token{suite}]} +\end{productionlist} + +The expression list is evaluated once; it should yield an iterable +object. An iterator is created for the result of the +{}\code{expression_list}. The suite is then executed once for each +item provided by the iterator, in the +order of ascending indices. Each item in turn is assigned to the +target list using the standard rules for assignments, and then the +suite is executed. When the items are exhausted (which is immediately +when the sequence is empty), the suite in the \keyword{else} clause, if +present, is executed, and the loop terminates. 
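As a small illustration of the else clause on a for loop, the sketch below searches a sequence and falls through to else only when the items are exhausted. The function name find_first_negative is made up for this example.

```python
def find_first_negative(items):
    """Return the first negative item, or None if the loop runs to exhaustion."""
    for item in items:
        if item < 0:
            result = item
            break  # leaves the loop and skips the else suite
    else:
        # Reached only when the items are exhausted, which happens
        # immediately for an empty sequence.
        result = None
    return result
```

Because the else suite runs only on exhaustion, no separate flag variable is needed to tell a successful search from an unsuccessful one.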
+\kwindex{in} +\kwindex{else} +\indexii{target}{list} + +A \keyword{break} statement executed in the first suite terminates the +loop without executing the \keyword{else} clause's suite. A +\keyword{continue} statement executed in the first suite skips the rest +of the suite and continues with the next item, or with the \keyword{else} +clause if there was no next item. +\stindex{break} +\stindex{continue} + +The suite may assign to the variable(s) in the target list; this does +not affect the next item assigned to it. + +The target list is not deleted when the loop is finished, but if the +sequence is empty, it will not have been assigned to at all by the +loop. Hint: the built-in function \function{range()} returns a +sequence of integers suitable to emulate the effect of Pascal's +\code{for i := a to b do}; +e.g., \code{range(3)} returns the list \code{[0, 1, 2]}. +\bifuncindex{range} +\indexii{Pascal}{language} + +\warning{There is a subtlety when the sequence is being modified +by the loop (this can only occur for mutable sequences, i.e. lists). +An internal counter is used to keep track of which item is used next, +and this is incremented on each iteration. When this counter has +reached the length of the sequence the loop terminates. This means that +if the suite deletes the current (or a previous) item from the +sequence, the next item will be skipped (since it gets the index of +the current item which has already been treated). Likewise, if the +suite inserts an item in the sequence before the current item, the +current item will be treated again the next time through the loop. 
+This can lead to nasty bugs that can be avoided by making a temporary +copy using a slice of the whole sequence, e.g., +\index{loop!over mutable sequence} +\index{mutable sequence!loop over}} + +\begin{verbatim} +for x in a[:]: + if x < 0: a.remove(x) +\end{verbatim} + + +\section{The \keyword{try} statement\label{try}} +\stindex{try} + +The \keyword{try} statement specifies exception handlers and/or cleanup +code for a group of statements: + +\begin{productionlist} + \production{try_stmt} {try1_stmt | try2_stmt} + \production{try1_stmt} + {"try" ":" \token{suite}} + \productioncont{("except" [\token{expression} + ["," \token{target}]] ":" \token{suite})+} + \productioncont{["else" ":" \token{suite}]} + \productioncont{["finally" ":" \token{suite}]} + \production{try2_stmt} + {"try" ":" \token{suite}} + \productioncont{"finally" ":" \token{suite}} +\end{productionlist} + +\versionchanged[In previous versions of Python, +\keyword{try}...\keyword{except}...\keyword{finally} did not work. +\keyword{try}...\keyword{except} had to be nested in +\keyword{try}...\keyword{finally}]{2.5} + +The \keyword{except} clause(s) specify one or more exception handlers. +When no exception occurs in the +\keyword{try} clause, no exception handler is executed. When an +exception occurs in the \keyword{try} suite, a search for an exception +handler is started. This search inspects the except clauses in turn until +one is found that matches the exception. An expression-less except +clause, if present, must be last; it matches any exception. For an +except clause with an expression, that expression is evaluated, and the +clause matches the exception if the resulting object is ``compatible'' +with the exception. 
An object is compatible with an exception if it +is the class or a base class of the exception object, a tuple +containing an item compatible with the exception, or, in the +(deprecated) case of string exceptions, is the raised string itself +(note that the object identities must match, i.e. it must be the same +string object, not just a string with the same value). +\kwindex{except} + +If no except clause matches the exception, the search for an exception +handler continues in the surrounding code and on the invocation stack. +\footnote{The exception is propagated to the invocation stack only if +there is no \keyword{finally} clause that negates the exception.} + +If the evaluation of an expression in the header of an except clause +raises an exception, the original search for a handler is canceled +and a search starts for the new exception in the surrounding code and +on the call stack (it is treated as if the entire \keyword{try} statement +raised the exception). + +When a matching except clause is found, the exception is assigned to +the target specified in that except clause, if present, and the except +clause's suite is executed. All except clauses must have an +executable block. When the end of this block is reached, execution +continues normally after the entire try statement. (This means that +if two nested handlers exist for the same exception, and the exception +occurs in the try clause of the inner handler, the outer handler will +not handle the exception.) + +Before an except clause's suite is executed, details about the +exception are assigned to three variables in the +\module{sys}\refbimodindex{sys} module: \code{sys.exc_type} receives +the object identifying the exception; \code{sys.exc_value} receives +the exception's parameter; \code{sys.exc_traceback} receives a +traceback object\obindex{traceback} (see section~\ref{traceback}) +identifying the point in the program where the exception occurred. 
+These details are also available through the \function{sys.exc_info()} +function, which returns a tuple \code{(\var{exc_type}, \var{exc_value}, +\var{exc_traceback})}. Use of the corresponding variables is +deprecated in favor of this function, since their use is unsafe in a +threaded program. As of Python 1.5, the variables are restored to +their previous values (before the call) when returning from a function +that handled an exception. +\withsubitem{(in module sys)}{\ttindex{exc_type} + \ttindex{exc_value}\ttindex{exc_traceback}} + +The optional \keyword{else} clause is executed if and when control +flows off the end of the \keyword{try} clause.\footnote{ + Currently, control ``flows off the end'' except in the case of an + exception or the execution of a \keyword{return}, + \keyword{continue}, or \keyword{break} statement. +} Exceptions in the \keyword{else} clause are not handled by the +preceding \keyword{except} clauses. +\kwindex{else} +\stindex{return} +\stindex{break} +\stindex{continue} + +If \keyword{finally} is present, it specifies a `cleanup' handler. The +\keyword{try} clause is executed, including any \keyword{except} and +\keyword{else} clauses. If an exception occurs in any of the clauses +and is not handled, the exception is temporarily saved. The +\keyword{finally} clause is executed. If there is a saved exception, +it is re-raised at the end of the \keyword{finally} clause. +If the \keyword{finally} clause raises another exception or +executes a \keyword{return} or \keyword{break} statement, the saved +exception is lost. The exception information is not available to the +program during execution of the \keyword{finally} clause. +\kwindex{finally} + +When a \keyword{return}, \keyword{break} or \keyword{continue} statement is +executed in the \keyword{try} suite of a \keyword{try}...\keyword{finally} +statement, the \keyword{finally} clause is also executed `on the way out.' 
A +\keyword{continue} statement is illegal in the \keyword{finally} clause. +(The reason is a problem with the current implementation --- this +restriction may be lifted in the future). +\stindex{return} +\stindex{break} +\stindex{continue} + +Additional information on exceptions can be found in +section~\ref{exceptions}, and information on using the \keyword{raise} +statement to generate exceptions may be found in section~\ref{raise}. + + +\section{The \keyword{with} statement\label{with}} +\stindex{with} + +\versionadded{2.5} + +The \keyword{with} statement is used to wrap the execution of a block +with methods defined by a context manager (see +section~\ref{context-managers}). This allows common +\keyword{try}...\keyword{except}...\keyword{finally} usage patterns to +be encapsulated for convenient reuse. + +\begin{productionlist} + \production{with_stmt} + {"with" \token{expression} ["as" \token{target}] ":" \token{suite}} +\end{productionlist} + +The execution of the \keyword{with} statement proceeds as follows: + +\begin{enumerate} + +\item The context expression is evaluated to obtain a context manager. + +\item The context manager's \method{__enter__()} method is invoked. + +\item If a target was included in the \keyword{with} +statement, the return value from \method{__enter__()} is assigned to it. + +\note{The \keyword{with} statement guarantees that if the +\method{__enter__()} method returns without an error, then +\method{__exit__()} will always be called. Thus, if an error occurs +during the assignment to the target list, it will be treated the same as +an error occurring within the suite would be. See step 5 below.} + +\item The suite is executed. + +\item The context manager's \method{__exit__()} method is invoked. If +an exception caused the suite to be exited, its type, value, and +traceback are passed as arguments to \method{__exit__()}. Otherwise, +three \constant{None} arguments are supplied. 
+ +If the suite was exited due to an exception, and the return +value from the \method{__exit__()} method was false, the exception is +reraised. If the return value was true, the exception is suppressed, and +execution continues with the statement following the \keyword{with} +statement. + +If the suite was exited for any reason other than an exception, the +return value from \method{__exit__()} is ignored, and execution proceeds +at the normal location for the kind of exit that was taken. + +\end{enumerate} + +\begin{notice} +In Python 2.5, the \keyword{with} statement is only allowed +when the \code{with_statement} feature has been enabled. It will always +be enabled in Python 2.6. This \code{__future__} import statement can +be used to enable the feature: + +\begin{verbatim} +from __future__ import with_statement +\end{verbatim} +\end{notice} + +\begin{seealso} + \seepep{0343}{The "with" statement} + {The specification, background, and examples for the + Python \keyword{with} statement.} +\end{seealso} + +\section{Function definitions\label{function}} +\indexii{function}{definition} +\stindex{def} + +A function definition defines a user-defined function object (see +section~\ref{types}): +\obindex{user-defined function} +\obindex{function} + +\begin{productionlist} + \production{funcdef} + {[\token{decorators}] "def" \token{funcname} "(" [\token{parameter_list}] ")" + ":" \token{suite}} + \production{decorators} + {\token{decorator}+} + \production{decorator} + {"@" \token{dotted_name} ["(" [\token{argument_list} [","]] ")"] NEWLINE} + \production{dotted_name} + {\token{identifier} ("." 
\token{identifier})*} + \production{parameter_list} + {(\token{defparameter} ",")*} + \productioncont{(~~"*" \token{identifier} [, "**" \token{identifier}]} + \productioncont{ | "**" \token{identifier}} + \productioncont{ | \token{defparameter} [","] )} + \production{defparameter} + {\token{parameter} ["=" \token{expression}]} + \production{sublist} + {\token{parameter} ("," \token{parameter})* [","]} + \production{parameter} + {\token{identifier} | "(" \token{sublist} ")"} + \production{funcname} + {\token{identifier}} +\end{productionlist} + +A function definition is an executable statement. Its execution binds +the function name in the current local namespace to a function object +(a wrapper around the executable code for the function). This +function object contains a reference to the current global namespace +as the global namespace to be used when the function is called. +\indexii{function}{name} +\indexii{name}{binding} + +The function definition does not execute the function body; this gets +executed only when the function is called. + +A function definition may be wrapped by one or more decorator expressions. +Decorator expressions are evaluated when the function is defined, in the scope +that contains the function definition. The result must be a callable, +which is invoked with the function object as the only argument. +The returned value is bound to the function name instead of the function +object. Multiple decorators are applied in nested fashion. 
+For example, the following code: + +\begin{verbatim} +@f1(arg) +@f2 +def func(): pass +\end{verbatim} + +is equivalent to: + +\begin{verbatim} +def func(): pass +func = f1(arg)(f2(func)) +\end{verbatim} + +When one or more top-level parameters have the form \var{parameter} +\code{=} \var{expression}, the function is said to have ``default +parameter values.'' For a parameter with a +default value, the corresponding argument may be omitted from a call, +in which case the parameter's default value is substituted. If a +parameter has a default value, all following parameters must also have +a default value --- this is a syntactic restriction that is not +expressed by the grammar. +\indexiii{default}{parameter}{value} + +\strong{Default parameter values are evaluated when the function +definition is executed.} This means that the expression is evaluated +once, when the function is defined, and that the same +``pre-computed'' value is used for each call. This is especially +important to understand when a default parameter is a mutable object, +such as a list or a dictionary: if the function modifies the object +(e.g. by appending an item to a list), the default value is in effect +modified. This is generally not what was intended. A way around this +is to use \code{None} as the default, and explicitly test for it in +the body of the function, e.g.: + +\begin{verbatim} +def whats_on_the_telly(penguin=None): + if penguin is None: + penguin = [] + penguin.append("property of the zoo") + return penguin +\end{verbatim} + +Function call semantics are described in more detail in +section~\ref{calls}. +A function call always assigns values to all parameters mentioned in +the parameter list, either from positional arguments, from keyword +arguments, or from default values. If the form ``\code{*identifier}'' +is present, it is initialized to a tuple receiving any excess +positional arguments, defaulting to the empty tuple. 
If the form +``\code{**identifier}'' is present, it is initialized to a new +dictionary receiving any excess keyword arguments, defaulting to a +new empty dictionary. + +It is also possible to create anonymous functions (functions not bound +to a name), for immediate use in expressions. This uses lambda forms, +described in section~\ref{lambda}. Note that the lambda form is +merely a shorthand for a simplified function definition; a function +defined in a ``\keyword{def}'' statement can be passed around or +assigned to another name just like a function defined by a lambda +form. The ``\keyword{def}'' form is actually more powerful since it +allows the execution of multiple statements. +\indexii{lambda}{form} + +\strong{Programmer's note:} Functions are first-class objects. A +``\code{def}'' form executed inside a function definition defines a +local function that can be returned or passed around. Free variables +used in the nested function can access the local variables of the +function containing the def. See section~\ref{naming} for details. + + +\section{Class definitions\label{class}} +\indexii{class}{definition} +\stindex{class} + +A class definition defines a class object (see section~\ref{types}): +\obindex{class} + +\begin{productionlist} + \production{classdef} + {"class" \token{classname} [\token{inheritance}] ":" + \token{suite}} + \production{inheritance} + {"(" [\token{expression_list}] ")"} + \production{classname} + {\token{identifier}} +\end{productionlist} + +A class definition is an executable statement. It first evaluates the +inheritance list, if present. Each item in the inheritance list +should evaluate to a class object or class type which allows +subclassing. The class's suite is then executed +in a new execution frame (see section~\ref{naming}), using a newly +created local namespace and the original global namespace. +(Usually, the suite contains only function definitions.) 
When the +class's suite finishes execution, its execution frame is discarded but +its local namespace is saved. A class object is then created using +the inheritance list for the base classes and the saved local +namespace for the attribute dictionary. The class name is bound to this +class object in the original local namespace. +\index{inheritance} +\indexii{class}{name} +\indexii{name}{binding} +\indexii{execution}{frame} + +\strong{Programmer's note:} Variables defined in the class definition +are class variables; they are shared by all instances. To define +instance variables, they must be given a value in the +\method{__init__()} method or in another method. Both class and +instance variables are accessible through the notation +``\code{self.name}'', and an instance variable hides a class variable +with the same name when accessed in this way. Class variables with +immutable values can be used as defaults for instance variables. +For new-style classes, descriptors can be used to create instance +variables with different implementation details. diff --git a/sys/src/cmd/python/Doc/ref/ref8.tex b/sys/src/cmd/python/Doc/ref/ref8.tex new file mode 100644 index 000000000..b77789f71 --- /dev/null +++ b/sys/src/cmd/python/Doc/ref/ref8.tex @@ -0,0 +1,112 @@ +\chapter{Top-level components\label{top-level}} + +The Python interpreter can get its input from a number of sources: +from a script passed to it as standard input or as program argument, +typed in interactively, from a module source file, etc. This chapter +gives the syntax used in these cases. +\index{interpreter} + + +\section{Complete Python programs\label{programs}} +\index{program} + +While a language specification need not prescribe how the language +interpreter is invoked, it is useful to have a notion of a complete +Python program. 
A complete Python program is executed in a minimally +initialized environment: all built-in and standard modules are +available, but none have been initialized, except for \module{sys} +(various system services), \module{__builtin__} (built-in functions, +exceptions and \code{None}) and \module{__main__}. The latter is used +to provide the local and global namespace for execution of the +complete program. +\refbimodindex{sys} +\refbimodindex{__main__} +\refbimodindex{__builtin__} + +The syntax for a complete Python program is that for file input, +described in the next section. + +The interpreter may also be invoked in interactive mode; in this case, +it does not read and execute a complete program but reads and executes +one statement (possibly compound) at a time. The initial environment +is identical to that of a complete program; each statement is executed +in the namespace of \module{__main__}. +\index{interactive mode} +\refbimodindex{__main__} + +Under \UNIX, a complete program can be passed to the interpreter in +three forms: with the \programopt{-c} \var{string} command line option, as a +file passed as the first command line argument, or as standard input. +If the file or standard input is a tty device, the interpreter enters +interactive mode; otherwise, it executes the file as a complete +program. 
+\index{UNIX}
+\index{command line}
+\index{standard input}
+
+
+\section{File input\label{file-input}}
+
+All input read from non-interactive files has the same form:
+
+\begin{productionlist}
+  \production{file_input}
+             {(NEWLINE | \token{statement})*}
+\end{productionlist}
+
+This syntax is used in the following situations:
+
+\begin{itemize}
+
+\item when parsing a complete Python program (from a file or from a string);
+
+\item when parsing a module;
+
+\item when parsing a string passed to the \keyword{exec} statement;
+
+\end{itemize}
+
+
+\section{Interactive input\label{interactive}}
+
+Input in interactive mode is parsed using the following grammar:
+
+\begin{productionlist}
+  \production{interactive_input}
+             {[\token{stmt_list}] NEWLINE | \token{compound_stmt} NEWLINE}
+\end{productionlist}
+
+Note that a (top-level) compound statement must be followed by a blank
+line in interactive mode; this is needed to help the parser detect the
+end of the input.
+
+
+\section{Expression input\label{expression-input}}
+\index{input}
+
+There are two forms of expression input. Both ignore leading
+whitespace.
+The string argument to \function{eval()} must have the following form:
+\bifuncindex{eval}
+
+\begin{productionlist}
+  \production{eval_input}
+             {\token{expression_list} NEWLINE*}
+\end{productionlist}
+
+The input line read by \function{input()} must have the following form:
+\bifuncindex{input}
+
+\begin{productionlist}
+  \production{input_input}
+             {\token{expression_list} NEWLINE}
+\end{productionlist}
+
+Note: to read `raw' input line without interpretation, you can use the
+built-in function \function{raw_input()} or the \method{readline()} method
+of file objects.
+\obindex{file}
+\index{input!raw}
+\index{raw input}
+\bifuncindex{raw_input}
+\withsubitem{(file method)}{\ttindex{readline()}}
diff --git a/sys/src/cmd/python/Doc/ref/reswords.py b/sys/src/cmd/python/Doc/ref/reswords.py
new file mode 100644
index 000000000..68862bbcf
--- /dev/null
+++ b/sys/src/cmd/python/Doc/ref/reswords.py
@@ -0,0 +1,23 @@
+"""Spit out the Python reserved words table."""
+
+import keyword
+
+ncols = 5
+
+def main():
+    words = keyword.kwlist[:]
+    words.sort()
+    colwidth = 1 + max(map(len, words))
+    nwords = len(words)
+    nrows = (nwords + ncols - 1) / ncols
+    for irow in range(nrows):
+        for icol in range(ncols):
+            i = irow + icol * nrows
+            if 0 <= i < nwords:
+                word = words[i]
+            else:
+                word = ""
+            print "%-*s" % (colwidth, word),
+        print
+
+main()
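The `reswords.py` script added above is written for the Python 2 interpreter imported by this commit: it uses the `print` statement and relies on `/` performing integer division. As a rough sketch only, and assuming a Python 3 interpreter rather than the one in this tree, the same column-major table layout could be expressed as below. The function name `reserved_words_table` and its `words`/`ncols` parameters are illustrative and are not part of the commit.

```python
import keyword


def reserved_words_table(words=None, ncols=5):
    """Return the reserved-words table as a list of text rows.

    Hypothetical Python 3 rendering of the layout used by reswords.py;
    not the code added by the commit.
    """
    # Default to the running interpreter's keyword list; sorted() copies it.
    words = sorted(keyword.kwlist if words is None else words)
    nwords = len(words)
    colwidth = 1 + max(map(len, words))
    # Ceiling division. `//` keeps the result an int under Python 3,
    # where plain `/` would produce a float and break range().
    nrows = (nwords + ncols - 1) // ncols
    rows = []
    for irow in range(nrows):
        cells = []
        for icol in range(ncols):
            # Words run down each column, so consecutive columns
            # are nrows entries apart in the sorted list.
            i = irow + icol * nrows
            word = words[i] if i < nwords else ""
            cells.append("%-*s" % (colwidth, word))
        # Unlike the Python 2 original, trailing padding is stripped here.
        rows.append(" ".join(cells).rstrip())
    return rows


if __name__ == "__main__":
    print("\n".join(reserved_words_table()))
```

Returning the rows instead of printing them directly is a choice made for this sketch so the layout can be checked without capturing standard output; the keyword list itself differs between Python versions, so the table contents will not match what the Python 2 script prints.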