author     cinap_lenrek <cinap_lenrek@localhost>  2011-05-03 11:25:13 +0000
committer  cinap_lenrek <cinap_lenrek@localhost>  2011-05-03 11:25:13 +0000
commit     458120dd40db6b4df55a4e96b650e16798ef06a0 (patch)
tree       8f82685be24fef97e715c6f5ca4c68d34d5074ee /sys/src/cmd/python/Doc/ref/ref2.tex
parent     3a742c699f6806c1145aea5149bf15de15a0afd7 (diff)
add hg and python
Diffstat (limited to 'sys/src/cmd/python/Doc/ref/ref2.tex')
-rw-r--r--  sys/src/cmd/python/Doc/ref/ref2.tex  731
1 file changed, 731 insertions, 0 deletions
diff --git a/sys/src/cmd/python/Doc/ref/ref2.tex b/sys/src/cmd/python/Doc/ref/ref2.tex
new file mode 100644
index 000000000..bad4609fb
--- /dev/null
+++ b/sys/src/cmd/python/Doc/ref/ref2.tex
@@ -0,0 +1,731 @@
+\chapter{Lexical analysis\label{lexical}}
+
+A Python program is read by a \emph{parser}.  Input to the parser is a
+stream of \emph{tokens}, generated by the \emph{lexical analyzer}.  This
+chapter describes how the lexical analyzer breaks a file into tokens.
+\index{lexical analysis}
+\index{parser}
+\index{token}
+
+Python uses the 7-bit \ASCII{} character set for program text.
+\versionadded[An encoding declaration can be used to indicate that
+string literals and comments use an encoding different from ASCII]{2.3}
+For compatibility with older versions, Python only warns if it finds
+8-bit characters; those warnings should be corrected by either declaring
+an explicit encoding, or using escape sequences if those bytes are binary
+data, instead of characters.
+
+
+The run-time character set depends on the I/O devices connected to the
+program but is generally a superset of \ASCII.
+
+\strong{Future compatibility note:} It may be tempting to assume that the
+character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
+superset that covers most western languages that use the Latin
+alphabet), but it is possible that in the future Unicode text editors
+will become common.  These generally use the UTF-8 encoding, which is
+also an \ASCII{} superset, but with very different use for the
+characters with ordinals 128-255.  While there is no consensus on this
+subject yet, it is unwise to assume either Latin-1 or UTF-8, even
+though the current implementation appears to favor Latin-1.  This
+applies both to the source character set and the run-time character
+set.
+
+
+\section{Line structure\label{line-structure}}
+
+A Python program is divided into a number of \emph{logical lines}.
+\index{line structure} + + +\subsection{Logical lines\label{logical}} + +The end of +a logical line is represented by the token NEWLINE. Statements cannot +cross logical line boundaries except where NEWLINE is allowed by the +syntax (e.g., between statements in compound statements). +A logical line is constructed from one or more \emph{physical lines} +by following the explicit or implicit \emph{line joining} rules. +\index{logical line} +\index{physical line} +\index{line joining} +\index{NEWLINE token} + + +\subsection{Physical lines\label{physical}} + +A physical line is a sequence of characters terminated by an end-of-line +sequence. In source files, any of the standard platform line +termination sequences can be used - the \UNIX{} form using \ASCII{} LF +(linefeed), the Windows form using the \ASCII{} sequence CR LF (return +followed by linefeed), or the Macintosh form using the \ASCII{} CR +(return) character. All of these forms can be used equally, regardless +of platform. + +When embedding Python, source code strings should be passed to Python +APIs using the standard C conventions for newline characters (the +\code{\e n} character, representing \ASCII{} LF, is the line +terminator). + + +\subsection{Comments\label{comments}} + +A comment starts with a hash character (\code{\#}) that is not part of +a string literal, and ends at the end of the physical line. A comment +signifies the end of the logical line unless the implicit line joining +rules are invoked. +Comments are ignored by the syntax; they are not tokens. +\index{comment} +\index{hash character} + + +\subsection{Encoding declarations\label{encodings}} +\index{source character set} +\index{encodings} + +If a comment in the first or second line of the Python script matches +the regular expression \regexp{coding[=:]\e s*([-\e w.]+)}, this comment is +processed as an encoding declaration; the first group of this +expression names the encoding of the source code file. 
The recommended +forms of this expression are + +\begin{verbatim} +# -*- coding: <encoding-name> -*- +\end{verbatim} + +which is recognized also by GNU Emacs, and + +\begin{verbatim} +# vim:fileencoding=<encoding-name> +\end{verbatim} + +which is recognized by Bram Moolenaar's VIM. In addition, if the first +bytes of the file are the UTF-8 byte-order mark +(\code{'\e xef\e xbb\e xbf'}), the declared file encoding is UTF-8 +(this is supported, among others, by Microsoft's \program{notepad}). + +If an encoding is declared, the encoding name must be recognized by +Python. % XXX there should be a list of supported encodings. +The encoding is used for all lexical analysis, in particular to find +the end of a string, and to interpret the contents of Unicode literals. +String literals are converted to Unicode for syntactical analysis, +then converted back to their original encoding before interpretation +starts. The encoding declaration must appear on a line of its own. + +\subsection{Explicit line joining\label{explicit-joining}} + +Two or more physical lines may be joined into logical lines using +backslash characters (\code{\e}), as follows: when a physical line ends +in a backslash that is not part of a string literal or comment, it is +joined with the following forming a single logical line, deleting the +backslash and the following end-of-line character. For example: +\index{physical line} +\index{line joining} +\index{line continuation} +\index{backslash character} +% +\begin{verbatim} +if 1900 < year < 2100 and 1 <= month <= 12 \ + and 1 <= day <= 31 and 0 <= hour < 24 \ + and 0 <= minute < 60 and 0 <= second < 60: # Looks like a valid date + return 1 +\end{verbatim} + +A line ending in a backslash cannot carry a comment. A backslash does +not continue a comment. A backslash does not continue a token except +for string literals (i.e., tokens other than string literals cannot be +split across physical lines using a backslash). 
A backslash is +illegal elsewhere on a line outside a string literal. + + +\subsection{Implicit line joining\label{implicit-joining}} + +Expressions in parentheses, square brackets or curly braces can be +split over more than one physical line without using backslashes. +For example: + +\begin{verbatim} +month_names = ['Januari', 'Februari', 'Maart', # These are the + 'April', 'Mei', 'Juni', # Dutch names + 'Juli', 'Augustus', 'September', # for the months + 'Oktober', 'November', 'December'] # of the year +\end{verbatim} + +Implicitly continued lines can carry comments. The indentation of the +continuation lines is not important. Blank continuation lines are +allowed. There is no NEWLINE token between implicit continuation +lines. Implicitly continued lines can also occur within triple-quoted +strings (see below); in that case they cannot carry comments. + + +\subsection{Blank lines \label{blank-lines}} + +\index{blank line} +A logical line that contains only spaces, tabs, formfeeds and possibly +a comment, is ignored (i.e., no NEWLINE token is generated). During +interactive input of statements, handling of a blank line may differ +depending on the implementation of the read-eval-print loop. In the +standard implementation, an entirely blank logical line (i.e.\ one +containing not even whitespace or a comment) terminates a multi-line +statement. + + +\subsection{Indentation\label{indentation}} + +Leading whitespace (spaces and tabs) at the beginning of a logical +line is used to compute the indentation level of the line, which in +turn is used to determine the grouping of statements. +\index{indentation} +\index{whitespace} +\index{leading whitespace} +\index{space} +\index{tab} +\index{grouping} +\index{statement grouping} + +First, tabs are replaced (from left to right) by one to eight spaces +such that the total number of characters up to and including the +replacement is a multiple of +eight (this is intended to be the same rule as used by \UNIX). 
The +total number of spaces preceding the first non-blank character then +determines the line's indentation. Indentation cannot be split over +multiple physical lines using backslashes; the whitespace up to the +first backslash determines the indentation. + +\strong{Cross-platform compatibility note:} because of the nature of +text editors on non-UNIX platforms, it is unwise to use a mixture of +spaces and tabs for the indentation in a single source file. It +should also be noted that different platforms may explicitly limit the +maximum indentation level. + +A formfeed character may be present at the start of the line; it will +be ignored for the indentation calculations above. Formfeed +characters occurring elsewhere in the leading whitespace have an +undefined effect (for instance, they may reset the space count to +zero). + +The indentation levels of consecutive lines are used to generate +INDENT and DEDENT tokens, using a stack, as follows. +\index{INDENT token} +\index{DEDENT token} + +Before the first line of the file is read, a single zero is pushed on +the stack; this will never be popped off again. The numbers pushed on +the stack will always be strictly increasing from bottom to top. At +the beginning of each logical line, the line's indentation level is +compared to the top of the stack. If it is equal, nothing happens. +If it is larger, it is pushed on the stack, and one INDENT token is +generated. If it is smaller, it \emph{must} be one of the numbers +occurring on the stack; all numbers on the stack that are larger are +popped off, and for each number popped off a DEDENT token is +generated. At the end of the file, a DEDENT token is generated for +each number remaining on the stack that is larger than zero. 
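The stack algorithm just described can be sketched directly in Python. This is an illustrative model only, not the CPython tokenizer; the function and variable names are invented for the sketch:

```python
def indent_tokens(indents):
    """Generate INDENT/DEDENT tokens from a sequence of per-line
    indentation levels, following the stack algorithm above."""
    stack = [0]          # a single zero is pushed before the first line
    tokens = []
    for level in indents:
        if level > stack[-1]:
            stack.append(level)          # stack stays strictly increasing
            tokens.append("INDENT")      # one INDENT per push
        else:
            while level < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")  # one DEDENT per popped number
            if level != stack[-1]:
                # smaller level must match a number already on the stack
                raise IndentationError("unindent does not match any "
                                       "outer indentation level")
    while stack[-1] > 0:                 # end of file: dedent to level zero
        stack.pop()
        tokens.append("DEDENT")
    return tokens

print(indent_tokens([0, 4, 8, 4, 0]))  # ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```

The inconsistent-dedent case in the second example below the sketch corresponds to the error the lexical analyzer reports for `return r` in the listing that follows.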
+ +Here is an example of a correctly (though confusingly) indented piece +of Python code: + +\begin{verbatim} +def perm(l): + # Compute the list of all permutations of l + if len(l) <= 1: + return [l] + r = [] + for i in range(len(l)): + s = l[:i] + l[i+1:] + p = perm(s) + for x in p: + r.append(l[i:i+1] + x) + return r +\end{verbatim} + +The following example shows various indentation errors: + +\begin{verbatim} + def perm(l): # error: first line indented +for i in range(len(l)): # error: not indented + s = l[:i] + l[i+1:] + p = perm(l[:i] + l[i+1:]) # error: unexpected indent + for x in p: + r.append(l[i:i+1] + x) + return r # error: inconsistent dedent +\end{verbatim} + +(Actually, the first three errors are detected by the parser; only the +last error is found by the lexical analyzer --- the indentation of +\code{return r} does not match a level popped off the stack.) + + +\subsection{Whitespace between tokens\label{whitespace}} + +Except at the beginning of a logical line or in string literals, the +whitespace characters space, tab and formfeed can be used +interchangeably to separate tokens. Whitespace is needed between two +tokens only if their concatenation could otherwise be interpreted as a +different token (e.g., ab is one token, but a b is two tokens). + + +\section{Other tokens\label{other-tokens}} + +Besides NEWLINE, INDENT and DEDENT, the following categories of tokens +exist: \emph{identifiers}, \emph{keywords}, \emph{literals}, +\emph{operators}, and \emph{delimiters}. +Whitespace characters (other than line terminators, discussed earlier) +are not tokens, but serve to delimit tokens. +Where +ambiguity exists, a token comprises the longest possible string that +forms a legal token, when read from left to right. 
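The longest-match rule can be observed with the standard \module{tokenize} module. The snippet below uses the Python 3 spelling of that API (the chapter itself describes Python 2, where the module exists with a slightly different interface):

```python
import io
import tokenize

def token_strings(source):
    # Keep only identifier, operator and number tokens, skipping the
    # bookkeeping NEWLINE/ENDMARKER tokens the tokenizer also emits.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in tokens
            if t.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER)]

print(token_strings("ab"))     # ['ab']           -- one identifier token
print(token_strings("a b"))    # ['a', 'b']       -- whitespace separates two tokens
print(token_strings("a<=b"))   # ['a', '<=', 'b'] -- '<=' is one operator token
```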
+ + +\section{Identifiers and keywords\label{identifiers}} + +Identifiers (also referred to as \emph{names}) are described by the following +lexical definitions: +\index{identifier} +\index{name} + +\begin{productionlist} + \production{identifier} + {(\token{letter}|"_") (\token{letter} | \token{digit} | "_")*} + \production{letter} + {\token{lowercase} | \token{uppercase}} + \production{lowercase} + {"a"..."z"} + \production{uppercase} + {"A"..."Z"} + \production{digit} + {"0"..."9"} +\end{productionlist} + +Identifiers are unlimited in length. Case is significant. + + +\subsection{Keywords\label{keywords}} + +The following identifiers are used as reserved words, or +\emph{keywords} of the language, and cannot be used as ordinary +identifiers. They must be spelled exactly as written here:% +\index{keyword}% +\index{reserved word} + +\begin{verbatim} +and del from not while +as elif global or with +assert else if pass yield +break except import print +class exec in raise +continue finally is return +def for lambda try +\end{verbatim} + +% When adding keywords, use reswords.py for reformatting + +\versionchanged[\constant{None} became a constant and is now +recognized by the compiler as a name for the built-in object +\constant{None}. Although it is not a keyword, you cannot assign +a different object to it]{2.4} + +\versionchanged[Both \keyword{as} and \keyword{with} are only recognized +when the \code{with_statement} future feature has been enabled. +It will always be enabled in Python 2.6. See section~\ref{with} for +details. Note that using \keyword{as} and \keyword{with} as identifiers +will always issue a warning, even when the \code{with_statement} future +directive is not in effect]{2.5} + + +\subsection{Reserved classes of identifiers\label{id-classes}} + +Certain classes of identifiers (besides keywords) have special +meanings. 
These classes are identified by the patterns of leading and +trailing underscore characters: + +\begin{description} + +\item[\code{_*}] + Not imported by \samp{from \var{module} import *}. The special + identifier \samp{_} is used in the interactive interpreter to store + the result of the last evaluation; it is stored in the + \module{__builtin__} module. When not in interactive mode, \samp{_} + has no special meaning and is not defined. + See section~\ref{import}, ``The \keyword{import} statement.'' + + \note{The name \samp{_} is often used in conjunction with + internationalization; refer to the documentation for the + \ulink{\module{gettext} module}{../lib/module-gettext.html} for more + information on this convention.} + +\item[\code{__*__}] + System-defined names. These names are defined by the interpreter + and its implementation (including the standard library); + applications should not expect to define additional names using this + convention. The set of names of this class defined by Python may be + extended in future versions. + See section~\ref{specialnames}, ``Special method names.'' + +\item[\code{__*}] + Class-private names. Names in this category, when used within the + context of a class definition, are re-written to use a mangled form + to help avoid name clashes between ``private'' attributes of base + and derived classes. + See section~\ref{atom-identifiers}, ``Identifiers (Names).'' + +\end{description} + + +\section{Literals\label{literals}} + +Literals are notations for constant values of some built-in types. 
+\index{literal} +\index{constant} + + +\subsection{String literals\label{strings}} + +String literals are described by the following lexical definitions: +\index{string literal} + +\index{ASCII@\ASCII} +\begin{productionlist} + \production{stringliteral} + {[\token{stringprefix}](\token{shortstring} | \token{longstring})} + \production{stringprefix} + {"r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"} + \production{shortstring} + {"'" \token{shortstringitem}* "'" + | '"' \token{shortstringitem}* '"'} + \production{longstring} + {"'''" \token{longstringitem}* "'''"} + \productioncont{| '"""' \token{longstringitem}* '"""'} + \production{shortstringitem} + {\token{shortstringchar} | \token{escapeseq}} + \production{longstringitem} + {\token{longstringchar} | \token{escapeseq}} + \production{shortstringchar} + {<any source character except "\e" or newline or the quote>} + \production{longstringchar} + {<any source character except "\e">} + \production{escapeseq} + {"\e" <any ASCII character>} +\end{productionlist} + +One syntactic restriction not indicated by these productions is that +whitespace is not allowed between the \grammartoken{stringprefix} and +the rest of the string literal. The source character set is defined +by the encoding declaration; it is \ASCII{} if no encoding declaration +is given in the source file; see section~\ref{encodings}. + +\index{triple-quoted string} +\index{Unicode Consortium} +\index{string!Unicode} +In plain English: String literals can be enclosed in matching single +quotes (\code{'}) or double quotes (\code{"}). They can also be +enclosed in matching groups of three single or double quotes (these +are generally referred to as \emph{triple-quoted strings}). The +backslash (\code{\e}) character is used to escape characters that +otherwise have a special meaning, such as newline, backslash itself, +or the quote character. 
String literals may optionally be prefixed +with a letter \character{r} or \character{R}; such strings are called +\dfn{raw strings}\index{raw string} and use different rules for interpreting +backslash escape sequences. A prefix of \character{u} or \character{U} +makes the string a Unicode string. Unicode strings use the Unicode character +set as defined by the Unicode Consortium and ISO~10646. Some additional +escape sequences, described below, are available in Unicode strings. +The two prefix characters may be combined; in this case, \character{u} must +appear before \character{r}. + +In triple-quoted strings, +unescaped newlines and quotes are allowed (and are retained), except +that three unescaped quotes in a row terminate the string. (A +``quote'' is the character used to open the string, i.e. either +\code{'} or \code{"}.) + +Unless an \character{r} or \character{R} prefix is present, escape +sequences in strings are interpreted according to rules similar +to those used by Standard C. 
The recognized escape sequences are: +\index{physical line} +\index{escape sequence} +\index{Standard C} +\index{C} + +\begin{tableiii}{l|l|c}{code}{Escape Sequence}{Meaning}{Notes} +\lineiii{\e\var{newline}} {Ignored}{} +\lineiii{\e\e} {Backslash (\code{\e})}{} +\lineiii{\e'} {Single quote (\code{'})}{} +\lineiii{\e"} {Double quote (\code{"})}{} +\lineiii{\e a} {\ASCII{} Bell (BEL)}{} +\lineiii{\e b} {\ASCII{} Backspace (BS)}{} +\lineiii{\e f} {\ASCII{} Formfeed (FF)}{} +\lineiii{\e n} {\ASCII{} Linefeed (LF)}{} +\lineiii{\e N\{\var{name}\}} + {Character named \var{name} in the Unicode database (Unicode only)}{} +\lineiii{\e r} {\ASCII{} Carriage Return (CR)}{} +\lineiii{\e t} {\ASCII{} Horizontal Tab (TAB)}{} +\lineiii{\e u\var{xxxx}} + {Character with 16-bit hex value \var{xxxx} (Unicode only)}{(1)} +\lineiii{\e U\var{xxxxxxxx}} + {Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}{(2)} +\lineiii{\e v} {\ASCII{} Vertical Tab (VT)}{} +\lineiii{\e\var{ooo}} {Character with octal value \var{ooo}}{(3,5)} +\lineiii{\e x\var{hh}} {Character with hex value \var{hh}}{(4,5)} +\end{tableiii} +\index{ASCII@\ASCII} + +\noindent +Notes: + +\begin{itemize} +\item[(1)] + Individual code units which form parts of a surrogate pair can be + encoded using this escape sequence. +\item[(2)] + Any Unicode character can be encoded this way, but characters + outside the Basic Multilingual Plane (BMP) will be encoded using a + surrogate pair if Python is compiled to use 16-bit code units (the + default). Individual code units which form parts of a surrogate + pair can be encoded using this escape sequence. +\item[(3)] + As in Standard C, up to three octal digits are accepted. +\item[(4)] + Unlike in Standard C, at most two hex digits are accepted. +\item[(5)] + In a string literal, hexadecimal and octal escapes denote the + byte with the given value; it is not necessary that the byte + encodes a character in the source character set. 
In a Unicode + literal, these escapes denote a Unicode character with the given + value. +\end{itemize} + + +Unlike Standard \index{unrecognized escape sequence}C, +all unrecognized escape sequences are left in the string unchanged, +i.e., \emph{the backslash is left in the string}. (This behavior is +useful when debugging: if an escape sequence is mistyped, the +resulting output is more easily recognized as broken.) It is also +important to note that the escape sequences marked as ``(Unicode +only)'' in the table above fall into the category of unrecognized +escapes for non-Unicode string literals. + +When an \character{r} or \character{R} prefix is present, a character +following a backslash is included in the string without change, and \emph{all +backslashes are left in the string}. For example, the string literal +\code{r"\e n"} consists of two characters: a backslash and a lowercase +\character{n}. String quotes can be escaped with a backslash, but the +backslash remains in the string; for example, \code{r"\e""} is a valid string +literal consisting of two characters: a backslash and a double quote; +\code{r"\e"} is not a valid string literal (even a raw string cannot +end in an odd number of backslashes). Specifically, \emph{a raw +string cannot end in a single backslash} (since the backslash would +escape the following quote character). Note also that a single +backslash followed by a newline is interpreted as those two characters +as part of the string, \emph{not} as a line continuation. + +When an \character{r} or \character{R} prefix is used in conjunction +with a \character{u} or \character{U} prefix, then the \code{\e uXXXX} +and \code{\e UXXXXXXXX} escape sequences are processed while +\emph{all other backslashes are left in the string}. +For example, the string literal +\code{ur"\e{}u0062\e n"} consists of three Unicode characters: `LATIN +SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'. 
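The escape-handling rules for ordinary and raw strings are easy to verify interactively; a few checks (these particular ones behave the same in Python 2 and 3):

```python
# Ordinary string literals process escape sequences; raw literals do not.
assert len("a\nb") == 3      # 'a', ASCII LF, 'b'
assert len(r"a\nb") == 4     # 'a', backslash, 'n', 'b'
assert r"\n" == "\\n"        # the backslash is left in the raw string

# A quote can be escaped inside a raw string, but the backslash stays:
assert len(r"\"") == 2       # backslash + double quote

# A raw string cannot end in a single backslash: the trailing
# backslash would escape the closing quote.
try:
    eval(r'r"\"')            # source text  r"\  -- unterminated literal
    raised = False
except SyntaxError:
    raised = True
assert raised
```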
+Backslashes can be escaped with a preceding backslash; however, both +remain in the string. As a result, \code{\e uXXXX} escape sequences +are only recognized when there are an odd number of backslashes. + +\subsection{String literal concatenation\label{string-catenation}} + +Multiple adjacent string literals (delimited by whitespace), possibly +using different quoting conventions, are allowed, and their meaning is +the same as their concatenation. Thus, \code{"hello" 'world'} is +equivalent to \code{"helloworld"}. This feature can be used to reduce +the number of backslashes needed, to split long strings conveniently +across long lines, or even to add comments to parts of strings, for +example: + +\begin{verbatim} +re.compile("[A-Za-z_]" # letter or underscore + "[A-Za-z0-9_]*" # letter, digit or underscore + ) +\end{verbatim} + +Note that this feature is defined at the syntactical level, but +implemented at compile time. The `+' operator must be used to +concatenate string expressions at run time. Also note that literal +concatenation can use different quoting styles for each component +(even mixing raw strings and triple quoted strings). + + +\subsection{Numeric literals\label{numbers}} + +There are four types of numeric literals: plain integers, long +integers, floating point numbers, and imaginary numbers. There are no +complex literals (complex numbers can be formed by adding a real +number and an imaginary number). +\index{number} +\index{numeric literal} +\index{integer literal} +\index{plain integer literal} +\index{long integer literal} +\index{floating point literal} +\index{hexadecimal literal} +\index{octal literal} +\index{decimal literal} +\index{imaginary literal} +\index{complex!literal} + +Note that numeric literals do not include a sign; a phrase like +\code{-1} is actually an expression composed of the unary operator +`\code{-}' and the literal \code{1}. 
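Both points, compile-time concatenation of adjacent literals and the absence of a sign in numeric literals, can be checked with a short snippet (the second check uses the \module{ast} module of a modern CPython to inspect the parse):

```python
import ast

# Adjacent string literals are concatenated at compile time,
# even with mixed quoting styles (here an ordinary and a raw literal):
assert "hello" 'world' == "helloworld"
assert "a" r"\n" == "a\\n"     # three characters: 'a', backslash, 'n'

# "-1" is not a single literal: it parses as the unary operator '-'
# applied to the literal 1.
node = ast.parse("-1", mode="eval").body
assert isinstance(node, ast.UnaryOp)
assert isinstance(node.op, ast.USub)
```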
+ + +\subsection{Integer and long integer literals\label{integers}} + +Integer and long integer literals are described by the following +lexical definitions: + +\begin{productionlist} + \production{longinteger} + {\token{integer} ("l" | "L")} + \production{integer} + {\token{decimalinteger} | \token{octinteger} | \token{hexinteger}} + \production{decimalinteger} + {\token{nonzerodigit} \token{digit}* | "0"} + \production{octinteger} + {"0" \token{octdigit}+} + \production{hexinteger} + {"0" ("x" | "X") \token{hexdigit}+} + \production{nonzerodigit} + {"1"..."9"} + \production{octdigit} + {"0"..."7"} + \production{hexdigit} + {\token{digit} | "a"..."f" | "A"..."F"} +\end{productionlist} + +Although both lower case \character{l} and upper case \character{L} are +allowed as suffix for long integers, it is strongly recommended to always +use \character{L}, since the letter \character{l} looks too much like the +digit \character{1}. + +Plain integer literals that are above the largest representable plain +integer (e.g., 2147483647 when using 32-bit arithmetic) are accepted +as if they were long integers instead.\footnote{In versions of Python +prior to 2.4, octal and hexadecimal literals in the range just above +the largest representable plain integer but below the largest unsigned +32-bit number (on a machine using 32-bit arithmetic), 4294967296, were +taken as the negative plain integer obtained by subtracting 4294967296 +from their unsigned value.} There is no limit for long integer +literals apart from what can be stored in available memory. 
+ +Some examples of plain integer literals (first row) and long integer +literals (second and third rows): + +\begin{verbatim} +7 2147483647 0177 +3L 79228162514264337593543950336L 0377L 0x100000000L + 79228162514264337593543950336 0xdeadbeef +\end{verbatim} + + +\subsection{Floating point literals\label{floating}} + +Floating point literals are described by the following lexical +definitions: + +\begin{productionlist} + \production{floatnumber} + {\token{pointfloat} | \token{exponentfloat}} + \production{pointfloat} + {[\token{intpart}] \token{fraction} | \token{intpart} "."} + \production{exponentfloat} + {(\token{intpart} | \token{pointfloat}) + \token{exponent}} + \production{intpart} + {\token{digit}+} + \production{fraction} + {"." \token{digit}+} + \production{exponent} + {("e" | "E") ["+" | "-"] \token{digit}+} +\end{productionlist} + +Note that the integer and exponent parts of floating point numbers +can look like octal integers, but are interpreted using radix 10. For +example, \samp{077e010} is legal, and denotes the same number +as \samp{77e10}. +The allowed range of floating point literals is +implementation-dependent. +Some examples of floating point literals: + +\begin{verbatim} +3.14 10. .001 1e100 3.14e-10 0e0 +\end{verbatim} + +Note that numeric literals do not include a sign; a phrase like +\code{-1} is actually an expression composed of the unary operator +\code{-} and the literal \code{1}. + + +\subsection{Imaginary literals\label{imaginary}} + +Imaginary literals are described by the following lexical definitions: + +\begin{productionlist} + \production{imagnumber}{(\token{floatnumber} | \token{intpart}) ("j" | "J")} +\end{productionlist} + +An imaginary literal yields a complex number with a real part of +0.0. Complex numbers are represented as a pair of floating point +numbers and have the same restrictions on their range. To create a +complex number with a nonzero real part, add a floating point number +to it, e.g., \code{(3+4j)}. 
Some examples of imaginary literals: + +\begin{verbatim} +3.14j 10.j 10j .001j 1e100j 3.14e-10j +\end{verbatim} + + +\section{Operators\label{operators}} + +The following tokens are operators: +\index{operators} + +\begin{verbatim} ++ - * ** / // % +<< >> & | ^ ~ +< > <= >= == != <> +\end{verbatim} + +The comparison operators \code{<>} and \code{!=} are alternate +spellings of the same operator. \code{!=} is the preferred spelling; +\code{<>} is obsolescent. + + +\section{Delimiters\label{delimiters}} + +The following tokens serve as delimiters in the grammar: +\index{delimiters} + +\begin{verbatim} +( ) [ ] { } @ +, : . ` = ; ++= -= *= /= //= %= +&= |= ^= >>= <<= **= +\end{verbatim} + +The period can also occur in floating-point and imaginary literals. A +sequence of three periods has a special meaning as an ellipsis in slices. +The second half of the list, the augmented assignment operators, serve +lexically as delimiters, but also perform an operation. + +The following printing \ASCII{} characters have special meaning as part +of other tokens or are otherwise significant to the lexical analyzer: + +\begin{verbatim} +' " # \ +\end{verbatim} + +The following printing \ASCII{} characters are not used in Python. Their +occurrence outside string literals and comments is an unconditional +error: +\index{ASCII@\ASCII} + +\begin{verbatim} +$ ? +\end{verbatim} |
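As a quick check of the imaginary-literal rules described above (an imaginary literal on its own has real part 0.0; adding a real number yields a general complex value):

```python
# An imaginary literal alone has real part 0.0:
z = 4j
assert isinstance(z, complex)
assert z.real == 0.0 and z.imag == 4.0

# Adding a real number produces a complex number with both parts set:
w = 3 + 4j
assert w.real == 3.0 and w.imag == 4.0
assert abs(w) == 5.0           # |3+4j| = sqrt(9 + 16) = 5
```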