diff options
author | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
---|---|---|
committer | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
commit | 458120dd40db6b4df55a4e96b650e16798ef06a0 (patch) | |
tree | 8f82685be24fef97e715c6f5ca4c68d34d5074ee /sys/src/cmd/python/Doc/lib/liburllib2.tex | |
parent | 3a742c699f6806c1145aea5149bf15de15a0afd7 (diff) |
add hg and python
Diffstat (limited to 'sys/src/cmd/python/Doc/lib/liburllib2.tex')
-rw-r--r-- | sys/src/cmd/python/Doc/lib/liburllib2.tex | 872 |
1 files changed, 872 insertions, 0 deletions
diff --git a/sys/src/cmd/python/Doc/lib/liburllib2.tex b/sys/src/cmd/python/Doc/lib/liburllib2.tex new file mode 100644 index 000000000..542a7b8ea --- /dev/null +++ b/sys/src/cmd/python/Doc/lib/liburllib2.tex @@ -0,0 +1,872 @@ +\section{\module{urllib2} --- + extensible library for opening URLs} + +\declaremodule{standard}{urllib2} +\moduleauthor{Jeremy Hylton}{jhylton@users.sourceforge.net} +\sectionauthor{Moshe Zadka}{moshez@users.sourceforge.net} + +\modulesynopsis{An extensible library for opening URLs using a variety of + protocols} + +The \module{urllib2} module defines functions and classes which help +in opening URLs (mostly HTTP) in a complex world --- basic and digest +authentication, redirections, cookies and more. + +The \module{urllib2} module defines the following functions: + +\begin{funcdesc}{urlopen}{url\optional{, data}} +Open the URL \var{url}, which can be either a string or a \class{Request} +object. + +\var{data} may be a string specifying additional data to send to the +server, or \code{None} if no such data is needed. +Currently HTTP requests are the only ones that use \var{data}; +the HTTP request will be a POST instead of a GET when the \var{data} +parameter is provided. \var{data} should be a buffer in the standard +\mimetype{application/x-www-form-urlencoded} format. The +\function{urllib.urlencode()} function takes a mapping or sequence of +2-tuples and returns a string in this format. + +This function returns a file-like object with two additional methods: + +\begin{itemize} + \item \method{geturl()} --- return the URL of the resource retrieved + \item \method{info()} --- return the meta-information of the page, as + a dictionary-like object +\end{itemize} + +Raises \exception{URLError} on errors. + +Note that \code{None} may be returned if no handler handles the +request (though the default installed global \class{OpenerDirector} +uses \class{UnknownHandler} to ensure this never happens). +\end{funcdesc} + +\begin{funcdesc}{install_opener}{opener} +Install an \class{OpenerDirector} instance as the default global +opener. Installing an opener is only necessary if you want urlopen to +use that opener; otherwise, simply call \method{OpenerDirector.open()} +instead of \function{urlopen()}. The code does not check for a real +\class{OpenerDirector}, and any class with the appropriate interface +will work. +\end{funcdesc} + +\begin{funcdesc}{build_opener}{\optional{handler, \moreargs}} +Return an \class{OpenerDirector} instance, which chains the +handlers in the order given. \var{handler}s can be either instances +of \class{BaseHandler}, or subclasses of \class{BaseHandler} (in +which case it must be possible to call the constructor without +any parameters). Instances of the following classes will be in +front of the \var{handler}s, unless the \var{handler}s contain +them, instances of them or subclasses of them: +\class{ProxyHandler}, \class{UnknownHandler}, \class{HTTPHandler}, +\class{HTTPDefaultErrorHandler}, \class{HTTPRedirectHandler}, +\class{FTPHandler}, \class{FileHandler}, \class{HTTPErrorProcessor}. + +If the Python installation has SSL support (\function{socket.ssl()} +exists), \class{HTTPSHandler} will also be added. + +Beginning in Python 2.3, a \class{BaseHandler} subclass may also +change its \member{handler_order} member variable to modify its +position in the handlers list. +\end{funcdesc} + + +The following exceptions are raised as appropriate: + +\begin{excdesc}{URLError} +The handlers raise this exception (or derived exceptions) when they +run into a problem. It is a subclass of \exception{IOError}. +\end{excdesc} + +\begin{excdesc}{HTTPError} +A subclass of \exception{URLError}, it can also function as a +non-exceptional file-like return value (the same thing that +\function{urlopen()} returns). This is useful when handling exotic +HTTP errors, such as requests for authentication. +\end{excdesc} + +\begin{excdesc}{GopherError} +A subclass of \exception{URLError}, this is the error raised by the +Gopher handler. +\end{excdesc} + + +The following classes are provided: + +\begin{classdesc}{Request}{url\optional{, data}\optional{, headers} + \optional{, origin_req_host}\optional{, unverifiable}} +This class is an abstraction of a URL request. + +\var{url} should be a string containing a valid URL. + +\var{data} may be a string specifying additional data to send to the +server, or \code{None} if no such data is needed. +Currently HTTP requests are the only ones that use \var{data}; +the HTTP request will be a POST instead of a GET when the \var{data} +parameter is provided. \var{data} should be a buffer in the standard +\mimetype{application/x-www-form-urlencoded} format. The +\function{urllib.urlencode()} function takes a mapping or sequence of +2-tuples and returns a string in this format. + +\var{headers} should be a dictionary, and will be treated as if +\method{add_header()} was called with each key and value as arguments. + +The final two arguments are only of interest for correct handling of +third-party HTTP cookies: + +\var{origin_req_host} should be the request-host of the origin +transaction, as defined by \rfc{2965}. It defaults to +\code{cookielib.request_host(self)}. This is the host name or IP +address of the original request that was initiated by the user. For +example, if the request is for an image in an HTML document, this +should be the request-host of the request for the page containing the +image. + +\var{unverifiable} should indicate whether the request is +unverifiable, as defined by RFC 2965. It defaults to False. An +unverifiable request is one whose URL the user did not have the option +to approve. For example, if the request is for an image in an HTML +document, and the user had no option to approve the automatic fetching +of the image, this should be true. +\end{classdesc} + +\begin{classdesc}{OpenerDirector}{} +The \class{OpenerDirector} class opens URLs via \class{BaseHandler}s +chained together. It manages the chaining of handlers, and recovery +from errors. +\end{classdesc} + +\begin{classdesc}{BaseHandler}{} +This is the base class for all registered handlers --- and handles only +the simple mechanics of registration. +\end{classdesc} + +\begin{classdesc}{HTTPDefaultErrorHandler}{} +A class which defines a default handler for HTTP error responses; all +responses are turned into \exception{HTTPError} exceptions. +\end{classdesc} + +\begin{classdesc}{HTTPRedirectHandler}{} +A class to handle redirections. +\end{classdesc} + +\begin{classdesc}{HTTPCookieProcessor}{\optional{cookiejar}} +A class to handle HTTP Cookies. +\end{classdesc} + +\begin{classdesc}{ProxyHandler}{\optional{proxies}} +Cause requests to go through a proxy. +If \var{proxies} is given, it must be a dictionary mapping +protocol names to URLs of proxies. +The default is to read the list of proxies from the environment +variables \envvar{<protocol>_proxy}. +\end{classdesc} + +\begin{classdesc}{HTTPPasswordMgr}{} +Keep a database of +\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} +mappings. +\end{classdesc} + +\begin{classdesc}{HTTPPasswordMgrWithDefaultRealm}{} +Keep a database of +\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} mappings. +A realm of \code{None} is considered a catch-all realm, which is searched +if no other realm fits. +\end{classdesc} + +\begin{classdesc}{AbstractBasicAuthHandler}{\optional{password_mgr}} +This is a mixin class that helps with HTTP authentication, both +to the remote host and to a proxy. +\var{password_mgr}, if given, should be something that is compatible +with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} +for information on the interface that must be supported. +\end{classdesc} + +\begin{classdesc}{HTTPBasicAuthHandler}{\optional{password_mgr}} +Handle authentication with the remote host. +\var{password_mgr}, if given, should be something that is compatible +with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} +for information on the interface that must be supported. +\end{classdesc} + +\begin{classdesc}{ProxyBasicAuthHandler}{\optional{password_mgr}} +Handle authentication with the proxy. +\var{password_mgr}, if given, should be something that is compatible +with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} +for information on the interface that must be supported. +\end{classdesc} + +\begin{classdesc}{AbstractDigestAuthHandler}{\optional{password_mgr}} +This is a mixin class that helps with HTTP authentication, both +to the remote host and to a proxy. +\var{password_mgr}, if given, should be something that is compatible +with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} +for information on the interface that must be supported. +\end{classdesc} + +\begin{classdesc}{HTTPDigestAuthHandler}{\optional{password_mgr}} +Handle authentication with the remote host. +\var{password_mgr}, if given, should be something that is compatible +with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} +for information on the interface that must be supported. +\end{classdesc} + +\begin{classdesc}{ProxyDigestAuthHandler}{\optional{password_mgr}} +Handle authentication with the proxy. +\var{password_mgr}, if given, should be something that is compatible +with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} +for information on the interface that must be supported. +\end{classdesc} + +\begin{classdesc}{HTTPHandler}{} +A class to handle opening of HTTP URLs. +\end{classdesc} + +\begin{classdesc}{HTTPSHandler}{} +A class to handle opening of HTTPS URLs. +\end{classdesc} + +\begin{classdesc}{FileHandler}{} +Open local files. +\end{classdesc} + +\begin{classdesc}{FTPHandler}{} +Open FTP URLs. +\end{classdesc} + +\begin{classdesc}{CacheFTPHandler}{} +Open FTP URLs, keeping a cache of open FTP connections to minimize +delays. +\end{classdesc} + +\begin{classdesc}{GopherHandler}{} +Open gopher URLs. +\end{classdesc} + +\begin{classdesc}{UnknownHandler}{} +A catch-all class to handle unknown URLs. +\end{classdesc} + + +\subsection{Request Objects \label{request-objects}} + +The following methods describe all of \class{Request}'s public interface, +and so all must be overridden in subclasses. + +\begin{methoddesc}[Request]{add_data}{data} +Set the \class{Request} data to \var{data}. This is ignored by all +handlers except HTTP handlers --- and there it should be a byte +string, and will change the request to be \code{POST} rather than +\code{GET}. +\end{methoddesc} + +\begin{methoddesc}[Request]{get_method}{} +Return a string indicating the HTTP request method. This is only +meaningful for HTTP requests, and currently always returns +\code{'GET'} or \code{'POST'}. +\end{methoddesc} + +\begin{methoddesc}[Request]{has_data}{} +Return whether the instance has a non-\code{None} data. +\end{methoddesc} + +\begin{methoddesc}[Request]{get_data}{} +Return the instance's data. +\end{methoddesc} + +\begin{methoddesc}[Request]{add_header}{key, val} +Add another header to the request. Headers are currently ignored by +all handlers except HTTP handlers, where they are added to the list +of headers sent to the server. Note that there cannot be more than +one header with the same name, and later calls will overwrite +previous calls in case the \var{key} collides. Currently, this is +no loss of HTTP functionality, since all headers which have meaning +when used more than once have a (header-specific) way of gaining the +same functionality using only one header. +\end{methoddesc} + +\begin{methoddesc}[Request]{add_unredirected_header}{key, header} +Add a header that will not be added to a redirected request. +\versionadded{2.4} +\end{methoddesc} + +\begin{methoddesc}[Request]{has_header}{header} +Return whether the instance has the named header (checks both regular +and unredirected). +\versionadded{2.4} +\end{methoddesc} + +\begin{methoddesc}[Request]{get_full_url}{} +Return the URL given in the constructor. +\end{methoddesc} + +\begin{methoddesc}[Request]{get_type}{} +Return the type of the URL --- also known as the scheme. +\end{methoddesc} + +\begin{methoddesc}[Request]{get_host}{} +Return the host to which a connection will be made. +\end{methoddesc} + +\begin{methoddesc}[Request]{get_selector}{} +Return the selector --- the part of the URL that is sent to +the server. +\end{methoddesc} + +\begin{methoddesc}[Request]{set_proxy}{host, type} +Prepare the request by connecting to a proxy server. The \var{host} +and \var{type} will replace those of the instance, and the instance's +selector will be the original URL given in the constructor. +\end{methoddesc} + +\begin{methoddesc}[Request]{get_origin_req_host}{} +Return the request-host of the origin transaction, as defined by +\rfc{2965}. See the documentation for the \class{Request} +constructor. +\end{methoddesc} + +\begin{methoddesc}[Request]{is_unverifiable}{} +Return whether the request is unverifiable, as defined by RFC 2965. +See the documentation for the \class{Request} constructor. +\end{methoddesc} + + +\subsection{OpenerDirector Objects \label{opener-director-objects}} + +\class{OpenerDirector} instances have the following methods: + +\begin{methoddesc}[OpenerDirector]{add_handler}{handler} +\var{handler} should be an instance of \class{BaseHandler}. The +following methods are searched, and added to the possible chains (note +that HTTP errors are a special case). + +\begin{itemize} + \item \method{\var{protocol}_open()} --- + signal that the handler knows how to open \var{protocol} URLs. + \item \method{http_error_\var{type}()} --- + signal that the handler knows how to handle HTTP errors with HTTP + error code \var{type}. + \item \method{\var{protocol}_error()} --- + signal that the handler knows how to handle errors from + (non-\code{http}) \var{protocol}. + \item \method{\var{protocol}_request()} --- + signal that the handler knows how to pre-process \var{protocol} + requests. + \item \method{\var{protocol}_response()} --- + signal that the handler knows how to post-process \var{protocol} + responses. +\end{itemize} +\end{methoddesc} + +\begin{methoddesc}[OpenerDirector]{open}{url\optional{, data}} +Open the given \var{url} (which can be a request object or a string), +optionally passing the given \var{data}. +Arguments, return values and exceptions raised are the same as those +of \function{urlopen()} (which simply calls the \method{open()} method +on the currently installed global \class{OpenerDirector}). +\end{methoddesc} + +\begin{methoddesc}[OpenerDirector]{error}{proto\optional{, + arg\optional{, \moreargs}}} +Handle an error of the given protocol. This will call the registered +error handlers for the given protocol with the given arguments (which +are protocol specific). The HTTP protocol is a special case which +uses the HTTP response code to determine the specific error handler; +refer to the \method{http_error_*()} methods of the handler classes. + +Return values and exceptions raised are the same as those +of \function{urlopen()}. +\end{methoddesc} + +OpenerDirector objects open URLs in three stages: + +The order in which these methods are called within each stage is +determined by sorting the handler instances. + +\begin{enumerate} + \item Every handler with a method named like + \method{\var{protocol}_request()} has that method called to + pre-process the request. + + \item Handlers with a method named like + \method{\var{protocol}_open()} are called to handle the request. + This stage ends when a handler either returns a + non-\constant{None} value (ie. a response), or raises an exception + (usually \exception{URLError}). Exceptions are allowed to propagate. + + In fact, the above algorithm is first tried for methods named + \method{default_open}. If all such methods return + \constant{None}, the algorithm is repeated for methods named like + \method{\var{protocol}_open()}. If all such methods return + \constant{None}, the algorithm is repeated for methods named + \method{unknown_open()}. + + Note that the implementation of these methods may involve calls of + the parent \class{OpenerDirector} instance's \method{.open()} and + \method{.error()} methods. + + \item Every handler with a method named like + \method{\var{protocol}_response()} has that method called to + post-process the response. + +\end{enumerate} + +\subsection{BaseHandler Objects \label{base-handler-objects}} + +\class{BaseHandler} objects provide a couple of methods that are +directly useful, and others that are meant to be used by derived +classes. These are intended for direct use: + +\begin{methoddesc}[BaseHandler]{add_parent}{director} +Add a director as parent. +\end{methoddesc} + +\begin{methoddesc}[BaseHandler]{close}{} +Remove any parents. +\end{methoddesc} + +The following members and methods should only be used by classes +derived from \class{BaseHandler}. \note{The convention has been +adopted that subclasses defining \method{\var{protocol}_request()} or +\method{\var{protocol}_response()} methods are named +\class{*Processor}; all others are named \class{*Handler}.} + + +\begin{memberdesc}[BaseHandler]{parent} +A valid \class{OpenerDirector}, which can be used to open using a +different protocol, or handle errors. +\end{memberdesc} + +\begin{methoddesc}[BaseHandler]{default_open}{req} +This method is \emph{not} defined in \class{BaseHandler}, but +subclasses should define it if they want to catch all URLs. + +This method, if implemented, will be called by the parent +\class{OpenerDirector}. It should return a file-like object as +described in the return value of the \method{open()} of +\class{OpenerDirector}, or \code{None}. It should raise +\exception{URLError}, unless a truly exceptional thing happens (for +example, \exception{MemoryError} should not be mapped to +\exception{URLError}). + +This method will be called before any protocol-specific open method. +\end{methoddesc} + +\begin{methoddescni}[BaseHandler]{\var{protocol}_open}{req} +This method is \emph{not} defined in \class{BaseHandler}, but +subclasses should define it if they want to handle URLs with the given +protocol. + +This method, if defined, will be called by the parent +\class{OpenerDirector}. Return values should be the same as for +\method{default_open()}. +\end{methoddescni} + +\begin{methoddesc}[BaseHandler]{unknown_open}{req} +This method is \var{not} defined in \class{BaseHandler}, but +subclasses should define it if they want to catch all URLs with no +specific registered handler to open it. + +This method, if implemented, will be called by the \member{parent} +\class{OpenerDirector}. Return values should be the same as for +\method{default_open()}. +\end{methoddesc} + +\begin{methoddesc}[BaseHandler]{http_error_default}{req, fp, code, msg, hdrs} +This method is \emph{not} defined in \class{BaseHandler}, but +subclasses should override it if they intend to provide a catch-all +for otherwise unhandled HTTP errors. It will be called automatically +by the \class{OpenerDirector} getting the error, and should not +normally be called in other circumstances. + +\var{req} will be a \class{Request} object, \var{fp} will be a +file-like object with the HTTP error body, \var{code} will be the +three-digit code of the error, \var{msg} will be the user-visible +explanation of the code and \var{hdrs} will be a mapping object with +the headers of the error. + +Return values and exceptions raised should be the same as those +of \function{urlopen()}. +\end{methoddesc} + +\begin{methoddesc}[BaseHandler]{http_error_\var{nnn}}{req, fp, code, msg, hdrs} +\var{nnn} should be a three-digit HTTP error code. This method is +also not defined in \class{BaseHandler}, but will be called, if it +exists, on an instance of a subclass, when an HTTP error with code +\var{nnn} occurs. + +Subclasses should override this method to handle specific HTTP +errors. + +Arguments, return values and exceptions raised should be the same as +for \method{http_error_default()}. +\end{methoddesc} + +\begin{methoddescni}[BaseHandler]{\var{protocol}_request}{req} +This method is \emph{not} defined in \class{BaseHandler}, but +subclasses should define it if they want to pre-process requests of +the given protocol. + +This method, if defined, will be called by the parent +\class{OpenerDirector}. \var{req} will be a \class{Request} object. +The return value should be a \class{Request} object. +\end{methoddescni} + +\begin{methoddescni}[BaseHandler]{\var{protocol}_response}{req, response} +This method is \emph{not} defined in \class{BaseHandler}, but +subclasses should define it if they want to post-process responses of +the given protocol. + +This method, if defined, will be called by the parent +\class{OpenerDirector}. \var{req} will be a \class{Request} object. +\var{response} will be an object implementing the same interface as +the return value of \function{urlopen()}. The return value should +implement the same interface as the return value of +\function{urlopen()}. +\end{methoddescni} + +\subsection{HTTPRedirectHandler Objects \label{http-redirect-handler}} + +\note{Some HTTP redirections require action from this module's client + code. If this is the case, \exception{HTTPError} is raised. See + \rfc{2616} for details of the precise meanings of the various + redirection codes.} + +\begin{methoddesc}[HTTPRedirectHandler]{redirect_request}{req, + fp, code, msg, hdrs} +Return a \class{Request} or \code{None} in response to a redirect. +This is called by the default implementations of the +\method{http_error_30*()} methods when a redirection is received from +the server. If a redirection should take place, return a new +\class{Request} to allow \method{http_error_30*()} to perform the +redirect. Otherwise, raise \exception{HTTPError} if no other handler +should try to handle this URL, or return \code{None} if you can't but +another handler might. + +\begin{notice} + The default implementation of this method does not strictly + follow \rfc{2616}, which says that 301 and 302 responses to \code{POST} + requests must not be automatically redirected without confirmation by + the user. In reality, browsers do allow automatic redirection of + these responses, changing the POST to a \code{GET}, and the default + implementation reproduces this behavior. +\end{notice} +\end{methoddesc} + + +\begin{methoddesc}[HTTPRedirectHandler]{http_error_301}{req, + fp, code, msg, hdrs} +Redirect to the \code{Location:} URL. This method is called by +the parent \class{OpenerDirector} when getting an HTTP +`moved permanently' response. +\end{methoddesc} + +\begin{methoddesc}[HTTPRedirectHandler]{http_error_302}{req, + fp, code, msg, hdrs} +The same as \method{http_error_301()}, but called for the +`found' response. +\end{methoddesc} + +\begin{methoddesc}[HTTPRedirectHandler]{http_error_303}{req, + fp, code, msg, hdrs} +The same as \method{http_error_301()}, but called for the +`see other' response. +\end{methoddesc} + +\begin{methoddesc}[HTTPRedirectHandler]{http_error_307}{req, + fp, code, msg, hdrs} +The same as \method{http_error_301()}, but called for the +`temporary redirect' response. +\end{methoddesc} + + +\subsection{HTTPCookieProcessor Objects \label{http-cookie-processor}} + +\versionadded{2.4} + +\class{HTTPCookieProcessor} instances have one attribute: + +\begin{memberdesc}{cookiejar} +The \class{cookielib.CookieJar} in which cookies are stored. +\end{memberdesc} + + +\subsection{ProxyHandler Objects \label{proxy-handler}} + +\begin{methoddescni}[ProxyHandler]{\var{protocol}_open}{request} +The \class{ProxyHandler} will have a method +\method{\var{protocol}_open()} for every \var{protocol} which has a +proxy in the \var{proxies} dictionary given in the constructor. The +method will modify requests to go through the proxy, by calling +\code{request.set_proxy()}, and call the next handler in the chain to +actually execute the protocol. +\end{methoddescni} + + +\subsection{HTTPPasswordMgr Objects \label{http-password-mgr}} + +These methods are available on \class{HTTPPasswordMgr} and +\class{HTTPPasswordMgrWithDefaultRealm} objects. + +\begin{methoddesc}[HTTPPasswordMgr]{add_password}{realm, uri, user, passwd} +\var{uri} can be either a single URI, or a sequence of URIs. \var{realm}, +\var{user} and \var{passwd} must be strings. This causes +\code{(\var{user}, \var{passwd})} to be used as authentication tokens +when authentication for \var{realm} and a super-URI of any of the +given URIs is given. +\end{methoddesc} + +\begin{methoddesc}[HTTPPasswordMgr]{find_user_password}{realm, authuri} +Get user/password for given realm and URI, if any. This method will +return \code{(None, None)} if there is no matching user/password. + +For \class{HTTPPasswordMgrWithDefaultRealm} objects, the realm +\code{None} will be searched if the given \var{realm} has no matching +user/password. +\end{methoddesc} + + +\subsection{AbstractBasicAuthHandler Objects + \label{abstract-basic-auth-handler}} + +\begin{methoddesc}[AbstractBasicAuthHandler]{http_error_auth_reqed} + {authreq, host, req, headers} +Handle an authentication request by getting a user/password pair, and +re-trying the request. \var{authreq} should be the name of the header +where the information about the realm is included in the request, +\var{host} specifies the URL and path to authenticate for, \var{req} +should be the (failed) \class{Request} object, and \var{headers} +should be the error headers. + +\var{host} is either an authority (e.g. \code{"python.org"}) or a URL +containing an authority component (e.g. \code{"http://python.org/"}). +In either case, the authority must not contain a userinfo component +(so, \code{"python.org"} and \code{"python.org:80"} are fine, +\code{"joe:password@python.org"} is not). +\end{methoddesc} + + +\subsection{HTTPBasicAuthHandler Objects + \label{http-basic-auth-handler}} + +\begin{methoddesc}[HTTPBasicAuthHandler]{http_error_401}{req, fp, code, + msg, hdrs} +Retry the request with authentication information, if available. +\end{methoddesc} + + +\subsection{ProxyBasicAuthHandler Objects + \label{proxy-basic-auth-handler}} + +\begin{methoddesc}[ProxyBasicAuthHandler]{http_error_407}{req, fp, code, + msg, hdrs} +Retry the request with authentication information, if available. +\end{methoddesc} + + +\subsection{AbstractDigestAuthHandler Objects + \label{abstract-digest-auth-handler}} + +\begin{methoddesc}[AbstractDigestAuthHandler]{http_error_auth_reqed} + {authreq, host, req, headers} +\var{authreq} should be the name of the header where the information about +the realm is included in the request, \var{host} should be the host to +authenticate to, \var{req} should be the (failed) \class{Request} +object, and \var{headers} should be the error headers. +\end{methoddesc} + + +\subsection{HTTPDigestAuthHandler Objects + \label{http-digest-auth-handler}} + +\begin{methoddesc}[HTTPDigestAuthHandler]{http_error_401}{req, fp, code, + msg, hdrs} +Retry the request with authentication information, if available. +\end{methoddesc} + + +\subsection{ProxyDigestAuthHandler Objects + \label{proxy-digest-auth-handler}} + +\begin{methoddesc}[ProxyDigestAuthHandler]{http_error_407}{req, fp, code, + msg, hdrs} +Retry the request with authentication information, if available. +\end{methoddesc} + + +\subsection{HTTPHandler Objects \label{http-handler-objects}} + +\begin{methoddesc}[HTTPHandler]{http_open}{req} +Send an HTTP request, which can be either GET or POST, depending on +\code{\var{req}.has_data()}. +\end{methoddesc} + + +\subsection{HTTPSHandler Objects \label{https-handler-objects}} + +\begin{methoddesc}[HTTPSHandler]{https_open}{req} +Send an HTTPS request, which can be either GET or POST, depending on +\code{\var{req}.has_data()}. +\end{methoddesc} + + +\subsection{FileHandler Objects \label{file-handler-objects}} + +\begin{methoddesc}[FileHandler]{file_open}{req} +Open the file locally, if there is no host name, or +the host name is \code{'localhost'}. Change the +protocol to \code{ftp} otherwise, and retry opening +it using \member{parent}. +\end{methoddesc} + + +\subsection{FTPHandler Objects \label{ftp-handler-objects}} + +\begin{methoddesc}[FTPHandler]{ftp_open}{req} +Open the FTP file indicated by \var{req}. +The login is always done with empty username and password. +\end{methoddesc} + + +\subsection{CacheFTPHandler Objects \label{cacheftp-handler-objects}} + +\class{CacheFTPHandler} objects are \class{FTPHandler} objects with +the following additional methods: + +\begin{methoddesc}[CacheFTPHandler]{setTimeout}{t} +Set timeout of connections to \var{t} seconds. +\end{methoddesc} + +\begin{methoddesc}[CacheFTPHandler]{setMaxConns}{m} +Set maximum number of cached connections to \var{m}. +\end{methoddesc} + + +\subsection{GopherHandler Objects \label{gopher-handler}} + +\begin{methoddesc}[GopherHandler]{gopher_open}{req} +Open the gopher resource indicated by \var{req}. +\end{methoddesc} + + +\subsection{UnknownHandler Objects \label{unknown-handler-objects}} + +\begin{methoddesc}[UnknownHandler]{unknown_open}{} +Raise a \exception{URLError} exception. +\end{methoddesc} + + +\subsection{HTTPErrorProcessor Objects \label{http-error-processor-objects}} + +\versionadded{2.4} + +\begin{methoddesc}[HTTPErrorProcessor]{unknown_open}{} +Process HTTP error responses. + +For 200 error codes, the response object is returned immediately. + +For non-200 error codes, this simply passes the job on to the +\method{\var{protocol}_error_\var{code}()} handler methods, via +\method{OpenerDirector.error()}. Eventually, +\class{urllib2.HTTPDefaultErrorHandler} will raise an +\exception{HTTPError} if no other handler handles the error. +\end{methoddesc} + + +\subsection{Examples \label{urllib2-examples}} + +This example gets the python.org main page and displays the first 100 +bytes of it: + +\begin{verbatim} +>>> import urllib2 +>>> f = urllib2.urlopen('http://www.python.org/') +>>> print f.read(100) +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<?xml-stylesheet href="./css/ht2html +\end{verbatim} + +Here we are sending a data-stream to the stdin of a CGI and reading +the data it returns to us. Note that this example will only work when the +Python installation supports SSL. + +\begin{verbatim} +>>> import urllib2 +>>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi', +... data='This data is passed to stdin of the CGI') +>>> f = urllib2.urlopen(req) +>>> print f.read() +Got Data: "This data is passed to stdin of the CGI" +\end{verbatim} + +The code for the sample CGI used in the above example is: + +\begin{verbatim} +#!/usr/bin/env python +import sys +data = sys.stdin.read() +print 'Content-type: text-plain\n\nGot Data: "%s"' % data +\end{verbatim} + + +Use of Basic HTTP Authentication: + +\begin{verbatim} +import urllib2 +# Create an OpenerDirector with support for Basic HTTP Authentication... +auth_handler = urllib2.HTTPBasicAuthHandler() +auth_handler.add_password('realm', 'host', 'username', 'password') +opener = urllib2.build_opener(auth_handler) +# ...and install it globally so it can be used with urlopen. +urllib2.install_opener(opener) +urllib2.urlopen('http://www.example.com/login.html') +\end{verbatim} + +\function{build_opener()} provides many handlers by default, including a +\class{ProxyHandler}. By default, \class{ProxyHandler} uses the +environment variables named \code{<scheme>_proxy}, where \code{<scheme>} +is the URL scheme involved. For example, the \envvar{http_proxy} +environment variable is read to obtain the HTTP proxy's URL. + +This example replaces the default \class{ProxyHandler} with one that uses +programatically-supplied proxy URLs, and adds proxy authorization support +with \class{ProxyBasicAuthHandler}. + +\begin{verbatim} +proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'}) +proxy_auth_handler = urllib2.HTTPBasicAuthHandler() +proxy_auth_handler.add_password('realm', 'host', 'username', 'password') + +opener = build_opener(proxy_handler, proxy_auth_handler) +# This time, rather than install the OpenerDirector, we use it directly: +opener.open('http://www.example.com/login.html') +\end{verbatim} + + +Adding HTTP headers: + +Use the \var{headers} argument to the \class{Request} constructor, or: + +\begin{verbatim} +import urllib2 +req = urllib2.Request('http://www.example.com/') +req.add_header('Referer', 'http://www.python.org/') +r = urllib2.urlopen(req) +\end{verbatim} + +\class{OpenerDirector} automatically adds a \mailheader{User-Agent} +header to every \class{Request}. To change this: + +\begin{verbatim} +import urllib2 +opener = urllib2.build_opener() +opener.addheaders = [('User-agent', 'Mozilla/5.0')] +opener.open('http://www.example.com/') +\end{verbatim} + +Also, remember that a few standard headers +(\mailheader{Content-Length}, \mailheader{Content-Type} and +\mailheader{Host}) are added when the \class{Request} is passed to +\function{urlopen()} (or \method{OpenerDirector.open()}). |