author cinap_lenrek <cinap_lenrek@localhost> 2011-05-03 11:25:13 +0000
committer cinap_lenrek <cinap_lenrek@localhost> 2011-05-03 11:25:13 +0000
commit 458120dd40db6b4df55a4e96b650e16798ef06a0
tree 8f82685be24fef97e715c6f5ca4c68d34d5074ee /sys/src/cmd/python/Doc/lib/libhtmllib.tex
parent 3a742c699f6806c1145aea5149bf15de15a0afd7
add hg and python
Diffstat (limited to 'sys/src/cmd/python/Doc/lib/libhtmllib.tex')
-rw-r--r-- sys/src/cmd/python/Doc/lib/libhtmllib.tex 181
1 files changed, 181 insertions, 0 deletions
diff --git a/sys/src/cmd/python/Doc/lib/libhtmllib.tex b/sys/src/cmd/python/Doc/lib/libhtmllib.tex
new file mode 100644
index 000000000..a84dd856d
--- /dev/null
+++ b/sys/src/cmd/python/Doc/lib/libhtmllib.tex
@@ -0,0 +1,181 @@
+\section{\module{htmllib} ---
+ A parser for HTML documents}
+
+\declaremodule{standard}{htmllib}
+\modulesynopsis{A parser for HTML documents.}
+
+\index{HTML}
+\index{hypertext}
+
+
+This module defines a class which can serve as a base for parsing text
+files formatted in the HyperText Mark-up Language (HTML). The class
+is not directly concerned with I/O --- it must be provided with input
+in string form via a method, and makes calls to methods of a
+``formatter'' object in order to produce output. The
+\class{HTMLParser} class is designed to be used as a base class for
+other classes in order to add functionality, and allows most of its
+methods to be extended or overridden. In turn, this class is derived
+from and extends the \class{SGMLParser} class defined in module
+\refmodule{sgmllib}\refstmodindex{sgmllib}. The \class{HTMLParser}
+implementation supports the HTML 2.0 language as described in
+\rfc{1866}. Two implementations of formatter objects are provided in
+the \refmodule{formatter}\refstmodindex{formatter}\ module; refer to the
+documentation for that module for information on the formatter
+interface.
+\withsubitem{(in module sgmllib)}{\ttindex{SGMLParser}}
+
+The following is a summary of the interface defined by
+\class{sgmllib.SGMLParser}:
+
+\begin{itemize}
+
+\item
+Data is fed to an instance through the \method{feed()} method, which
+takes a string argument. This can be called with as
+little or as much text at a time as desired; \samp{p.feed(a);
+p.feed(b)} has the same effect as \samp{p.feed(a+b)}. When the data
+contains complete HTML markup constructs, these are processed immediately;
+incomplete constructs are saved in a buffer. To force processing of all
+unprocessed data, call the \method{close()} method.
+
+For example, to parse the entire contents of a file, use:
+\begin{verbatim}
+f = open('myfile.html')
+parser.feed(f.read())   # pass the whole file to the parser
+f.close()
+parser.close()          # force processing of any buffered data
+\end{verbatim}
+
+\item
+The interface to define semantics for HTML tags is very simple: derive
+a class and define methods called \method{start_\var{tag}()},
+\method{end_\var{tag}()}, or \method{do_\var{tag}()}. The parser will
+call these at appropriate moments: \method{start_\var{tag}()} or
+\method{do_\var{tag}()} is called when an opening tag of the form
+\code{<\var{tag} ...>} is encountered; \method{end_\var{tag}()} is called
+when a closing tag of the form \code{</\var{tag}>} is encountered. If
+an opening tag requires a corresponding closing tag, like \code{<H1>}
+... \code{</H1>}, the class should define the \method{start_\var{tag}()}
+method; if a tag requires no closing tag, like \code{<P>}, the class
+should define the \method{do_\var{tag}()} method.
+
+\end{itemize}
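
The naming convention described in the list above can be pictured with a small, self-contained sketch. This is not the \module{sgmllib} implementation: the names ToyDispatcher, MyHandlers, open_tag, close_tag, and events are hypothetical, and the lookup order (start_ before do_) and the lower-casing of tag names are assumptions of the sketch. It only shows how a method name built from the tag can select a handler.

```python
class ToyDispatcher(object):
    """Hypothetical stand-in for a parser; not part of sgmllib or htmllib."""

    def __init__(self):
        self.events = []  # records which handlers ran, in order

    def open_tag(self, tag, attrs):
        # Assumption of this sketch: try start_<tag> first, then do_<tag>.
        tag = tag.lower()
        for prefix in ("start_", "do_"):
            handler = getattr(self, prefix + tag, None)
            if handler is not None:
                handler(attrs)
                return True
        return False  # no handler is defined for this tag

    def close_tag(self, tag):
        handler = getattr(self, "end_" + tag.lower(), None)
        if handler is None:
            return False
        handler()
        return True


class MyHandlers(ToyDispatcher):
    # <H1> ... </H1> needs a closing tag, so it gets a start_/end_ pair.
    def start_h1(self, attrs):
        self.events.append(("start_h1", attrs))

    def end_h1(self):
        self.events.append(("end_h1",))

    # <P> needs no closing tag, so it gets a do_ method.
    def do_p(self, attrs):
        self.events.append(("do_p", attrs))


p = MyHandlers()
p.open_tag("H1", [])
p.open_tag("P", [("align", "left")])
p.close_tag("H1")
unknown = p.open_tag("BLINK", [])  # False: no start_blink or do_blink
```

In the sketch, attributes are passed as a list of (name, value) pairs, and a tag with no matching method is simply reported as unhandled.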
+
+The module defines a parser class and an exception:
+
+\begin{classdesc}{HTMLParser}{formatter}
+This is the basic HTML parser class. It supports all entity names
+required by the XHTML 1.0 Recommendation (\url{http://www.w3.org/TR/xhtml1}).
+It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
+\end{classdesc}
+
+\begin{excdesc}{HTMLParseError}
+Exception raised by the \class{HTMLParser} class when it encounters an
+error while parsing.
+\versionadded{2.4}
+\end{excdesc}
+
+
+\begin{seealso}
+ \seemodule{formatter}{Interface definition for transforming an
+ abstract flow of formatting events into
+ specific output events on writer objects.}
+ \seemodule{HTMLParser}{Alternate HTML parser that offers a slightly
+ lower-level view of the input, but is
+ designed to work with XHTML, and does not
+ implement some of the SGML syntax not used in
+ ``HTML as deployed'' and which isn't legal
+ for XHTML.}
+ \seemodule{htmlentitydefs}{Definition of replacement text for XHTML 1.0
+ entities.}
+ \seemodule{sgmllib}{Base class for \class{HTMLParser}.}
+\end{seealso}
+
+
+\subsection{HTMLParser Objects \label{html-parser-objects}}
+
+In addition to tag methods, the \class{HTMLParser} class provides
+methods and instance variables for use within tag methods.
+
+\begin{memberdesc}{formatter}
+This is the formatter instance associated with the parser.
+\end{memberdesc}
+
+\begin{memberdesc}{nofill}
+Boolean flag which should be true when whitespace should not be
+collapsed, or false when it should be. In general, this should only
+be true when character data is to be treated as ``preformatted'' text,
+as within a \code{<PRE>} element. The default value is false. This
+affects the operation of \method{handle_data()} and \method{save_end()}.
+\end{memberdesc}
+
+
+\begin{methoddesc}{anchor_bgn}{href, name, type}
+This method is called at the start of an anchor region. The arguments
+correspond to the attributes of the \code{<A>} tag with the same
+names. The default implementation maintains a list of hyperlinks
+(defined by the \code{HREF} attribute for \code{<A>} tags) within the
+document. The list of hyperlinks is available as the data attribute
+\member{anchorlist}.
+\end{methoddesc}
+
+\begin{methoddesc}{anchor_end}{}
+This method is called at the end of an anchor region. The default
+implementation adds a textual footnote marker using an index into the
+list of hyperlinks created by \method{anchor_bgn()}.
+\end{methoddesc}
+
+\begin{methoddesc}{handle_image}{source, alt\optional{, ismap\optional{,
+ align\optional{, width\optional{, height}}}}}
+This method is called to handle images. The default implementation
+simply passes the \var{alt} value to the \method{handle_data()}
+method.
+\end{methoddesc}
+
+\begin{methoddesc}{save_bgn}{}
+Begins saving character data in a buffer instead of sending it to the
+formatter object. Retrieve the stored data via \method{save_end()}.
+Calls to the \method{save_bgn()} / \method{save_end()} pair may not be
+nested.
+\end{methoddesc}
+
+\begin{methoddesc}{save_end}{}
+Ends buffering character data and returns all data saved since the
+preceding call to \method{save_bgn()}. If the \member{nofill} flag is
+false, whitespace is collapsed to single spaces. A call to this
+method without a preceding call to \method{save_bgn()} will raise a
+\exception{TypeError} exception.
+\end{methoddesc}
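
The \method{save_bgn()} / \method{save_end()} contract and the effect of \member{nofill} can be modelled with a toy class. ToySaver, its savedata and sent attributes, and the way whitespace is collapsed are hypothetical; the sketch only imitates the behavior documented above and is not the \module{htmllib} code. In particular, it also drops leading and trailing whitespace, which the text above does not specify.

```python
class ToySaver(object):
    """Hypothetical model of the documented save_bgn()/save_end() contract."""

    def __init__(self):
        self.nofill = False   # documented default: whitespace is collapsed
        self.savedata = None  # None means "not currently saving"
        self.sent = []        # stands in for the formatter object

    def save_bgn(self):
        self.savedata = ""

    def handle_data(self, data):
        if self.savedata is not None:
            self.savedata += data   # buffered instead of being sent on
        else:
            self.sent.append(data)

    def save_end(self):
        if self.savedata is None:
            raise TypeError("save_end() called without save_bgn()")
        data, self.savedata = self.savedata, None
        if not self.nofill:
            # Collapse each run of whitespace to a single space.
            data = " ".join(data.split())
        return data


s = ToySaver()
s.save_bgn()
s.handle_data("Hello,\n   world")
s.handle_data("  again ")
title = s.save_end()   # "Hello, world again"
```

A tag method pair such as a hypothetical \method{start_title()} / \method{end_title()} could use this pattern to capture the text of an element instead of passing it to the formatter.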
+
+
+
+\section{\module{htmlentitydefs} ---
+ Definitions of HTML general entities}
+
+\declaremodule{standard}{htmlentitydefs}
+\modulesynopsis{Definitions of HTML general entities.}
+\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
+
+This module defines three dictionaries, \code{name2codepoint},
+\code{codepoint2name}, and \code{entitydefs}. \code{entitydefs} is
+used by the \refmodule{htmllib} module to provide the
+\member{entitydefs} member of the \class{HTMLParser} class. The
+definition provided here contains all the entities defined by XHTML 1.0
+that can be handled using simple textual substitution in the Latin-1
+character set (ISO-8859-1).
+
+
+\begin{datadesc}{entitydefs}
+ A dictionary mapping XHTML 1.0 entity definitions to their
+ replacement text in ISO Latin-1.
+
+\end{datadesc}
+
+\begin{datadesc}{name2codepoint}
+ A dictionary that maps HTML entity names to Unicode codepoints.
+ \versionadded{2.3}
+\end{datadesc}
+
+\begin{datadesc}{codepoint2name}
+ A dictionary that maps Unicode codepoints to HTML entity names.
+ \versionadded{2.3}
+\end{datadesc}
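
As a brief illustration of the three dictionaries, the following sketch looks up the \code{amp} entity in each of them. The fallback import is an assumption for readers on Python 3, where the same dictionaries are provided by \module{html.entities}; the variable names are illustrative only.

```python
try:
    # Module name as documented here (Python 2).
    from htmlentitydefs import name2codepoint, codepoint2name, entitydefs
except ImportError:
    # Assumption: on Python 3 the same dictionaries live in html.entities.
    from html.entities import name2codepoint, codepoint2name, entitydefs

amp_code = name2codepoint["amp"]     # entity name -> Unicode codepoint (38)
amp_name = codepoint2name[amp_code]  # codepoint -> entity name ("amp")
amp_text = entitydefs["amp"]         # entity name -> replacement text ("&")

print("%d %s %s" % (amp_code, amp_name, amp_text))  # prints: 38 amp &
```

The two codepoint dictionaries are inverses of each other for the names they share, so a round trip through both returns the original entity name.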