author | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
---|---|---|
committer | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
commit | 458120dd40db6b4df55a4e96b650e16798ef06a0 | |
tree | 8f82685be24fef97e715c6f5ca4c68d34d5074ee /sys/src/cmd/python/Doc/lib/xmldomminidom.tex | |
parent | 3a742c699f6806c1145aea5149bf15de15a0afd7 |
add hg and python
Diffstat (limited to 'sys/src/cmd/python/Doc/lib/xmldomminidom.tex')
-rw-r--r-- | sys/src/cmd/python/Doc/lib/xmldomminidom.tex | 285 |
1 files changed, 285 insertions, 0 deletions
diff --git a/sys/src/cmd/python/Doc/lib/xmldomminidom.tex b/sys/src/cmd/python/Doc/lib/xmldomminidom.tex
new file mode 100644
index 000000000..093915fc8
--- /dev/null
+++ b/sys/src/cmd/python/Doc/lib/xmldomminidom.tex
@@ -0,0 +1,285 @@
+\section{\module{xml.dom.minidom} ---
+         Lightweight DOM implementation}
+
+\declaremodule{standard}{xml.dom.minidom}
+\modulesynopsis{Lightweight Document Object Model (DOM) implementation.}
+\moduleauthor{Paul Prescod}{paul@prescod.net}
+\sectionauthor{Paul Prescod}{paul@prescod.net}
+\sectionauthor{Martin v. L\"owis}{martin@v.loewis.de}
+
+\versionadded{2.0}
+
+\module{xml.dom.minidom} is a lightweight implementation of the
+Document Object Model interface. It is intended to be
+simpler than the full DOM and also significantly smaller.
+
+DOM applications typically start by parsing some XML into a DOM. With
+\module{xml.dom.minidom}, this is done through the parse functions:
+
+\begin{verbatim}
+from xml.dom.minidom import parse, parseString
+
+dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name
+
+datasource = open('c:\\temp\\mydata.xml')
+dom2 = parse(datasource) # parse an open file
+
+dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
+\end{verbatim}
+
+The \function{parse()} function can take either a filename or an open
+file object.
+
+\begin{funcdesc}{parse}{filename_or_file\optional{, parser}}
+  Return a \class{Document} from the given input. \var{filename_or_file}
+  may be either a file name, or a file-like object. \var{parser}, if
+  given, must be a SAX2 parser object. This function will change the
+  document handler of the parser and activate namespace support; other
+  parser configuration (like setting an entity resolver) must have been
+  done in advance.
+\end{funcdesc}
+
+If you have XML in a string, you can use the
+\function{parseString()} function instead:
+
+\begin{funcdesc}{parseString}{string\optional{, parser}}
+  Return a \class{Document} that represents the \var{string}. This
+  method creates a \class{StringIO} object for the string and passes
+  that on to \function{parse}.
+\end{funcdesc}
+
+Both functions return a \class{Document} object representing the
+content of the document.
+
+What the \function{parse()} and \function{parseString()} functions do
+is connect an XML parser with a ``DOM builder'' that can accept parse
+events from any SAX parser and convert them into a DOM tree. The names
+of these functions are perhaps misleading, but they are easy to grasp when
+learning the interfaces. The parsing of the document will be
+completed before these functions return; it's simply that these
+functions do not provide a parser implementation themselves.
+
+You can also create a \class{Document} by calling a method on a ``DOM
+Implementation'' object. You can get this object either by calling
+the \function{getDOMImplementation()} function in the
+\refmodule{xml.dom} package or the \module{xml.dom.minidom} module.
+Using the implementation from the \module{xml.dom.minidom} module will
+always return a \class{Document} instance from the minidom
+implementation, while the version from \refmodule{xml.dom} may provide
+an alternate implementation (this is likely if you have the
+\ulink{PyXML package}{http://pyxml.sourceforge.net/} installed). Once
+you have a \class{Document}, you can add child nodes to it to populate
+the DOM:
+
+\begin{verbatim}
+from xml.dom.minidom import getDOMImplementation
+
+impl = getDOMImplementation()
+
+newdoc = impl.createDocument(None, "some_tag", None)
+top_element = newdoc.documentElement
+text = newdoc.createTextNode('Some textual content.')
+top_element.appendChild(text)
+\end{verbatim}
+
+Once you have a DOM document object, you can access the parts of your
+XML document through its properties and methods. These properties are
+defined in the DOM specification. The main property of the document
+object is the \member{documentElement} property. It gives you the
+main element in the XML document: the one that holds all others. For
+example:
+
+\begin{verbatim}
+dom3 = parseString("<myxml>Some data</myxml>")
+assert dom3.documentElement.tagName == "myxml"
+\end{verbatim}
+
+When you are finished with a DOM, you should clean it up. This is
+necessary because some versions of Python do not support garbage
+collection of objects that refer to each other in a cycle. Until this
+restriction is removed from all versions of Python, it is safest to
+write your code as if cycles would not be cleaned up.
+
+The way to clean up a DOM is to call its \method{unlink()} method:
+
+\begin{verbatim}
+dom1.unlink()
+dom2.unlink()
+dom3.unlink()
+\end{verbatim}
+
+\method{unlink()} is a \module{xml.dom.minidom}-specific extension to
+the DOM API. After calling \method{unlink()} on a node, the node and
+its descendants are essentially useless.
+
+\begin{seealso}
+  \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{Document Object
+            Model (DOM) Level 1 Specification}
+           {The W3C recommendation for the
+            DOM supported by \module{xml.dom.minidom}.}
+\end{seealso}
+
+
+\subsection{DOM Objects \label{dom-objects}}
+
+The definition of the DOM API for Python is given as part of the
+\refmodule{xml.dom} module documentation. This section lists the
+differences between the API and \refmodule{xml.dom.minidom}.
+
+
+\begin{methoddesc}[Node]{unlink}{}
+Break internal references within the DOM so that it will be garbage
+collected on versions of Python without cyclic GC. Even when cyclic
+GC is available, using this can make large amounts of memory available
+sooner, so calling this on DOM objects as soon as they are no longer
+needed is good practice. This only needs to be called on the
+\class{Document} object, but may be called on child nodes to discard
+children of that node.
+\end{methoddesc}
+
+\begin{methoddesc}[Node]{writexml}{writer\optional{,indent=""\optional{,addindent=""\optional{,newl=""}}}}
+Write XML to the writer object. The writer should have a
+\method{write()} method which matches that of the file object
+interface. The \var{indent} parameter is the indentation of the current
+node. The \var{addindent} parameter is the incremental indentation to use
+for subnodes of the current one. The \var{newl} parameter specifies the
+string to use to terminate lines.
+
+\versionchanged[The optional keyword parameters
+\var{indent}, \var{addindent}, and \var{newl} were added to support pretty
+output]{2.1}
+
+\versionchanged[For the \class{Document} node, an additional keyword
+argument \var{encoding} can be used to specify the encoding field of the XML
+header]{2.3}
+\end{methoddesc}
+
+\begin{methoddesc}[Node]{toxml}{\optional{encoding}}
+Return the XML that the DOM represents as a string.
+
+With no argument, the XML header does not specify an encoding, and the
+result is a Unicode string if the default encoding cannot represent all
+characters in the document. Encoding this string in an encoding other
+than UTF-8 is likely incorrect, since UTF-8 is the default encoding of
+XML documents that lack an encoding declaration.
+
+With an explicit \var{encoding} argument, the result is a byte string
+in the specified encoding. It is recommended that this argument always
+be specified. To avoid \exception{UnicodeError} exceptions in case of
+unrepresentable text data, the encoding argument should be specified
+as \code{"utf-8"}.
+
+\versionchanged[the \var{encoding} argument was introduced]{2.3}
+\end{methoddesc}
+
+\begin{methoddesc}[Node]{toprettyxml}{\optional{indent\optional{, newl\optional{, encoding}}}}
+Return a pretty-printed version of the document. \var{indent} specifies
+the indentation string and defaults to a tab character; \var{newl} specifies
+the string emitted at the end of each line and defaults to \code{\e n}.
+
+\versionadded{2.1}
+\versionchanged[the \var{encoding} argument was introduced; see \method{toxml()}]{2.3}
+\end{methoddesc}
+
+The following standard DOM methods have special considerations with
+\refmodule{xml.dom.minidom}:
+
+\begin{methoddesc}[Node]{cloneNode}{deep}
+Although this method was present in the version of
+\refmodule{xml.dom.minidom} packaged with Python 2.0, it was seriously
+broken. This has been corrected for subsequent releases.
+\end{methoddesc}
+
+
+\subsection{DOM Example \label{dom-example}}
+
+The following program is a fairly realistic example of a simple
+DOM application. In this particular case, we do not take much advantage
+of the flexibility of the DOM.
+
+\verbatiminput{minidom-example.py}
+
+
+\subsection{minidom and the DOM standard \label{minidom-and-dom}}
+
+The \refmodule{xml.dom.minidom} module is essentially a DOM
+1.0-compatible DOM with some DOM 2 features (primarily namespace
+features).
+
+Usage of the DOM interface in Python is straightforward. The
+following mapping rules apply:
+
+\begin{itemize}
+\item Interfaces are accessed through instance objects. Applications
+      should not instantiate the classes themselves; they should use
+      the creator functions available on the \class{Document} object.
+      Derived interfaces support all operations (and attributes) from
+      the base interfaces, plus any new operations.
+
+\item Operations are used as methods. Since the DOM uses only
+      \keyword{in} parameters, the arguments are passed in normal
+      order (from left to right). There are no optional
+      arguments. \keyword{void} operations return \code{None}.
+
+\item IDL attributes map to instance attributes. For compatibility
+      with the OMG IDL language mapping for Python, an attribute
+      \code{foo} can also be accessed through accessor methods
+      \method{_get_foo()} and \method{_set_foo()}. \keyword{readonly}
+      attributes must not be changed; this is not enforced at
+      runtime.
+
+\item The types \code{short int}, \code{unsigned int}, \code{unsigned
+      long long}, and \code{boolean} all map to Python integer
+      objects.
+
+\item The type \code{DOMString} maps to Python strings.
+      \refmodule{xml.dom.minidom} supports either byte or Unicode
+      strings, but will normally produce Unicode strings. Values
+      of type \code{DOMString} may also be \code{None} where the DOM
+      specification from the W3C allows the IDL \code{null}
+      value.
+
+\item \keyword{const} declarations map to variables in their
+      respective scope
+      (e.g.\ \code{xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE});
+      they must not be changed.
+
+\item \code{DOMException} is currently not supported in
+      \refmodule{xml.dom.minidom}. Instead,
+      \refmodule{xml.dom.minidom} uses standard Python exceptions such
+      as \exception{TypeError} and \exception{AttributeError}.
+
+\item \class{NodeList} objects are implemented using Python's built-in
+      list type. Starting with Python 2.2, these objects provide the
+      interface defined in the DOM specification, but with earlier
+      versions of Python they do not support the official API. They
+      are, however, much more ``Pythonic'' than the interface defined
+      in the W3C recommendations.
+\end{itemize}
+
+
+The following interfaces have no implementation in
+\refmodule{xml.dom.minidom}:
+
+\begin{itemize}
+\item \class{DOMTimeStamp}
+
+\item \class{DocumentType} (added in Python 2.1)
+
+\item \class{DOMImplementation} (added in Python 2.1)
+
+\item \class{CharacterData}
+
+\item \class{CDATASection}
+
+\item \class{Notation}
+
+\item \class{Entity}
+
+\item \class{EntityReference}
+
+\item \class{DocumentFragment}
+\end{itemize}
+
+Most of these reflect information in the XML document that is not of
+general utility to most DOM users.
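The LaTeX added by this commit documents minidom mostly in prose and short fragments. As a hedged illustration, not part of the commit, the following sketch exercises several of the behaviors it describes in one place. It creates a document through getDOMImplementation() and serializes it with toxml() using an explicit encoding and with toprettyxml(). It then parses the result back with parseString(), treats a NodeList as an ordinary Python list, compares nodeType against the Node constants, and calls unlink() once a DOM is no longer needed. It assumes Python 3, where toxml(encoding=...) returns bytes and toxml() without an encoding returns str, so the Python 2 byte/Unicode discussion above does not carry over unchanged. The helpers build_inventory() and summarize() and the inventory/item markup are hypothetical.

```python
# A sketch assuming Python 3's xml.dom.minidom. The helpers build_inventory()
# and summarize() and the <inventory>/<item> markup are hypothetical.
from xml.dom.minidom import Node, getDOMImplementation, parseString


def build_inventory():
    """Build <inventory><item id="1">widget</item></inventory> through a
    DOM implementation object rather than instantiating classes directly."""
    impl = getDOMImplementation()
    doc = impl.createDocument(None, "inventory", None)
    item = doc.createElement("item")
    item.setAttribute("id", "1")
    item.appendChild(doc.createTextNode("widget"))
    doc.documentElement.appendChild(item)
    return doc


def summarize(xml_data):
    """Parse XML and return (root tag, number of child elements,
    text of the first child element)."""
    dom = parseString(xml_data)
    try:
        root = dom.documentElement
        # childNodes is a NodeList, so ordinary list operations work on it.
        elements = [n for n in root.childNodes
                    if n.nodeType == Node.ELEMENT_NODE]
        text = ""
        if elements:
            text = "".join(n.data for n in elements[0].childNodes
                           if n.nodeType == Node.TEXT_NODE)
        return root.tagName, len(elements), text
    finally:
        # minidom-specific cleanup: break internal reference cycles now.
        dom.unlink()


doc = build_inventory()
data = doc.toxml(encoding="utf-8")      # bytes; the header names the encoding
pretty = doc.toprettyxml(indent="  ")   # str; the header carries no encoding
doc.unlink()

print(data)
print(pretty)
print(summarize(data))
```

Calling unlink() in a finally block follows the advice above to clean up a DOM as soon as it is no longer needed. Where cyclic garbage collection is available, this releases memory sooner rather than being strictly required.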