author | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
---|---|---|
committer | cinap_lenrek <cinap_lenrek@localhost> | 2011-05-03 11:25:13 +0000 |
commit | 458120dd40db6b4df55a4e96b650e16798ef06a0 | |
tree | 8f82685be24fef97e715c6f5ca4c68d34d5074ee /sys/src/cmd/python/Doc/lib/libdifflib.tex | |
parent | 3a742c699f6806c1145aea5149bf15de15a0afd7 |
add hg and python
Diffstat (limited to 'sys/src/cmd/python/Doc/lib/libdifflib.tex')
-rw-r--r-- | sys/src/cmd/python/Doc/lib/libdifflib.tex | 704 |
1 files changed, 704 insertions, 0 deletions
diff --git a/sys/src/cmd/python/Doc/lib/libdifflib.tex b/sys/src/cmd/python/Doc/lib/libdifflib.tex new file mode 100644 index 000000000..acb5ed1c3 --- /dev/null +++ b/sys/src/cmd/python/Doc/lib/libdifflib.tex @@ -0,0 +1,704 @@ +\section{\module{difflib} --- + Helpers for computing deltas} + +\declaremodule{standard}{difflib} +\modulesynopsis{Helpers for computing differences between objects.} +\moduleauthor{Tim Peters}{tim_one@users.sourceforge.net} +\sectionauthor{Tim Peters}{tim_one@users.sourceforge.net} +% LaTeXification by Fred L. Drake, Jr. <fdrake@acm.org>. + +\versionadded{2.1} + + +\begin{classdesc*}{SequenceMatcher} + This is a flexible class for comparing pairs of sequences of any + type, so long as the sequence elements are hashable. The basic + algorithm predates, and is a little fancier than, an algorithm + published in the late 1980's by Ratcliff and Obershelp under the + hyperbolic name ``gestalt pattern matching.'' The idea is to find + the longest contiguous matching subsequence that contains no + ``junk'' elements (the Ratcliff and Obershelp algorithm doesn't + address junk). The same idea is then applied recursively to the + pieces of the sequences to the left and to the right of the matching + subsequence. This does not yield minimal edit sequences, but does + tend to yield matches that ``look right'' to people. + + \strong{Timing:} The basic Ratcliff-Obershelp algorithm is cubic + time in the worst case and quadratic time in the expected case. + \class{SequenceMatcher} is quadratic time for the worst case and has + expected-case behavior dependent in a complicated way on how many + elements the sequences have in common; best case time is linear. +\end{classdesc*} + +\begin{classdesc*}{Differ} + This is a class for comparing sequences of lines of text, and + producing human-readable differences or deltas. 
Differ uses + \class{SequenceMatcher} both to compare sequences of lines, and to + compare sequences of characters within similar (near-matching) + lines. + + Each line of a \class{Differ} delta begins with a two-letter code: + +\begin{tableii}{l|l}{code}{Code}{Meaning} + \lineii{'- '}{line unique to sequence 1} + \lineii{'+ '}{line unique to sequence 2} + \lineii{' '}{line common to both sequences} + \lineii{'? '}{line not present in either input sequence} +\end{tableii} + + Lines beginning with `\code{?~}' attempt to guide the eye to + intraline differences, and were not present in either input + sequence. These lines can be confusing if the sequences contain tab + characters. +\end{classdesc*} + +\begin{classdesc*}{HtmlDiff} + + This class can be used to create an HTML table (or a complete HTML file + containing the table) showing a side by side, line by line comparison + of text with inter-line and intra-line change highlights. The table can + be generated in either full or contextual difference mode. + + The constructor for this class is: + + \begin{funcdesc}{__init__}{\optional{tabsize}\optional{, + wrapcolumn}\optional{, linejunk}\optional{, charjunk}} + + Initializes instance of \class{HtmlDiff}. + + \var{tabsize} is an optional keyword argument to specify tab stop spacing + and defaults to \code{8}. + + \var{wrapcolumn} is an optional keyword to specify column number where + lines are broken and wrapped, defaults to \code{None} where lines are not + wrapped. + + \var{linejunk} and \var{charjunk} are optional keyword arguments passed + into \code{ndiff()} (used by \class{HtmlDiff} to generate the + side by side HTML differences). See \code{ndiff()} documentation for + argument default values and descriptions. 
+ + \end{funcdesc} + + The following methods are public: + + \begin{funcdesc}{make_file}{fromlines, tolines + \optional{, fromdesc}\optional{, todesc}\optional{, context}\optional{, + numlines}} + Compares \var{fromlines} and \var{tolines} (lists of strings) and returns + a string which is a complete HTML file containing a table showing line by + line differences with inter-line and intra-line changes highlighted. + + \var{fromdesc} and \var{todesc} are optional keyword arguments to specify + from/to file column header strings (both default to an empty string). + + \var{context} and \var{numlines} are both optional keyword arguments. + Set \var{context} to \code{True} when contextual differences are to be + shown, else the default is \code{False} to show the full files. + \var{numlines} defaults to \code{5}. When \var{context} is \code{True} + \var{numlines} controls the number of context lines which surround the + difference highlights. When \var{context} is \code{False} \var{numlines} + controls the number of lines which are shown before a difference + highlight when using the "next" hyperlinks (setting to zero would cause + the "next" hyperlinks to place the next difference highlight at the top of + the browser without any leading context). + \end{funcdesc} + + \begin{funcdesc}{make_table}{fromlines, tolines + \optional{, fromdesc}\optional{, todesc}\optional{, context}\optional{, + numlines}} + Compares \var{fromlines} and \var{tolines} (lists of strings) and returns + a string which is a complete HTML table showing line by line differences + with inter-line and intra-line changes highlighted. + + The arguments for this method are the same as those for the + \method{make_file()} method. + \end{funcdesc} + + \file{Tools/scripts/diff.py} is a command-line front-end to this class + and contains a good example of its use. 
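The two public methods can be compared side by side with a minimal sketch. It assumes a Python 3 interpreter; the sample line lists and the `before`/`after` column headers are made up for illustration, and the constructor values are arbitrary.

```python
import difflib

# Hypothetical sample inputs: two lists of newline-terminated lines.
fromlines = ["one\n", "two\n", "three\n"]
tolines = ["ore\n", "tree\n", "emu\n"]

# tabsize and wrapcolumn are the optional constructor keywords described above.
html_diff = difflib.HtmlDiff(tabsize=4, wrapcolumn=60)

# make_table() returns only the table markup, suitable for embedding in a page.
table = html_diff.make_table(fromlines, tolines,
                             fromdesc="before", todesc="after")

# make_file() returns a complete HTML document; context=True limits the
# output to the changed regions plus numlines lines of surrounding context.
page = html_diff.make_file(fromlines, tolines,
                           fromdesc="before", todesc="after",
                           context=True, numlines=2)

print(len(table), len(page))
```

Writing `page` to a file and opening it in a browser is probably the quickest way to see the highlighting.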
+ + \versionadded{2.4} +\end{classdesc*} + +\begin{funcdesc}{context_diff}{a, b\optional{, fromfile}\optional{, + tofile}\optional{, fromfiledate}\optional{, tofiledate}\optional{, + n}\optional{, lineterm}} + Compare \var{a} and \var{b} (lists of strings); return a + delta (a generator generating the delta lines) in context diff + format. + + Context diffs are a compact way of showing just the lines that have + changed plus a few lines of context. The changes are shown in a + before/after style. The number of context lines is set by \var{n} + which defaults to three. + + By default, the diff control lines (those with \code{***} or \code{---}) + are created with a trailing newline. This is helpful so that inputs created + from \function{file.readlines()} result in diffs that are suitable for use + with \function{file.writelines()} since both the inputs and outputs have + trailing newlines. + + For inputs that do not have trailing newlines, set the \var{lineterm} + argument to \code{""} so that the output will be uniformly newline free. + + The context diff format normally has a header for filenames and + modification times. Any or all of these may be specified using strings for + \var{fromfile}, \var{tofile}, \var{fromfiledate}, and \var{tofiledate}. + The modification times are normally expressed in the format returned by + \function{time.ctime()}. If not specified, the strings default to blanks. + + \file{Tools/scripts/diff.py} is a command-line front-end for this + function. + + \versionadded{2.3} +\end{funcdesc} + +\begin{funcdesc}{get_close_matches}{word, possibilities\optional{, + n}\optional{, cutoff}} + Return a list of the best ``good enough'' matches. \var{word} is a + sequence for which close matches are desired (typically a string), + and \var{possibilities} is a list of sequences against which to + match \var{word} (typically a list of strings). 
+ + Optional argument \var{n} (default \code{3}) is the maximum number + of close matches to return; \var{n} must be greater than \code{0}. + + Optional argument \var{cutoff} (default \code{0.6}) is a float in + the range [0, 1]. Possibilities that don't score at least that + similar to \var{word} are ignored. + + The best (no more than \var{n}) matches among the possibilities are + returned in a list, sorted by similarity score, most similar first. + +\begin{verbatim} +>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy']) +['apple', 'ape'] +>>> import keyword +>>> get_close_matches('wheel', keyword.kwlist) +['while'] +>>> get_close_matches('apple', keyword.kwlist) +[] +>>> get_close_matches('accept', keyword.kwlist) +['except'] +\end{verbatim} +\end{funcdesc} + +\begin{funcdesc}{ndiff}{a, b\optional{, linejunk}\optional{, charjunk}} + Compare \var{a} and \var{b} (lists of strings); return a + \class{Differ}-style delta (a generator generating the delta lines). + + Optional keyword parameters \var{linejunk} and \var{charjunk} are + for filter functions (or \code{None}): + + \var{linejunk}: A function that accepts a single string + argument, and returns true if the string is junk, or false if not. + The default is \code{None}, starting with Python 2.3. Before then, + the default was the module-level function + \function{IS_LINE_JUNK()}, which filters out lines without visible + characters, except for at most one pound character (\character{\#}). + As of Python 2.3, the underlying \class{SequenceMatcher} class + does a dynamic analysis of which lines are so frequent as to + constitute noise, and this usually works better than the pre-2.3 + default. + + \var{charjunk}: A function that accepts a character (a string of + length 1), and returns true if the character is junk, or false if not.
+ The default is the module-level function \function{IS_CHARACTER_JUNK()}, + which filters out whitespace characters (a blank or tab; note: bad + idea to include newline in this!). + + \file{Tools/scripts/ndiff.py} is a command-line front-end to this + function. + +\begin{verbatim} +>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1), +... 'ore\ntree\nemu\n'.splitlines(1)) +>>> print ''.join(diff), +- one +? ^ ++ ore +? ^ +- two +- three +? - ++ tree ++ emu +\end{verbatim} +\end{funcdesc} + +\begin{funcdesc}{restore}{sequence, which} + Return one of the two sequences that generated a delta. + + Given a \var{sequence} produced by \method{Differ.compare()} or + \function{ndiff()}, extract lines originating from file 1 or 2 + (parameter \var{which}), stripping off line prefixes. + + Example: + +\begin{verbatim} +>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1), +... 'ore\ntree\nemu\n'.splitlines(1)) +>>> diff = list(diff) # materialize the generated delta into a list +>>> print ''.join(restore(diff, 1)), +one +two +three +>>> print ''.join(restore(diff, 2)), +ore +tree +emu +\end{verbatim} + +\end{funcdesc} + +\begin{funcdesc}{unified_diff}{a, b\optional{, fromfile}\optional{, + tofile}\optional{, fromfiledate}\optional{, tofiledate}\optional{, + n}\optional{, lineterm}} + Compare \var{a} and \var{b} (lists of strings); return a + delta (a generator generating the delta lines) in unified diff + format. + + Unified diffs are a compact way of showing just the lines that have + changed plus a few lines of context. The changes are shown in an + inline style (instead of separate before/after blocks). The number + of context lines is set by \var{n} which defaults to three. + + By default, the diff control lines (those with \code{---}, \code{+++}, + or \code{@@}) are created with a trailing newline.
This is helpful so + that inputs created from \function{file.readlines()} result in diffs + that are suitable for use with \function{file.writelines()} since both + the inputs and outputs have trailing newlines. + + For inputs that do not have trailing newlines, set the \var{lineterm} + argument to \code{""} so that the output will be uniformly newline free. + + The unified diff format normally has a header for filenames and + modification times. Any or all of these may be specified using strings for + \var{fromfile}, \var{tofile}, \var{fromfiledate}, and \var{tofiledate}. + The modification times are normally expressed in the format returned by + \function{time.ctime()}. If not specified, the strings default to blanks. + + \file{Tools/scripts/diff.py} is a command-line front-end for this + function. + + \versionadded{2.3} +\end{funcdesc} + +\begin{funcdesc}{IS_LINE_JUNK}{line} + Return true for ignorable lines. The line \var{line} is ignorable + if \var{line} is blank or contains a single \character{\#}, + otherwise it is not ignorable. Used as a default for parameter + \var{linejunk} in \function{ndiff()} before Python 2.3. +\end{funcdesc} + + +\begin{funcdesc}{IS_CHARACTER_JUNK}{ch} + Return true for ignorable characters. The character \var{ch} is + ignorable if \var{ch} is a space or tab, otherwise it is not + ignorable. Used as a default for parameter \var{charjunk} in + \function{ndiff()}. +\end{funcdesc} + + +\begin{seealso} + \seetitle[http://www.ddj.com/documents/s=1103/ddj8807c/] + {Pattern Matching: The Gestalt Approach}{Discussion of a + similar algorithm by John W. Ratcliff and D. E. Metzener. + This was published in + \citetitle[http://www.ddj.com/]{Dr.
Dobb's Journal} in + July, 1988.} +\end{seealso} + + +\subsection{SequenceMatcher Objects \label{sequence-matcher}} + +The \class{SequenceMatcher} class has this constructor: + +\begin{classdesc}{SequenceMatcher}{\optional{isjunk\optional{, + a\optional{, b}}}} + Optional argument \var{isjunk} must be \code{None} (the default) or + a one-argument function that takes a sequence element and returns + true if and only if the element is ``junk'' and should be ignored. + Passing \code{None} for \var{isjunk} is equivalent to passing + \code{lambda x: 0}; in other words, no elements are ignored. For + example, pass: + +\begin{verbatim} +lambda x: x in " \t" +\end{verbatim} + + if you're comparing lines as sequences of characters, and don't want + to synch up on blanks or hard tabs. + + The optional arguments \var{a} and \var{b} are sequences to be + compared; both default to empty strings. The elements of both + sequences must be hashable. +\end{classdesc} + + +\class{SequenceMatcher} objects have the following methods: + +\begin{methoddesc}{set_seqs}{a, b} + Set the two sequences to be compared. +\end{methoddesc} + +\class{SequenceMatcher} computes and caches detailed information about +the second sequence, so if you want to compare one sequence against +many sequences, use \method{set_seq2()} to set the commonly used +sequence once and call \method{set_seq1()} repeatedly, once for each +of the other sequences. + +\begin{methoddesc}{set_seq1}{a} + Set the first sequence to be compared. The second sequence to be + compared is not changed. +\end{methoddesc} + +\begin{methoddesc}{set_seq2}{b} + Set the second sequence to be compared. The first sequence to be + compared is not changed. +\end{methoddesc} + +\begin{methoddesc}{find_longest_match}{alo, ahi, blo, bhi} + Find longest matching block in \code{\var{a}[\var{alo}:\var{ahi}]} + and \code{\var{b}[\var{blo}:\var{bhi}]}. 
+ + If \var{isjunk} was omitted or \code{None}, + \method{find_longest_match()} returns \code{(\var{i}, \var{j}, + \var{k})} such that \code{\var{a}[\var{i}:\var{i}+\var{k}]} is equal + to \code{\var{b}[\var{j}:\var{j}+\var{k}]}, where + \code{\var{alo} <= \var{i} <= \var{i}+\var{k} <= \var{ahi}} and + \code{\var{blo} <= \var{j} <= \var{j}+\var{k} <= \var{bhi}}. + For all \code{(\var{i'}, \var{j'}, \var{k'})} meeting those + conditions, the additional conditions + \code{\var{k} >= \var{k'}}, + \code{\var{i} <= \var{i'}}, + and if \code{\var{i} == \var{i'}}, \code{\var{j} <= \var{j'}} + are also met. + In other words, of all maximal matching blocks, return one that + starts earliest in \var{a}, and of all those maximal matching blocks + that start earliest in \var{a}, return the one that starts earliest + in \var{b}. + +\begin{verbatim} +>>> s = SequenceMatcher(None, " abcd", "abcd abcd") +>>> s.find_longest_match(0, 5, 0, 9) +(0, 4, 5) +\end{verbatim} + + If \var{isjunk} was provided, first the longest matching block is + determined as above, but with the additional restriction that no + junk element appears in the block. Then that block is extended as + far as possible by matching (only) junk elements on both sides. + So the resulting block never matches on junk except as identical + junk happens to be adjacent to an interesting match. + + Here's the same example as before, but considering blanks to be junk. + That prevents \code{' abcd'} from matching the \code{' abcd'} at the + tail end of the second sequence directly. Instead only the + \code{'abcd'} can match, and matches the leftmost \code{'abcd'} in + the second sequence: + +\begin{verbatim} +>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") +>>> s.find_longest_match(0, 5, 0, 9) +(1, 0, 4) +\end{verbatim} + + If no blocks match, this returns \code{(\var{alo}, \var{blo}, 0)}. +\end{methoddesc} + +\begin{methoddesc}{get_matching_blocks}{} + Return list of triples describing matching subsequences.
+ Each triple is of the form \code{(\var{i}, \var{j}, \var{n})}, and + means that \code{\var{a}[\var{i}:\var{i}+\var{n}] == + \var{b}[\var{j}:\var{j}+\var{n}]}. The triples are monotonically + increasing in \var{i} and \var{j}. + + The last triple is a dummy, and has the value \code{(len(\var{a}), + len(\var{b}), 0)}. It is the only triple with \code{\var{n} == 0}. + % Explain why a dummy is used! + + If + \code{(\var{i}, \var{j}, \var{n})} and + \code{(\var{i'}, \var{j'}, \var{n'})} are adjacent triples in the list, + and the second is not the last triple in the list, then + \code{\var{i}+\var{n} != \var{i'}} or + \code{\var{j}+\var{n} != \var{j'}}; in other words, adjacent triples + always describe non-adjacent equal blocks. + \versionchanged[The guarantee that adjacent triples always describe + non-adjacent blocks was implemented]{2.5} + +\begin{verbatim} +>>> s = SequenceMatcher(None, "abxcd", "abcd") +>>> s.get_matching_blocks() +[(0, 0, 2), (3, 2, 2), (5, 4, 0)] +\end{verbatim} +\end{methoddesc} + +\begin{methoddesc}{get_opcodes}{} + Return list of 5-tuples describing how to turn \var{a} into \var{b}. + Each tuple is of the form \code{(\var{tag}, \var{i1}, \var{i2}, + \var{j1}, \var{j2})}. The first tuple has \code{\var{i1} == + \var{j1} == 0}, and remaining tuples have \var{i1} equal to the + \var{i2} from the preceding tuple, and, likewise, \var{j1} equal to + the previous \var{j2}. + + The \var{tag} values are strings, with these meanings: + +\begin{tableii}{l|l}{code}{Value}{Meaning} + \lineii{'replace'}{\code{\var{a}[\var{i1}:\var{i2}]} should be + replaced by \code{\var{b}[\var{j1}:\var{j2}]}.} + \lineii{'delete'}{\code{\var{a}[\var{i1}:\var{i2}]} should be + deleted. Note that \code{\var{j1} == \var{j2}} in + this case.} + \lineii{'insert'}{\code{\var{b}[\var{j1}:\var{j2}]} should be + inserted at \code{\var{a}[\var{i1}:\var{i1}]}. 
+ Note that \code{\var{i1} == \var{i2}} in this + case.} + \lineii{'equal'}{\code{\var{a}[\var{i1}:\var{i2}] == + \var{b}[\var{j1}:\var{j2}]} (the sub-sequences are + equal).} +\end{tableii} + +For example: + +\begin{verbatim} +>>> a = "qabxcd" +>>> b = "abycdf" +>>> s = SequenceMatcher(None, a, b) +>>> for tag, i1, i2, j1, j2 in s.get_opcodes(): +... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" % +... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2])) + delete a[0:1] (q) b[0:0] () + equal a[1:3] (ab) b[0:2] (ab) +replace a[3:4] (x) b[2:3] (y) + equal a[4:6] (cd) b[3:5] (cd) + insert a[6:6] () b[5:6] (f) +\end{verbatim} +\end{methoddesc} + +\begin{methoddesc}{get_grouped_opcodes}{\optional{n}} + Return a generator of groups with up to \var{n} lines of context. + + Starting with the groups returned by \method{get_opcodes()}, + this method splits out smaller change clusters and eliminates + intervening ranges which have no changes. + + The groups are returned in the same format as \method{get_opcodes()}. + \versionadded{2.3} +\end{methoddesc} + +\begin{methoddesc}{ratio}{} + Return a measure of the sequences' similarity as a float in the + range [0, 1]. + + Where T is the total number of elements in both sequences, and M is + the number of matches, this is 2.0*M / T. Note that this is + \code{1.0} if the sequences are identical, and \code{0.0} if they + have nothing in common. + + This is expensive to compute if \method{get_matching_blocks()} or + \method{get_opcodes()} hasn't already been called, in which case you + may want to try \method{quick_ratio()} or + \method{real_quick_ratio()} first to get an upper bound. +\end{methoddesc} + +\begin{methoddesc}{quick_ratio}{} + Return an upper bound on \method{ratio()} relatively quickly. + + This isn't defined beyond that it is an upper bound on + \method{ratio()}, and is faster to compute. +\end{methoddesc} + +\begin{methoddesc}{real_quick_ratio}{} + Return an upper bound on \method{ratio()} very quickly. 
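Because both quick variants are upper bounds on \method{ratio()}, they can serve as cheap filters that rule a candidate out before the expensive call. The following is a minimal sketch under that assumption, written for Python 3; `best_match` is a hypothetical helper, not part of the module, and the `cutoff` default simply mirrors the one documented for `get_close_matches()`.

```python
from difflib import SequenceMatcher


def best_match(word, candidates, cutoff=0.6):
    """Return (score, candidate) for the most similar candidate, or None.

    Hypothetical helper for illustration only; not part of difflib.
    """
    matcher = SequenceMatcher()
    # The second sequence is the one SequenceMatcher caches details about,
    # so the sequence reused in every comparison goes there.
    matcher.set_seq2(word)
    best = None
    for candidate in candidates:
        matcher.set_seq1(candidate)
        # Each bound is >= ratio(), so a bound below the cutoff means
        # ratio() cannot reach the cutoff either.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff):
            score = matcher.ratio()
            if score >= cutoff and (best is None or score > best[0]):
                best = (score, candidate)
    return best


print(best_match("appel", ["ape", "apple", "peach", "puppy"]))
```

The cheapest test runs first, so most poor candidates never reach the costly \method{ratio()} call.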
+ + This isn't defined beyond that it is an upper bound on + \method{ratio()}, and is faster to compute than either + \method{ratio()} or \method{quick_ratio()}. +\end{methoddesc} + +The three methods that return the ratio of matching to total characters +can give different results due to differing levels of approximation, +although \method{quick_ratio()} and \method{real_quick_ratio()} are always +at least as large as \method{ratio()}: + +\begin{verbatim} +>>> s = SequenceMatcher(None, "abcd", "bcde") +>>> s.ratio() +0.75 +>>> s.quick_ratio() +0.75 +>>> s.real_quick_ratio() +1.0 +\end{verbatim} + + +\subsection{SequenceMatcher Examples \label{sequencematcher-examples}} + + +This example compares two strings, considering blanks to be ``junk:'' + +\begin{verbatim} +>>> s = SequenceMatcher(lambda x: x == " ", +... "private Thread currentThread;", +... "private volatile Thread currentThread;") +\end{verbatim} + +\method{ratio()} returns a float in [0, 1], measuring the similarity +of the sequences. As a rule of thumb, a \method{ratio()} value over +0.6 means the sequences are close matches: + +\begin{verbatim} +>>> print round(s.ratio(), 3) +0.866 +\end{verbatim} + +If you're only interested in where the sequences match, +\method{get_matching_blocks()} is handy: + +\begin{verbatim} +>>> for block in s.get_matching_blocks(): +... print "a[%d] and b[%d] match for %d elements" % block +a[0] and b[0] match for 8 elements +a[8] and b[17] match for 6 elements +a[14] and b[23] match for 15 elements +a[29] and b[38] match for 0 elements +\end{verbatim} + +Note that the last tuple returned by \method{get_matching_blocks()} is +always a dummy, \code{(len(\var{a}), len(\var{b}), 0)}, and this is +the only case in which the last tuple element (number of elements +matched) is \code{0}. + +If you want to know how to change the first sequence into the second, +use \method{get_opcodes()}: + +\begin{verbatim} +>>> for opcode in s.get_opcodes(): +... 
print "%6s a[%d:%d] b[%d:%d]" % opcode + equal a[0:8] b[0:8] +insert a[8:8] b[8:17] + equal a[8:14] b[17:23] + equal a[14:29] b[23:38] +\end{verbatim} + +See also the function \function{get_close_matches()} in this module, +which shows how simple code building on \class{SequenceMatcher} can be +used to do useful work. + + +\subsection{Differ Objects \label{differ-objects}} + +Note that \class{Differ}-generated deltas make no claim to be +\strong{minimal} diffs. To the contrary, minimal diffs are often +counter-intuitive, because they synch up anywhere possible, sometimes +accidental matches 100 pages apart. Restricting synch points to +contiguous matches preserves some notion of locality, at the +occasional cost of producing a longer diff. + +The \class{Differ} class has this constructor: + +\begin{classdesc}{Differ}{\optional{linejunk\optional{, charjunk}}} + Optional keyword parameters \var{linejunk} and \var{charjunk} are + for filter functions (or \code{None}): + + \var{linejunk}: A function that accepts a single string + argument, and returns true if the string is junk. The default is + \code{None}, meaning that no line is considered junk. + + \var{charjunk}: A function that accepts a single character argument + (a string of length 1), and returns true if the character is junk. + The default is \code{None}, meaning that no character is + considered junk. +\end{classdesc} + +\class{Differ} objects are used (deltas generated) via a single +method: + +\begin{methoddesc}{compare}{a, b} + Compare two sequences of lines, and generate the delta (a sequence + of lines). + + Each sequence must contain individual single-line strings ending + with newlines. Such sequences can be obtained from the + \method{readlines()} method of file-like objects. The delta generated + also consists of newline-terminated strings, ready to be printed as-is + via the \method{writelines()} method of a file-like object. 
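The round trip from \method{readlines()} to \method{writelines()} can be sketched as follows. This assumes Python 3, and the two `io.StringIO` objects are hypothetical stand-ins for real files opened with `open()`.

```python
import difflib
import io
import sys

# Hypothetical in-memory "files"; real code would typically use open().
old_file = io.StringIO("alpha\nbeta\ngamma\n")
new_file = io.StringIO("alpha\nbeta!\ngamma\n")

differ = difflib.Differ()

# readlines() yields newline-terminated strings, which is what compare() expects.
delta = list(differ.compare(old_file.readlines(), new_file.readlines()))

# Every delta line is newline-terminated too, so writelines() prints it as-is.
sys.stdout.writelines(delta)
```

Materializing the delta with `list()` is optional; it is done here only so the result can be inspected more than once.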
+\end{methoddesc} + + +\subsection{Differ Example \label{differ-examples}} + +This example compares two texts. First we set up the texts, sequences +of individual single-line strings ending with newlines (such sequences +can also be obtained from the \method{readlines()} method of file-like +objects): + +\begin{verbatim} +>>> text1 = ''' 1. Beautiful is better than ugly. +... 2. Explicit is better than implicit. +... 3. Simple is better than complex. +... 4. Complex is better than complicated. +... '''.splitlines(1) +>>> len(text1) +4 +>>> text1[0][-1] +'\n' +>>> text2 = ''' 1. Beautiful is better than ugly. +... 3. Simple is better than complex. +... 4. Complicated is better than complex. +... 5. Flat is better than nested. +... '''.splitlines(1) +\end{verbatim} + +Next we instantiate a Differ object: + +\begin{verbatim} +>>> d = Differ() +\end{verbatim} + +Note that when instantiating a \class{Differ} object we may pass +functions to filter out line and character ``junk.'' See the +\method{Differ()} constructor for details. + +Finally, we compare the two: + +\begin{verbatim} +>>> result = list(d.compare(text1, text2)) +\end{verbatim} + +\code{result} is a list of strings, so let's pretty-print it: + +\begin{verbatim} +>>> from pprint import pprint +>>> pprint(result) +[' 1. Beautiful is better than ugly.\n', + '- 2. Explicit is better than implicit.\n', + '- 3. Simple is better than complex.\n', + '+ 3. Simple is better than complex.\n', + '? ++ \n', + '- 4. Complex is better than complicated.\n', + '? ^ ---- ^ \n', + '+ 4. Complicated is better than complex.\n', + '? ++++ ^ ^ \n', + '+ 5. Flat is better than nested.\n'] +\end{verbatim} + +As a single multi-line string it looks like this: + +\begin{verbatim} +>>> import sys +>>> sys.stdout.writelines(result) + 1. Beautiful is better than ugly. +- 2. Explicit is better than implicit. +- 3. Simple is better than complex. ++ 3. Simple is better than complex. +? ++ +- 4. Complex is better than complicated. +? 
^ ---- ^ ++ 4. Complicated is better than complex. +? ++++ ^ ^ ++ 5. Flat is better than nested. +\end{verbatim}
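A delta like the one above still contains both inputs, and `restore()` recovers either of them. The sketch below assumes Python 3 and rebuilds the two texts itself; their exact whitespace is an assumption, since only the wording matters for the illustration.

```python
import difflib

# The two texts from the example; the spacing here is an assumption.
text1 = """1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
""".splitlines(True)

text2 = """1. Beautiful is better than ugly.
3.   Simple is better than complex.
4. Complicated is better than complex.
5. Flat is better than nested.
""".splitlines(True)

delta = list(difflib.Differ().compare(text1, text2))

# restore() strips the two-character prefixes and drops the '? ' hint lines.
recovered1 = list(difflib.restore(delta, 1))
recovered2 = list(difflib.restore(delta, 2))

print(recovered1 == text1, recovered2 == text2)
```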