author Taru Karttunen <taruti@taruti.net> 2011-03-30 16:49:47 +0300
committer Taru Karttunen <taruti@taruti.net> 2011-03-30 16:49:47 +0300
commit b41b9034225ab3e49980d9de55c141011b6383b0
tree 891014b4c2e803e01ac7a1fd2b60819fbc5a6e73 /sys/man/1/doc2txt
parent c558a99e0be506a9abdf677f0ca4490644e05fc1
Import sources from 2011-03-30 iso image - sys/man
Diffstat (limited to 'sys/man/1/doc2txt')
-rwxr-xr-x sys/man/1/doc2txt 161
1 files changed, 161 insertions, 0 deletions
diff --git a/sys/man/1/doc2txt b/sys/man/1/doc2txt
new file mode 100755
index 000000000..9b85e9f8f
--- /dev/null
+++ b/sys/man/1/doc2txt
@@ -0,0 +1,161 @@
+.TH DOC2TXT 1
+.SH NAME
+doc2txt, doc2ps, wdoc2txt, xls2txt, olefs, mswordstrings, msexceltables
+\- extract printable text from Microsoft documents
+.SH SYNOPSIS
+.B doc2txt
+[
+.I file.doc
+]
+.br
+.B doc2ps
+[
+.I file.doc
+]
+.br
+.B wdoc2txt
+[
+.I file.doc
+]
+.br
+.B xls2txt
+[
+.I file.xls
+]
+.br
+.B aux/olefs
+[
+.B -m
+.I mtpt
+]
+.I file.doc
+.br
+.B aux/mswordstrings
+.IB mtpt /WordDocument
+.br
+.B aux/msexceltables
+[
+.B -qaDnt
+] [
+.B -d
+.I delim
+] [
+.B -c
+.I column-range
+] [
+.B -w
+.I worksheet-range
+]
+.IB mtpt /Workbook
+.SH DESCRIPTION
+.I Doc2txt
+is an
+.IR rc (1)
+script that uses
+.I olefs
+and
+.I mswordstrings
+to extract the printable text from the body of a Microsoft Word document
+and write it on the standard output.
+.I Doc2ps
+is similar, but emits PostScript corresponding to the document.
+.I Wdoc2txt
+is similar to
+.IR doc2txt ,
+but uses
+.IR plumb (1)
+to send the output to a new
+.IR acme (1)
+window instead.
+.I Xls2txt
+performs a similar function for Microsoft Excel documents.
+.PP
+Microsoft Office documents are stored in OLE (Object Linking and Embedding)
+format, which is a scaled-down version of Microsoft's FAT file system.
+.I Olefs
+presents the contents of an MS Office document as a file system
+on
+.IR mtpt ,
+which defaults to
+.BR /mnt/doc .
+.I Mswordstrings
+or
+.I msexceltables
+may then be used to parse the files inside, extracting
+a text stream.
+.I Msexceltables
+may be given options to control the formatting of its output.
+.TF "\fL-d \fIdelim"
+.TP
+.B -a
+Attempt conversion of non-tabular sheets in the workbook (charts).
+.TP
+.BI -d " delim
+Sets the inter-field delimiter to the string
+.IR delim ,
+by default a single space.
+.TP
+.B -D
+Enables debugging output.
+.TP
+.BI -c " range
+.I Range
+is a comma-separated list of column numbers and ranges.
+Ranges are separated by dashes.
+Limit processing to just those columns named;
+by default all columns are output.
+.TP
+.B -n
+Disables field padding to column width.
+.TP
+.B -q
+Disables quoting of textual fields (see
+.IR quote (2)).
+.TP
+.B -t
+Truncate fields to the column width.
+.TP
+.BI -w " range
+.I Range
+is a comma-separated list of worksheet numbers and ranges;
+this limits the sheets output, using the same syntax as the
+.B -c
+option above.
+Suppressed chart pages are always included in the sheet count.
+.SH EXAMPLE
+Extract pieces of an MS Excel spreadsheet.
+.PD 0
+.IP
+.EX
+.SM
+aux/olefs report.xls
+aux/msexceltables -q -w 1,7,9-14 -c 3-5 -n -d '@' /mnt/doc/Workbook > rpt.txt
+unmount /mnt/doc
+.EE
+.PD
+.SH SOURCE
+.TF "\fL/sys/src/cmd/aux "
+.TP
+.B /rc/bin
+.BR doc2txt ,
+.BR doc2ps ,
+.BR wdoc2txt ,
+and
+.BR xls2txt
+.TP
+.B /sys/src/cmd/aux
+the others
+.fi
+.PD
+.SH SEE ALSO
+.IR strings (1)
+.br
+``Microsoft Word 97 Binary File Format'',
+at Microsoft's developer (MSDN) home page.
+.br
+``LAOLA Binary Structures'',
+.B http://user.cs.tu-berlin.de/~schwartz/pmh
+.br
+``OpenOffice.Org's Excel Documentation'',
+.br
+.B http://sc.openoffice.org/excelfileformat.pdf
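
Note on the range syntax used by -c and -w in the man page above: the following is a minimal Python sketch of how a string such as 1,7,9-14 could be expanded into individual column or worksheet numbers. It is not taken from the msexceltables source. The function name parse_ranges is hypothetical, and treating ranges as inclusive and ascending is an assumption that the man page does not state.

```python
def parse_ranges(spec):
    """Expand a string like '1,7,9-14' into a sorted list of integers.

    Hypothetical helper: assumes ranges are inclusive and ascending,
    which the man page does not state explicitly.
    """
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty item in range list: %r" % spec)
        if "-" in part:
            # A dash separates the two ends of a range, e.g. '9-14'.
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError("descending range: %r" % part)
            selected.update(range(low, high + 1))
        else:
            # A bare number selects a single column or worksheet.
            selected.add(int(part))
    return sorted(selected)


if __name__ == "__main__":
    # The -w and -c arguments from the EXAMPLE section.
    print(parse_ranges("1,7,9-14"))
    print(parse_ranges("3-5"))
```

With the arguments from the EXAMPLE section, this sketch would select worksheets 1, 7, and 9 through 14, and columns 3 through 5.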