author: cinap_lenrek <cinap_lenrek@centraldogma> 2011-09-24 17:06:45 +0200
committer: cinap_lenrek <cinap_lenrek@centraldogma> 2011-09-24 17:06:45 +0200
commit: 13304b7b967c6172cfaa6b31dd4f92348056ed1a
tree: 4c0e56aa2313735a847f529366dee45ee6110a5d /sys/man/1/uhtml
parent: 6d6880cec936a13e67e43357538394a5c7f09010
html2ms, tcs, mothra, uhtml: treat &apos; as special entity, add uhtml(1)
Diffstat (limited to 'sys/man/1/uhtml')
-rw-r--r-- sys/man/1/uhtml 46
1 files changed, 46 insertions, 0 deletions
diff --git a/sys/man/1/uhtml b/sys/man/1/uhtml
new file mode 100644
index 000000000..5e91a5608
--- /dev/null
+++ b/sys/man/1/uhtml
@@ -0,0 +1,46 @@
+.TH UHTML 1
+.SH NAME
+uhtml \- convert foreign character set HTML file to unicode
+.SH SYNOPSIS
+.B uhtml
+[
+.B -p
+] [
+.B -c
+.I charset
+] [
+.I file
+]
+.SH DESCRIPTION
+HTML comes in various character set encodings
+and has special entity forms to encode characters.
+To make HTML easier to process, uhtml
+normalizes it to a Unicode-only form.
+.LP
+Uhtml detects the character set of the html input
+.I file
+and calls
+.IR tcs (1)
+to convert it to UTF, replacing HTML entity forms
+with their Unicode character representations, except for
+.BR lt ,
+.BR gt ,
+.BR amp ,
+.BR quot ,
+and
+.BR apos .
+The converted html is written to
+standard output. If no
+.I file
+is given, input is read from standard input. If the
+.B -p
+option is given, the detected character set is printed and
+the program exits without conversion.
+If character set detection fails, the default (utf)
+is assumed. This default can be changed with the
+.B -c
+option.
+.SH SOURCE
+.B /sys/src/cmd/uhtml.c
+.SH SEE ALSO
+.IR tcs (1)
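
The entity handling described in the man page above can be illustrated with a short sketch. This is not the code in /sys/src/cmd/uhtml.c; it is a minimal Python illustration, assuming the input has already been converted to Unicode text. The function name expand_entities is hypothetical, and leaving numeric references to the five special characters untouched is an assumption that the man page does not state.

```python
import re
from html.entities import name2codepoint

# The five entity names the man page says are left untouched.
KEEP = {"lt", "gt", "amp", "quot", "apos"}
# Assumption: numeric references to the same characters are also kept,
# so that expanding them cannot break the markup.
KEEP_CHARS = set("<>&\"'")

# Matches decimal (&#233;), hex (&#xE9;) and named (&eacute;) forms.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def expand_entities(text):
    """Replace entity forms with their characters, except the kept ones."""
    def repl(m):
        body = m.group(1)
        if body.startswith("#"):
            digits = body[1:]
            code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
            # Leave invalid code points and surrogates as written.
            if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                return m.group(0)
            ch = chr(code)
            return m.group(0) if ch in KEEP_CHARS else ch
        # Kept names and unknown names pass through unchanged.
        if body in KEEP or body not in name2codepoint:
            return m.group(0)
        return chr(name2codepoint[body])
    return ENTITY.sub(repl, text)
```

For example, expand_entities("caf&eacute; &lt;b&gt;") yields "café &lt;b&gt;": the accented letter is expanded, while the markup-significant entities stay escaped.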