author | cinap_lenrek <cinap_lenrek@centraldogma> | 2011-09-24 17:06:45 +0200 |
---|---|---|
committer | cinap_lenrek <cinap_lenrek@centraldogma> | 2011-09-24 17:06:45 +0200 |
commit | 13304b7b967c6172cfaa6b31dd4f92348056ed1a (patch) | |
tree | 4c0e56aa2313735a847f529366dee45ee6110a5d /sys/man/1/uhtml | |
parent | 6d6880cec936a13e67e43357538394a5c7f09010 (diff) |
html2ms, tcs, mothra, uhtml: treat ' as special entity, add uhtml(1)
Diffstat (limited to 'sys/man/1/uhtml')
-rw-r--r-- | sys/man/1/uhtml | 46 |
1 files changed, 46 insertions, 0 deletions
diff --git a/sys/man/1/uhtml b/sys/man/1/uhtml
new file mode 100644
index 000000000..5e91a5608
--- /dev/null
+++ b/sys/man/1/uhtml
@@ -0,0 +1,46 @@
+.TH UHTML 1
+.SH NAME
+uhtml \- convert foreign character set HTML file to unicode
+.SH SYNOPSIS
+.B uhtml
+[
+.B -p
+] [
+.B -c
+.I charset
+] [
+.I file
+]
+.SH DESCRIPTION
+HTML comes in various character set encodings
+and has special forms to encode characters. To
+make it easier to process html, uthml is used
+to normalize it to a unicode only form.
+.LP
+Uhtml detects the character set of the html input
+.I file
+and calls
+.IR tcs (1)
+to convert it to utf replacing html-entity forms
+by ther unicode character representations except for
+.B lt
+.B gt
+.B amp
+.B quot
+and
+.B apos .
+The converted html is written to
+standard output. If no
+.I file
+was given, it is read from standard input. If the
+.B -p
+option is given, the detected character set is printed and
+the program exits without conversion.
+In case character set detection fails, the default (utf)
+is assumed. This default can be changed with the
+.B -c
+option.
+.SH SOURCE
+.B /sys/src/cmd/uhtml.c
+.SH SEE ALSO
+.IR tcs (1)
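
The entity rule described in the new man page (replace html-entity forms by their unicode characters, except for lt, gt, amp, quot and apos) can be illustrated with a minimal Python sketch. This is not the code in /sys/src/cmd/uhtml.c. The function name `expand_entities` and the `KEEP` set are hypothetical, the entity table is Python's HTML 4 `name2codepoint`, and numeric references such as `&#233;` are deliberately left untouched because the man page does not say how they are handled.

```python
import re
from html.entities import name2codepoint

# Entities the man page lists as exceptions: they stay escaped so the
# markup remains well-formed after conversion.
KEEP = {"lt", "gt", "amp", "quot", "apos"}

# Named entity references only, e.g. "&eacute;". Numeric forms are out of
# scope for this sketch (assumption, see the paragraph above).
_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text: str) -> str:
    """Replace named entities by their unicode characters, except KEEP."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        # Leave protected and unknown names exactly as written.
        if name in KEEP or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return _NAMED.sub(repl, text)
```

Keeping the five markup-significant entities escaped means a later HTML parser sees the same element structure before and after the conversion; only the text content becomes plain unicode.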