author Taru Karttunen <taruti@taruti.net> 2011-03-30 16:49:47 +0300
committer Taru Karttunen <taruti@taruti.net> 2011-03-30 16:49:47 +0300
commit b41b9034225ab3e49980d9de55c141011b6383b0
tree 891014b4c2e803e01ac7a1fd2b60819fbc5a6e73 /sys/man/1/tcs
parent c558a99e0be506a9abdf677f0ca4490644e05fc1
Import sources from 2011-03-30 iso image - sys/man
Diffstat (limited to 'sys/man/1/tcs')
-rwxr-xr-x sys/man/1/tcs 172
1 files changed, 172 insertions, 0 deletions
diff --git a/sys/man/1/tcs b/sys/man/1/tcs
new file mode 100755
index 000000000..664073d36
--- /dev/null
+++ b/sys/man/1/tcs
@@ -0,0 +1,172 @@
+.TH TCS 1
+.SH NAME
+tcs \- translate character sets
+.SH SYNOPSIS
+.B tcs
+[
+.B -slcv
+]
+[
+.B -f
+.I ics
+]
+[
+.B -t
+.I ocs
+]
+[
+.I file ...
+]
+.SH DESCRIPTION
+.I Tcs
+interprets the named
+.I file(s)
+(standard input default) as a stream of characters from the
+.I ics
+character set or format, converts them to runes,
+and then converts them into a stream of characters from the
+.I ocs
+character set or format on the standard output.
+The default value for
+.I ics
+and
+.I ocs
+is
+.BR utf ,
+the
+.SM UTF
+encoding described in
+.IR utf (6).
+The
+.B -l
+option lists the character sets known to
+.IR tcs .
+Processing continues in the face of conversion errors (the
+.B -s
+option prevents reporting of these errors).
+The
+.B -c
+option forces the output to contain only correctly converted characters;
+otherwise,
+.B Runeerror
+(0xFFFD)
+characters will be substituted for
+.SM UTF
+encoding errors and unknown characters.
+.PP
+The
+.B -v
+option generates various diagnostic and summary information on standard error,
+or makes the
+.B -l
+output more verbose.
+.PP
+.I Tcs
+recognizes an ever-changing list of character sets.
+In particular, it supports a variety of Russian and Japanese encodings.
+Some of the supported encodings are
+.TF jis-kanji
+.TP
+.B utf
+The Plan 9
+.SM UTF
+encoding, known by ISO as UTF-8
+.TP
+.B utf1
+The deprecated original
+.SM UTF
+encoding from ISO 10646
+.TP
+.B ascii
+7-bit ASCII
+.TP
+.B 8859-1
+Latin-1 (Western European)
+.TP
+.B 8859-2
+Latin-2 (Czech .. Slovak)
+.TP
+.B 8859-3
+Latin-3 (Dutch .. Turkish)
+.TP
+.B 8859-4
+Latin-4 (Scandinavian)
+.TP
+.B 8859-5
+Part 5 (Cyrillic)
+.TP
+.B 8859-6
+Part 6 (Arabic)
+.TP
+.B 8859-7
+Part 7 (Greek)
+.TP
+.B 8859-8
+Part 8 (Hebrew)
+.TP
+.B 8859-9
+Latin-5 (Finnish .. Portuguese)
+.TP
+.B html
+Unicode as encoded by HTML
+.TP
+.B koi8
+KOI-8 (GOST 19768-74)
+.TP
+.B jis-kanji
+ISO 2022-JP
+.TP
+.B ujis
+EUC-JX: JIS 0208
+.TP
+.B ms-kanji
+Microsoft, or Shift-JIS
+.TP
+.B jis
+(from only) guesses between ISO 2022-JP, EUC or Shift-JIS
+.TP
+.B gb
+Chinese national standard (GB2312-80)
+.TP
+.B big5
+Big 5 (HKU version)
+.TP
+.B unicode
+Unicode Standard 1.0
+.TP
+.B tis
+Thai character set plus
+.SM ASCII
+(TIS 620-1986)
+.TP
+.B msdos
+IBM PC: CP 437
+.TP
+.B atari
+Atari-ST character set
+.SH EXAMPLES
+.TP
+.B tcs -f 8859-1
+Convert 8859-1 (Latin-1) characters into
+.SM UTF
+format.
+.TP
+.B tcs -s -f jis
+Convert characters encoded in ISO 2022-JP, EUC or Shift-JIS (guessed from the input) into
+.SM UTF
+format.
+Unknown Kanji will be converted into
+.B 0xFFFD
+characters.
+.TP
+.B tcs -t html
+Convert UTF into character set-independent HTML.
+.TP
+.B tcs -lv
+Print an up-to-date list of the supported character sets.
+.SH SOURCE
+.B /sys/src/cmd/tcs
+.SH SEE ALSO
+.IR ascii (1),
+.IR rune (2),
+.IR utf (6).
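
The DESCRIPTION in this page says that tcs decodes its input from the ics character set into runes and then encodes those runes in the ocs character set, substituting Runeerror (0xFFFD) for bad input unless -c is given. The Python sketch below illustrates that two-step idea only; it is not the tcs implementation. The function name `translate`, its parameters, and the use of Python codec names (such as `latin-1` for what tcs calls 8859-1) are assumptions made for illustration, and the `clean` flag only approximates the documented effect of -c.

```python
RUNEERROR = "\ufffd"  # the Runeerror value (0xFFFD) named in the man page


def translate(data: bytes, ics: str = "utf-8", ocs: str = "utf-8",
              clean: bool = False) -> bytes:
    """Hypothetical helper: ics bytes -> code points -> ocs bytes."""
    # Step 1: decode the input; undecodable bytes become U+FFFD.
    runes = data.decode(ics, errors="replace")
    if clean:
        # Approximation of -c: keep only characters that converted cleanly.
        # Note this also drops any U+FFFD that was genuinely in the input.
        runes = runes.replace(RUNEERROR, "")
    # Step 2: encode for output. Python's "replace" handler emits "?" for
    # characters the target codec lacks; that is a Python detail, not a
    # statement about what tcs writes.
    return runes.encode(ocs, errors="ignore" if clean else "replace")


if __name__ == "__main__":
    # Roughly analogous to: tcs -f 8859-1
    print(translate(b"caf\xe9", ics="latin-1"))
    # A stray 0xFF byte is not valid UTF-8 and is replaced by U+FFFD.
    print(translate(b"ab\xffcd"))
    # With clean=True the offending byte is dropped instead.
    print(translate(b"ab\xffcd", clean=True))
```

Routing every conversion through one intermediate representation means each character set needs only a decoder and an encoder, rather than a dedicated converter for every pair of sets.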