author | Taru Karttunen <taruti@taruti.net> | 2011-03-30 16:49:47 +0300 |
---|---|---|
committer | Taru Karttunen <taruti@taruti.net> | 2011-03-30 16:49:47 +0300 |
commit | b41b9034225ab3e49980d9de55c141011b6383b0 (patch) | |
tree | 891014b4c2e803e01ac7a1fd2b60819fbc5a6e73 /sys/man/1/tcs | |
parent | c558a99e0be506a9abdf677f0ca4490644e05fc1 (diff) |
Import sources from 2011-03-30 iso image - sys/man
Diffstat (limited to 'sys/man/1/tcs')
-rwxr-xr-x | sys/man/1/tcs | 172 |
1 files changed, 172 insertions, 0 deletions
diff --git a/sys/man/1/tcs b/sys/man/1/tcs
new file mode 100755
index 000000000..664073d36
--- /dev/null
+++ b/sys/man/1/tcs
@@ -0,0 +1,172 @@
+.TH TCS 1
+.SH NAME
+tcs \- translate character sets
+.SH SYNOPSIS
+.B tcs
+[
+.B -slcv
+]
+[
+.B -f
+.I ics
+]
+[
+.B -t
+.I ocs
+]
+[
+.I file ...
+]
+.SH DESCRIPTION
+.I Tcs
+interprets the named
+.I file(s)
+(standard input default) as a stream of characters from the
+.I ics
+character set or format, converts them to runes,
+and then converts them into a stream of characters from the
+.I ocs
+character set or format on the standard output.
+The default value for
+.I ics
+and
+.I ocs
+is
+.BR utf ,
+the
+.SM UTF
+encoding described in
+.IR utf (6).
+The
+.B -l
+option lists the character sets known to
+.IR tcs .
+Processing continues in the face of conversion errors (the
+.B -s
+option prevents reporting of these errors).
+The
+.B -c
+option forces the output to contain only correctly converted characters;
+otherwise,
+.B Runeerror
+(0xFFFD)
+characters will be substituted for
+.SM UTF
+encoding errors and unknown characters.
+.PP
+The
+.B -v
+option generates various diagnostic and summary information on standard error,
+or makes the
+.B -l
+output more verbose.
+.PP
+.I Tcs
+recognizes an ever changing list of character sets.
+In particular, it supports a variety of Russian and Japanese encodings.
+Some of the supported encodings are
+.TF jis-kanji
+.TP
+.B utf
+The Plan 9
+.SM UTF
+encoding, known by ISO as UTF-8
+.TP
+.B utf1
+The deprecated original
+.SM UTF
+encoding from ISO 10646
+.TP
+.B ascii
+7-bit ASCII
+.TP
+.B 8859-1
+Latin-1 (Central European)
+.TP
+.B 8859-2
+Latin-2 (Czech .. Slovak)
+.TP
+.B 8859-3
+Latin-3 (Dutch .. Turkish)
+.TP
+.B 8859-4
+Latin-4 (Scandinavian)
+.TP
+.B 8859-5
+Part 5 (Cyrillic)
+.TP
+.B 8859-6
+Part 6 (Arabic)
+.TP
+.B 8859-7
+Part 7 (Greek)
+.TP
+.B 8859-8
+Part 8 (Hebrew)
+.TP
+.B 8859-9
+Latin-5 (Finnish .. Portuguese)
+.TP
+.B html
+Unicode as encoded by HTML
+.TP
+.B koi8
+KOI-8 (GOST 19769-74)
+.TP
+.B jis-kanji
+ISO 2022-JP
+.TP
+.B ujis
+EUC-JX: JIS 0208
+.TP
+.B ms-kanji
+Microsoft, or Shift-JIS
+.TP
+.B jis
+(from only) guesses between ISO 2022-JP, EUC or Shift-Jis
+.TP
+.B gb
+Chinese national standard (GB2312-80)
+.TP
+.B big5
+Big 5 (HKU version)
+.TP
+.B unicode
+Unicode Standard 1.0
+.TP
+.B tis
+Thai character set plus
+.SM ASCII
+(TIS 620-1986)
+.TP
+.B msdos
+IBM PC: CP 437
+.TP
+.B atari
+Atari-ST character set
+.SH EXAMPLES
+.TP
+.B tcs -f 8859-1
+Convert 8859-1 (Latin-1) characters into
+.SM UTF
+format.
+.TP
+.B tcs -s -f jis
+Convert characters encoded in one of several shift JIS encodings into
+.SM UTF
+format.
+Unknown Kanji will be converted into
+.B 0xFFFD
+characters.
+.TP
+.B tcs -t html
+Convert UTF into character set-independent HTML.
+.TP
+.B tcs -lv
+Print an up to date list of the supported character sets.
+.SH SOURCE
+.B /sys/src/cmd/tcs
+.SH SEE ALSO
+.IR ascii (1),
+.IR rune (2),
+.IR utf (6).
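Note on the imported page: the DESCRIPTION section describes a two-step pipeline (bytes in the input character set are converted to runes, then the runes are written out in the output character set). The Python sketch below illustrates that idea for the `tcs -f 8859-1` example only. It is not the code in /sys/src/cmd/tcs: the function name `to_utf`, the `only_correct` parameter, and the use of Python codec names such as `latin-1` are assumptions made for illustration, and dropping bad input is only one plausible reading of the `-c` option.

```python
RUNEERROR = "\ufffd"  # the substitute character the man page calls Runeerror


def to_utf(data: bytes, ics: str, only_correct: bool = False) -> bytes:
    """Hypothetical helper: reinterpret `data` from character set `ics` as UTF-8.

    only_correct=True loosely mimics the man page's -c option by dropping
    bytes that cannot be decoded; otherwise each decoding error becomes
    U+FFFD, as the man page describes for the default behaviour.
    """
    # Step 1: input bytes -> runes (Python str holds Unicode code points).
    errors = "ignore" if only_correct else "replace"
    runes = data.decode(ics, errors=errors)
    # Step 2: runes -> output bytes in the target encoding (UTF-8 here).
    return runes.encode("utf-8")


if __name__ == "__main__":
    # Latin-1 byte 0xE9 (e with acute accent) becomes two bytes in UTF-8.
    print(to_utf(b"caf\xe9", "latin-1"))
    # 0xFF is never valid in UTF-8, so it is replaced by U+FFFD.
    print(to_utf(b"ab\xffcd", "utf-8"))
    # With only_correct=True the bad byte is dropped instead.
    print(to_utf(b"ab\xffcd", "utf-8", only_correct=True))
```

Python's built-in codecs stand in for the conversion tables here, so the set of supported names differs from the list that `tcs -l` prints.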