NAME
cs00 - Kana-Kanji conversion server (CS)
SYNOPSIS
/usr/sbin/cs00
AVAILABILITY
SUNWjc0u
DESCRIPTION
The cs00 daemon receives a string from a client program such
as XCI (xci(7)), performs Kana-Kanji conversion on the string,
and returns the result to the program.
cs00 uses the "n-bunsetsu saidai itchi hou" version of the
"ren-bunsetsu housiki" (joined-morpheme method) for
conversion, allowing a string of up to 512 characters to be
converted at one time. More than 50,000 words are registered
in the main dictionary, which can be modified using mdicm(1).
Users can add words to their own "user dictionary".
See udicmtool(1) and udicm(1) for a detailed description of
the main dictionary and the user dictionary.
Code conversion
cs00 converts received character codes according to the rules
defined in the code conversion definition files. ROMAJI to
KANA conversion and character-type interconversion also use
these rules.
Code conversion definition files
The code conversion rules are defined in the following
files.
Filenames
hiragana.ccv
for HIRAGANA mode
katakana.ccv
for full-size KATAKANA mode
h_katakana.ccv
for half-size KATAKANA mode
eisuu.ccv
for full-size alphanumeric mode
h_eisuu.ccv
for half-size alphanumeric mode
Rules for half-size alphanumeric mode are also used in
the KUTEN input mode.
Placement of the definition files
Code conversion definition files can be placed in the
following directories. cs00 looks for each file in
these directories in the order below and uses the
first file found.
1. $HOME/.mle/locale/cs00/
2. /usr/lib/mle/locale/cs00/
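The lookup order above can be illustrated with a short Python
sketch. This is not the cs00 implementation. The function names
are hypothetical, and treating the "locale" path component as a
placeholder for a locale name such as "ja" is an assumption
based on the paths listed under FILES.

```python
import os


def default_search_dirs(locale, home=None):
    """Build the two directories in the documented search order.

    Assumption: "locale" in the documented paths is a placeholder
    for a locale name such as "ja".
    """
    if home is None:
        home = os.environ.get("HOME", "")
    return [
        os.path.join(home, ".mle", locale, "cs00"),    # 1. per-user files
        os.path.join("/usr/lib/mle", locale, "cs00"),  # 2. system files
    ]


def find_definition_file(filename, search_dirs):
    """Return the first existing path for filename, or None if not found."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_definition_file("hiragana.ccv", default_search_dirs("ja")))
```

A file in the per-user directory therefore shadows the file of
the same name in the system directory.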
Customizing the names of the conversion definition files
To customize the names of the files, add the following
configurable values to the file resources or config.
CC_HR
for HIRAGANA mode
CC_KT
for full-size KATAKANA mode
CC_HKT
for half-size KATAKANA mode
CC_EIS
for full-size alphanumeric mode
CC_HEIS
for half-size alphanumeric mode
(Example)
In file resources:
*xci*cs00.config.CC_HR:my_hiragana.ccv
In file config:
CC_HR = "my_hiragana.ccv"
File format
Each conversion rule consists of a line in the following
format:
string1 string2 [ number | s ]
string1
Specifies an input string to be converted.
string2
Specifies a string that is the result of converting
string1.
number | s
A number (an integer) specifies how many characters
of string1, counted from its last character, are to be
re-converted. The number must be smaller than the
length of string1. The default is 0. Instead of a
number, 's' specifies that a "non-fixed" entry is not
converted. "Non-fixed" means that the input string can
match more than one rule. 's' and a number cannot be
specified at the same time.
Note: In the examples below, characters enclosed in a
pair of square brackets [ ] represent Japanese Kana.
Upper case (e.g. [KI]) represents a regular-size Kana.
Lower case (e.g. [tsu]) represents a small-size Kana.
For example, to obtain "[KI][tsu][TO]" by inputting
"kitto", the following rules must be defined:
ki [KI]
to [TO]
tto [tsu] 2
First, the beginning of the input string "kitto" matches
"ki", so it is converted to "[KI]". Second, the remaining
input matches "tto", so it is converted to "[tsu]", and
the last two characters of "tto" are re-evaluated. As
they match "to", the string "to" is converted to "[TO]".
As a whole, "[KI][tsu][TO]" is obtained.
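The walk-through above can be sketched in Python as follows.
This is an illustration only, not the cs00 algorithm: the
function name convert is hypothetical, and two behaviours are
assumptions not stated in this page, namely that the longest
matching string1 is preferred and that characters matching no
rule are passed through unchanged.

```python
def convert(text, rules):
    """Convert text using rules: {string1: (string2, reconvert_count)}."""
    for string1, (_, reconvert) in rules.items():
        # The number must be smaller than the length of string1.
        if not 0 <= reconvert < len(string1):
            raise ValueError(f"bad re-conversion count for {string1!r}")

    # Assumption: when several rules match, the longest string1 wins.
    candidates = sorted(rules, key=len, reverse=True)
    out = []
    pos = 0
    while pos < len(text):
        match = next((s for s in candidates if text.startswith(s, pos)), None)
        if match is None:
            # Assumption: unmatched characters pass through unchanged.
            out.append(text[pos])
            pos += 1
            continue
        string2, reconvert = rules[match]
        out.append(string2)
        # Leave the last `reconvert` characters in place for re-evaluation.
        pos += len(match) - reconvert
    return "".join(out)


rules = {
    "ki": ("[KI]", 0),
    "to": ("[TO]", 0),
    "tto": ("[tsu]", 2),
}
print(convert("kitto", rules))  # [KI][tsu][TO]
```

Because the number is always smaller than the length of
string1, every matched rule consumes at least one character, so
the loop always terminates.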
Other sets of rules also produce "[KI][tsu][TO]" from
"kitto". Each of the two sets described below does so,
but each can cause problems.
With the following set of rules, the input string
"kitto" is also converted to "[KI][tsu][TO]". However,
the string "ttttt" is converted to
"[tsu][tsu][tsu][tsu]t":
ki [KI]
to [TO]
tt [tsu] 1
With the following set of rules, when you change the
definition of "to", you must also change the definition
of "tto":
ki [KI]
to [TO]
tto [tsu][TO]
The following describes the conversion rules for
non-fixed "n". Assume that the following rules are
defined:
n [N] s
na [NA]
ni [NI]
The input string "n" matches the rule "n". However,
because 's' is specified in the rule and "n" is a
non-fixed entry, "n" is not converted to "[N]". If "a"
is entered after "n", the string is converted to
"[NA]". If "i" is entered after "n", the string is
converted to "[NI]". "n" is converted to "[N]" only
when "n" is fixed. If 's' is not specified, "n" is
converted to "[N]", and when "a" or "i" is entered
after "n", "[N]" changes to "[NA]" or "[NI]",
respectively.
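The effect of 's' on a non-fixed entry can be sketched in
Python as below. The function convert_pending and its fixed
parameter are hypothetical. How cs00 decides that input is
"fixed" is not described in this page, so the sketch simply
lets the caller say so. It also does not model the later
replacement of "[N]" by "[NA]" that occurs when 's' is absent.

```python
def convert_pending(pending, rules, fixed=False):
    """Return (converted, still_pending) for the current input buffer.

    rules maps string1 -> (string2, has_s_flag).
    """
    if pending not in rules:
        return "", pending
    string2, has_s_flag = rules[pending]
    # "Non-fixed": the buffer could still grow into a longer string1.
    non_fixed = any(s != pending and s.startswith(pending) for s in rules)
    if has_s_flag and non_fixed and not fixed:
        return "", pending  # hold the input back until it is fixed
    return string2, ""


rules = {
    "n": ("[N]", True),    # n  [N] s
    "na": ("[NA]", False),
    "ni": ("[NI]", False),
}
print(convert_pending("n", rules))              # ('', 'n')
print(convert_pending("na", rules))             # ('[NA]', '')
print(convert_pending("n", rules, fixed=True))  # ('[N]', '')
```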
The maximum length of a line is 1024 characters, and a
newline character terminates the line. A line that starts
with a hash sign "#" is a comment line. The fields in a rule
are delimited by spaces or tabs.
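A minimal Python sketch of a reader for one rule line, based
only on the format described above. The function name
parse_rule_line is hypothetical. Skipping blank lines and
counting the 1024-character limit without the newline are
assumptions. The extension characters are not decoded here, so
the length check uses the raw text of string1.

```python
import re


def parse_rule_line(line):
    """Return (string1, string2, reconvert, non_fixed), or None if skipped."""
    line = line.rstrip("\n")
    if len(line) > 1024:
        raise ValueError("line longer than 1024 characters")
    # Comment lines start with "#"; blank lines are skipped (assumption).
    if line.startswith("#") or not line.strip(" \t"):
        return None

    # Fields are separated by runs of spaces or tabs.
    fields = re.split(r"[ \t]+", line.strip(" \t"))
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields, got {len(fields)}")

    string1, string2 = fields[0], fields[1]
    reconvert, non_fixed = 0, False  # the default number is 0
    if len(fields) == 3:
        if fields[2] == "s":
            non_fixed = True
        else:
            reconvert = int(fields[2])
            # The number must be smaller than the length of string1.
            if not 0 <= reconvert < len(string1):
                raise ValueError(f"number out of range: {reconvert}")
    return string1, string2, reconvert, non_fixed


print(parse_rule_line("tto\t[tsu] 2"))  # ('tto', '[tsu]', 2, False)
```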
The following extension characters (escape sequences) are
required to use control characters and other special
characters in string1 or string2:
\n New line
\r Carriage return
\t Tab
\f Form feed
\~ Space (0x20)
\{ (
\} )
\# #
\\ \
\^ ^
\0 Octal (\001, \012)
\1 Octal (\100, \123)
\x Two-digit hexadecimal (\x01, \xff)
\w Four-digit hexadecimal (\w0101, \wabcd)
\q Eight-digit hexadecimal (\q00000101, \q8000cdab)
\k Code (Kuten code) (\k0101, \k1616)
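As an illustration, the Python sketch below decodes a subset of
the extension characters listed above: the single-character
forms, three-digit octal beginning with 0 or 1, and \x. The
function name decode_escapes is hypothetical. Mapping numeric
values to characters with chr() is an assumption, and \w, \q
and \k are deliberately left unsupported because this page does
not define their encoding.

```python
import string

# Single-character forms, mapped exactly as in the table above.
SIMPLE = {
    "n": "\n", "r": "\r", "t": "\t", "f": "\f", "~": " ",
    "{": "(", "}": ")", "#": "#", "\\": "\\", "^": "^",
}


def _digits(field, start, count, alphabet):
    """Return `count` characters at `start`, all drawn from `alphabet`."""
    chunk = field[start:start + count]
    if len(chunk) != count or any(c not in alphabet for c in chunk):
        raise ValueError(f"malformed numeric escape in {field!r}")
    return chunk


def decode_escapes(field):
    """Decode the supported subset of extension characters in one field."""
    out = []
    i = 0
    while i < len(field):
        if field[i] != "\\":
            out.append(field[i])
            i += 1
            continue
        if i + 1 >= len(field):
            raise ValueError("dangling backslash")
        code = field[i + 1]
        if code in SIMPLE:
            out.append(SIMPLE[code])
            i += 2
        elif code in ("0", "1"):
            # Three octal digits, e.g. \012 or \123.
            out.append(chr(int(_digits(field, i + 1, 3, "01234567"), 8)))
            i += 4
        elif code == "x":
            # Two hexadecimal digits, e.g. \x01 or \xff.
            out.append(chr(int(_digits(field, i + 2, 2, string.hexdigits), 16)))
            i += 4
        else:
            raise ValueError(f"unsupported escape: \\{code}")
    return "".join(out)


print(repr(decode_escapes(r"a\~b\x41")))  # 'a bA'
```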
FILES
/usr/lib/mle/ja/cs00/cs00_m.dic
Main dictionary for Kana-Kanji conversion
/usr/lib/mle/ja/cs00/cs00_u.dic
User dictionary for Kana-Kanji conversion
/usr/lib/mle/ja/cs00/hiragana.ccv
Code conversion rule definition file for HIRAGANA mode
/usr/lib/mle/ja/cs00/katakana.ccv
Code conversion rule definition file for full-size
KATAKANA mode
/usr/lib/mle/ja/cs00/h_katakana.ccv
Code conversion rule definition file for half-size
KATAKANA mode
/usr/lib/mle/ja/cs00/eisuu.ccv
Code conversion rule definition file for full-size
alphanumeric mode
/usr/lib/mle/ja/cs00/h_eisuu.ccv
Code conversion rule definition file for half-size
alphanumeric mode
NOTES
Use mdicm(1) to modify cs00_m.dic and udicm(1) to modify
cs00_u.dic.