Korean (and Japanese)

Martin Duerst duerst at it.aoyama.ac.jp
Thu Sep 28 12:42:16 CEST 2006


Hello Masayasu,

At 18:24 06/09/28, Masayasu Ishikawa wrote:
>Martin Duerst wrote:
>
>> Speaking about Japanese, I'm quite surprised that we don't have
>>     Suppress-Script: Jpan
>> at
>>     Type: language
>>     Subtag: ja
>>     Description: Japanese
>>     Added: 2005-10-16
>> When something is tagged ja, the assumption is that it's written in
>> Kanji-Kana mixture, because that's how Japanese is written.
>
>In many cases yes, but you can find quite a few exceptions.
>For example, "Ishikawa" in romaji is Japanese, even if it is
>written in Latin script, and I used the language tag "ja" in
>quite a few places for that. With RFC 4646 I could use "ja-Latn"
>in that case, but anyway, there's more than one way to write
>Japanese and I wouldn't simply assume that "ja" means "written
>in Kanji-Kana mixture".

I guess this makes sense. But my guess is that this applies on a
micro level. On that level, the information is mainly used for
things such as text-to-speech. What you want is that your name
is pronounced in the Japanese way rather than the way an
English (or some other language) speaker would pronounce it.
The characters are there in the data, so tagging the script
would actually be overkill: either redundant or contradictory.

I think this is an interesting aspect of language tagging that
maybe we should mention in RFC 4646bis: The script can
be left out if it's obvious from the data. This would just
be a special case of John's general "tag wisely" principle.
In my view, this doesn't justify leaving Jpan out as the
Suppress-Script, but it gives you the licence to just use ja
even if ja-Latn would be more precise.
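
To illustrate what "obvious from the data" could mean in practice,
here is a minimal Python sketch that guesses the script(s) of a
string from the characters themselves. It relies on Unicode character
names from the standard unicodedata module as a coarse stand-in for
the real Script property; the function name scripts_in and the
prefix table are assumptions made up for this example.

```python
import unicodedata

# Coarse mapping from Unicode character-name prefixes to ISO 15924 codes.
# This is an approximation for illustration, not the real Script property
# (for example, fullwidth and halfwidth forms are not covered).
PREFIX_TO_SCRIPT = {
    "LATIN": "Latn",
    "HIRAGANA": "Hira",
    "KATAKANA": "Kana",
    "CJK": "Hani",
}

def scripts_in(text):
    """Return the set of (approximate) scripts used by the letters in text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        name = unicodedata.name(ch, "")  # "" if the character has no name
        for prefix, script in PREFIX_TO_SCRIPT.items():
            if name.startswith(prefix):
                found.add(script)
                break
    return found

print(scripts_in("Ishikawa"))  # {'Latn'}
print(scripts_in("石川"))       # {'Hani'}
```

A consumer such as a text-to-speech engine could use the language tag
(ja) for pronunciation and a check like this for the script, which is
why an explicit ja-Latn adds little on this micro level.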

I think the main use case for suppress-script is for things
such as whole-document search: You don't want to have to look
into each document to check what script(s) it uses. In that
case, to me the only reasonable way for things to work is that
ja means "Japanese, as usually written" (Kanji-Kana mixture),
and ja-Latn is used for romanized documents. (Except for learning
material, and some stuff from the early years of computer use,
I haven't seen a single ja-Latn document, and I look at
Japanese all day.)
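
The whole-document case can be sketched in Python as follows. This
is only an illustration under stated assumptions: the table contains
the ja/Jpan entry proposed in this thread (not copied from the
registry), the names SUPPRESS_SCRIPT, effective_script and
matches_script are invented for the example, and the tag parsing is
simplified (no grandfathered or private-use-only tags).

```python
# Hypothetical excerpt of Suppress-Script data. The "ja" -> "Jpan" entry is
# the one proposed in this thread, assumed here for illustration.
SUPPRESS_SCRIPT = {"ja": "Jpan"}

def effective_script(tag, suppress=SUPPRESS_SCRIPT):
    """Return the script a tag implies, or None if it cannot be determined.

    An explicit script subtag wins; otherwise fall back to the
    Suppress-Script value recorded for the primary language subtag.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    for sub in subtags[1:]:
        if len(sub) == 1:
            break  # singleton: extension or private use follows
        if len(sub) == 4 and sub.isalpha():
            return sub.title()  # script subtags are four letters
    return suppress.get(language)

def matches_script(tag, wanted_script):
    """True if a document tagged `tag` should be searched as `wanted_script`."""
    return effective_script(tag) == wanted_script

# A search over tagged documents never has to open the documents themselves.
tags = ["ja", "ja-JP", "ja-Latn"]
print([t for t in tags if matches_script(t, "Jpan")])  # ['ja', 'ja-JP']
```

With such a table, plain ja and ja-JP are treated as Kanji-Kana
mixture, and only documents explicitly tagged ja-Latn are treated as
romanized.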

Would you agree with this analysis?

Regards,      Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


