dewell at adelphia.net
Sat Nov 25 09:02:36 CET 2006
Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:
> If it is so clear that they are recognisably different, would it not
> make sense to remedy this ambiguity and have a specific tag for both
> McCune-Reischauer and Revised Romanization ? These tags are apparently
> more significant than the identification of Latn.
First sentence true, second sentence false. It might make sense to have
a *subtag* of some sort to distinguish transliterations -- the question
is what type of subtag.
The difference between McCune-Reischauer and RR is truly trivial
compared to the difference between Latin and Hangul.
> The question to me is, if tagging is meant to identify a text so that
> automated processing can take place only tagging as ko-Latn does not
> suffice at all. If the identification is not intended to
> significantly identify both the language and its manifestation, what
> is its use ?
"ko" all by itself tells me the content is Korean. That is perfectly
sufficient if the content is non-written, or if it is written and I can
figure out (e.g. from the encoding) what the script is and can interpret it.
"ko-Latn" tells me the content is Korean written in Latin script.
"Written" means it's not spoken, sung, signed, or signaled with Morse
code or semaphore flags. "Latin script" means I have a fighting chance
of reading it, since I have a tough time reading Hangul (like John), and
no chance at all of reading Hanja.
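
To make the "ko" vs. "ko-Latn" distinction concrete, here is a minimal Python sketch of how a consumer might pull the language and script subtags out of a tag. The function name split_tag and the simplified pattern are assumptions for illustration only; real language tags allow further subtags (region, variant, extensions) that this does not handle.

```python
import re

# Simplified pattern (an assumption, not the full language-tag grammar):
# a 2-3 letter language subtag, optionally followed by a 4-letter script subtag.
TAG_RE = re.compile(r"^(?P<language>[A-Za-z]{2,3})(?:-(?P<script>[A-Za-z]{4}))?$")


def split_tag(tag):
    """Return (language, script) for a tag; script is None when absent."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag: {tag!r}")
    # Conventional casing: lowercase language, title-case script.
    language = match.group("language").lower()
    script = match.group("script")
    return language, (script.title() if script else None)


print(split_tag("ko"))       # language only; the script is left unstated
print(split_tag("ko-Latn"))  # language plus an explicit script subtag
```

Note that nothing in either tag says which romanization is in use; that information would have to come from some further subtag or from inspecting the text itself.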
"Automated processing," of the type you are envisioning, might be able
to figure out whether a piece of text, known to be ko-Latn, is
McCune-Reischauer or RR, or it might only understand one or the other
and couldn't make use of the distinction anyway.
There will always be limits to the fineness of detail that can
reasonably be expressed by language tags. Occasionally you will hear
the term "taggable distinction" on this list; that is a way of
expressing the concept that certain linguistic distinctions, such as
English as spoken by me vs. my wife (both Southern California natives),
are not significant enough to warrant distinct tags. Every individual's
speech is at least slightly (though perhaps imperceptibly) different
from everyone else's, but at that point we are talking about identifying
individuals, not language.
> When ISO-639-6 has to deal with the whole gamut of language families
> up to and including orthographies, is it not better to ensure that
> these codes are in ISO-639-6? The realisation that currently there is
> no way to identify properly what a text is, seems an indication of the
> failure of the current codes/system when there is no apparent remedy.
I am looking at a pre-pre-beta list of the Korean-family ISO 639-6 code
elements, and I don't see anything for Korean in Latin script, let alone
a specific transliteration. That is as I expected, since even 639-6
with its huge scope can't include every possible non-native combination
of language plus script. I am sure there are conventions for writing
Korean in the Cyrillic and Arabic and Devanagari and Thai scripts, and
there's no chance 639-6 is going to cover them all.
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14