Phonetic orthographies

Sat Nov 25 09:02:36 CET 2006

Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:

> If it is so clear that they are recognisably different, would it not 
> make sense to remedy this ambiguity and have a specific tag for both 
> McCune-Reischauer and Revised Romanization ? These tags are apparently 
> more significant than the identification of Latn.

First sentence true, second sentence false.  It might make sense to have 
a *subtag* of some sort to distinguish transliterations -- the question 
is what type of subtag.

The difference between McCune-Reischauer and Revised Romanization (RR) 
is truly trivial compared to the difference between Latin and Hangul.

> The question to me is, if tagging is meant to identify a text so that 
> automated processing can take place only tagging as ko-Latn does not 
> suffice at all.  If the identification is not intended to 
> significantly identify both the language and its manifestation, what 
> is its use ?

"ko" all by itself tells me the content is Korean.  That is perfectly 
sufficient if the content is non-written, or if it is written and I can 
figure out (e.g. from the encoding) what the script is and can interpret 
it.

"ko-Latn" tells me the content is Korean written in Latin script. 
"Written" means it's not spoken, sung, signed, or signaled with Morse 
code or semaphore flags.  "Latin script" means I have a fighting chance 
of reading it, since, like John, I have a tough time reading Hangul and 
no chance at all of reading Hanja.
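
To make the "ko" versus "ko-Latn" distinction concrete, here is a minimal 
sketch in Python of how a consumer might pull the language and script out 
of a simple tag.  The function name split_tag is made up for illustration, 
and the sketch assumes a plain language[-script][-region] tag; it does not 
handle private-use or grandfathered tags.

```python
def split_tag(tag):
    """Return (language, script) for a simple tag; script is None if absent.

    Illustrative only: assumes the first subtag is the language and that
    any later four-letter alphabetic subtag is the script.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    for subtag in subtags[1:]:
        # Script subtags are exactly four letters, e.g. "Latn" or "Hang".
        if len(subtag) == 4 and subtag.isalpha():
            script = subtag.title()
            break
    return language, script


print(split_tag("ko"))       # ('ko', None)
print(split_tag("ko-Latn"))  # ('ko', 'Latn')
```

With "ko" the script comes back as None, so the consumer has to work it out 
some other way; with "ko-Latn" it is stated outright.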

"Automated processing," of the type you are envisioning, might be able 
to figure out whether a piece of text, known to be ko-Latn, is 
McCune-Reischauer or RR, or it might understand only one or the other, 
in which case it could not make use of the distinction anyway.
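
As a rough sketch of what "figuring it out" could look like, the Python 
below guesses the romanization from surface features: McCune-Reischauer 
writes certain vowels with a breve (ŏ, ŭ), where RR writes "eo" and "eu" 
without diacritics.  The function name guess_romanization is hypothetical, 
and the heuristic is deliberately crude; plenty of short strings contain 
neither marker.

```python
import unicodedata


def guess_romanization(text):
    """Guess the romanization of Korean text known to be in Latin script.

    Crude heuristic for illustration, not a real detector.
    Returns a system name, or None when the text gives no clue.
    """
    # Normalize so "o" + combining breve compares equal to precomposed "ŏ".
    text = unicodedata.normalize("NFC", text).lower()
    if "\u014f" in text or "\u016d" in text:  # ŏ or ŭ
        return "McCune-Reischauer"
    if "eo" in text or "eu" in text:
        return "Revised Romanization"
    return None


print(guess_romanization("S\u014ful"))  # McCune-Reischauer
print(guess_romanization("Seoul"))      # Revised Romanization
print(guess_romanization("Busan"))      # None
```

The last case is the point: a processor can often tell the two apart, but 
not always, and one that understands only one system gains nothing from 
the answer.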

There will always be limits to the fineness of detail that can 
reasonably be expressed by language tags.  Occasionally you will hear 
the term "taggable distinction" on this list; that is a way of 
expressing the concept that certain linguistic distinctions, such as 
English as spoken by me vs. my wife (both Southern California natives), 
are not significant enough to warrant distinct tags.  Every individual's 
speech is at least slightly (though perhaps imperceptibly) different 
from everyone else's, but at that point we are talking about identifying 
individuals, not language.

> When ISO-639-6 has to deal with the whole gamut of language families 
> up to and including orthographies, is it not better to ensure that 
> these codes are in ISO-639-6? The realisation that currently there is 
> no way to identify properly what a text is, seems an indication of the 
> failure of the current codes/system when there is no apparent remedy.

I am looking at a pre-pre-beta list of the Korean-family ISO 639-6 code 
elements, and I don't see anything for Korean in Latin script, let alone 
a specific transliteration.  That is as I expected, since even 639-6 
with its huge scope can't include every possible non-native combination 
of language plus script.  I am sure there are conventions for writing 
Korean in the Cyrillic and Arabic and Devanagari and Thai scripts, and 
there's no chance 639-6 is going to cover them all.

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages