Registration of el-Latn language tag

Thu Sep 29 17:23:35 CEST 2005

Tex Texin wrote:

> The registration for el-Latn more or less stipulates the need for
> transliteration, mentions that they exist, with  a link to a site that
> collects transliteration systems. (Which btw, I think is a really bad idea
> in the event the site goes away or completely changes its list of reference
> materials.)

    You have a point here. I'm new to this and I was sure the list would
tell me if I got it wrong. As it is, I can only hope the site stays up
until RFC3066bis sends the current registry into oblivion.

   On the other hand, there is a URL in RFC3066 as well
(http://www.iana.org/numbers.html) so I'm in good company <g>.

> But it doesn't really nail down what it is. 

    The two standards that I provided a reference to, ISO 843 and ELOT
743, do nail it down in every detail. So much in fact that you could
transcribe them straight into computer code to do the transliteration
for you.
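
   To illustrate what "straight into computer code" could look like,
here is a minimal table-driven sketch in Python. The table below is a
small illustrative subset that I made up for the example; it is NOT the
normative ISO 843 or ELOT 743 table, and the names SAMPLE_TABLE and
transliterate are hypothetical. Any context-dependent rules that a real
standard defines (digraphs, accents) are ignored here.

```python
# Illustrative subset only: NOT the normative ISO 843 / ELOT 743 table.
SAMPLE_TABLE = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e",
    "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s",
    "τ": "t",
}


def transliterate(text, table=SAMPLE_TABLE):
    """Map each character through the table; pass unknown ones through."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in table:
            latin = table[lower]
            # Preserve capitalisation of the source letter.
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


print(transliterate("Πατρα"))
```

   A complete implementation would replace the sample table with the
full mapping of whichever standard is chosen, which is exactly why the
choice of standard matters.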

> (It mentions a
> standard, but doesn't say the tag is referring to that particular standard.

   It would have been very bad indeed if it did. There are several
standards so there is no way a single subtag can refer to them all. And
it would not be appropriate for an IANA-registered tag to prefer one
over the other. The two standards that I gave are not the only ones.
There is another one used by the American Library Association/Library
of Congress. And the "US Board on Geographic Names"
and the "Permanent Committee on Geographical Names for British Official
Use" share yet another one. And as soon as you stride out of the realm
of officialdom, there are many, many more. All have their uses, in
different contexts.

   This is precisely why I think a "transliteration sub-subtag" could be
useful (in theory) to further define the script-to-script mapping.

   But: RFC3066 says nowhere that a given tag should nail down the exact
orthography of each and every word. Likewise, a script subtag should not
be required nor expected to define an exact orthography.

   The whole point of subtagging is that it supposedly gets more precise
as you move from left to right. Script subtags do precisely that: they
add information to the preceding tag.
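
   The left-to-right refinement is what makes the prefix matching of
RFC3066 (section 2.5) work: a range such as "el" matches both "el" and
"el-Latn", while the range "el-Latn" matches only the more specific
tag. A minimal Python sketch of that rule follows; the function name
range_matches is my own choice, not something the RFC defines.

```python
def range_matches(language_range, tag):
    """RFC 3066-style matching: equal, or a prefix followed by '-'."""
    r = language_range.lower()  # tags are compared case-insensitively
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


print(range_matches("el", "el-Latn"))
print(range_matches("el-Latn", "el"))
```

   So a consumer that only asks for "el" still finds el-Latn content,
and a consumer that cares about the script can ask for it explicitly.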

> 
> So we are no longer identifying a reference or a particular language, but
> just the concept that there seems to be something like a language of this
> persuasion. I guess we were asking for this with es-419. (Which I was also a
> proponent of.)

   For the record, I looked at the application for es-419 and it does in
fact mention two references that seem to describe Latin American
Spanish.

> I am also not sure we should be registering transliterations.

   I am sure we should <g>. 

   That is, I am sure we do need tags to label transliterations. Under
RFC3066 rules that means registering them.

   The intro of RFC3066 gives some reasons why tagging has a purpose.
Some of these, such as spell-checking and computer-synthesized speech,
are difficult or impossible if you are not allowed to distinguish
between - in this case - el and el-Latn. 

   As mentioned in the el-Latn application, the W3C Web Content
Accessibility Guidelines require "proper identification of natural
language". This requirement applies also to short fragments of, say,
French text embedded in English (as in "He went to a restaurant and
ordered the plat du jour"). The last three words must be identified as
French.

   Now, if I have a transliterated Greek word embedded in an English
text, I can do three things:

    1) not label the word at all, i.e. it inherits the "en" label from
the surrounding text.
    2) label it as "el"
    3) label it as "el-Latn"

  Think of the consequences for a text-to-speech synthesiser, and be
sure to think of it from the perspective of a blind person, who has to
rely only on what (s)he hears.

    1) If it is labeled (implicitly) as "en", the word would be uttered
as if it were English, making it totally unrecognizable. Anybody who has
difficulty imagining the effect should try downloading (a demo version
of) a screen reader, set it to French, switch off their monitor, and
have it read out an English page. I did, and it is enlightening.

    2) If the word is labeled as "el", the speech synthesizer would
activate its Greek module, which would expect Greek script and promptly
go nuts, just as the English module would throw a fit if you fed it an
English word written in Greek script.

    3) The "el-Latn" script subtag is the only way out, i.e. it is the
only way to make this document and its transliterated content
accessible.
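
   The three cases above can be sketched as a simple lookup in Python.
Everything in it is hypothetical: the VOICE_MODULES table, the module
names and the pick_voice function are invented for illustration and do
not describe any actual screen reader.

```python
# Hypothetical table: which speech module handles which language tag.
VOICE_MODULES = {
    "en": "english",
    "el": "greek-script",      # expects Greek letters as input
    "el-latn": "greek-latin",  # expects transliterated Greek
}


def pick_voice(tag, inherited="en"):
    """Pick a module for a fragment; an untagged one inherits its context."""
    t = (tag or inherited).lower()
    while t:
        if t in VOICE_MODULES:
            return VOICE_MODULES[t]
        # Drop the last subtag and retry with the less specific tag.
        t = t.rpartition("-")[0]
    return VOICE_MODULES.get(inherited.lower())


for label in (None, "el", "el-Latn"):
    print(label, "->", pick_voice(label))
```

   Only the third label routes the word to a module that knows both the
language and the script it is written in.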

   So yes, I am sure we should register transliterations, at least under
the current rules for language tagging.

   By the way, tagging for accessibility is in my view the most valid
reason for tagging. And in fact, here in Europe several countries
already have laws that require it. (The US's Section 508 doesn't require
language tagging.)

> At least with a transliteration to sign languages, (I assume they are
> considered  transliterations) I could see that the expressiveness of signing
> would evolve and behave like a language of its own. With transliteration
> from one script to another, I am not so sure. (But I am not a linguist.) I
> guess I think of Greek transliterations as one way- Going from Greek to
> Latin, and not that people will write new Greek materials in Latin script,
> so that it evolves like a language on its own.
> At least with some of the other languages that were written in different
> scripts, although you could transliterate between them, people were also
> using the script for the purpose of writing and expression.

   In the case of Greek, your assumption is not correct.

   As I indicated in the first example of the need for transliterated
Greek, new materials, in the form of e-mails and other communications,
are being written in it every single day. It is considered a somewhat
controversial practice in some circles, though not in others. 

   In fact, for communicating with Greeks living here in Belgium I
_have_ to write transliterated Greek, more often than not. If I use
Greek script they'll likely return the message saying "sorry, my
computer can't render it". Particularly in work environments, users do
not always have the administrative rights to configure their computers
themselves.
On some Greek message boards, one person will post in Greek script, and
another will reply in Latin script, depending on the technical
infrastructure at their disposal.

   As an aside: One of the reasons that some people are vehemently
opposed to Greeklish (Greek written "in English [script]") is the fear
that it might eventually replace the Greek script altogether. There
would be no such feelings if transliterated Greek were just a one-way
automated transformation.

> The registration indicated one of the two uses of transliteration was for
> use by non-Greeks. This suggests to me it is not being used as a language
> but simply an alternative notation system that is autogenerated. The users
> are not writing and expressing themselves in the transliteration.
> 

   Well, yes and no. What this second case actually refers to is short
fragments of Greek embedded in another language. 

   Imagine a chapter in an English travel guide, writing about the local
cuisine in Russia. 

   Is the writer expressing himself in transliterated Russian? No.

   Is there a need to label the transliterated names of the Russian
dishes differently from the English text? Yes, if you want the reader to
order that dish over there and you want him to get what he expects (and
provided he has a Russian-Latin text-to-speech module <g>). 

   Is it auto-generated? Definitely not.

> I know that is not entirely true, and do not want to overstate the point,
> but this kind of automated transliteration occurs between most languages and
> scripts, but is not used as language. We shouldn't need to review, register,
> and discuss all of the combinations.
> 
> I guess we will need a tag for transliteration of Heiroglyphics to latin as
> well...
> We might need one for the Rebus puzzles in the newspaper too.

   Yes <g>. Not sure about the puzzles, but any document that writes
about Ramses and Cleopatra and Nefertiti is in fact transliterating
hieroglyphs.

> 
> We should add zh-Latn, as chinese is often written in latin script as well.
> 
> Maybe we should just stipulate that almost everything is transliterated in
> Latin, and simply consider it available for all languages.

   Yes. As I understand it, that is precisely what RFC3066bis will do.

   In 2.2.3, 4th item, it says "Script subtags MUST NOT be registered
using the process in Section 3.5 of this document".

   However, until that gets adopted, RFC3066 still rules, and under
its rules (i.e. the current rules), tags like el-Latn MUST be registered
with IANA before using them.

   Now, the question is, should we still be registering tags today that
won't need registration tomorrow? That question came up at the very
beginning of the two-week el-Latn review period and the consensus was:
yes, on a case-by-case basis and if there is a need/request.

   While reading through the list's archives and this thread, it
occurred to me that part of the problems and confusion with all this
stuff may arise from the fact that the tagging mechanism is
increasingly being used to kill two birds with one stone. It seeks to
classify two properties that are distinct from, and orthogonal to, each
other.

   One property is the "language", i.e. a set of agreed-upon symbols
(sounds, gestures, ...) used to communicate a thought between a human
sender and a human receiver. If both sides use the same set of symbols,
they can communicate (sometimes <g>).  

   The other property is the orthography, i.e. the way of rendering a
particular symbol. The same "symbol" can be rendered in different ways,
e.g. by different (sequences of) characters. But the rendering can also
be auditory (speech) or visual (sign language, or mouthing actors in
silent movies). (I'm not a linguist, so be easy on me if I'm sloppy with
terminology.)

   Take a document in some language, written just before some spelling
reform, and rewrite it with the new spelling rules. The orthography
changes, the language stays the same. Likewise, no matter whether you
have a document read out or signed or transliterated into Braille or
printed in a monospaced font, it's still the same "language", just
rendered differently: a different "orthography".

   If we want to tag only "languages", we should not be tagging
orthography. But if we do allow orthography into the tagging system (as
it seems we did, e.g. de-1996), we should allow transliteration in as
well.

   In my view - but who am I? - it would have been better to introduce
an entirely separate tagging system for orthography. We do use different
tags for other rendering mechanisms, like character set and font, so why
not YAT (yet another tag)?

   In any case, we should not expect absolute accuracy from any tagging
system, unless you're prepared to tag down to every single word in every
single document.

   Luc Pardon
   Belgium