Registration of el-Latn language tag

Tex Texin tex at xencraft.com
Fri Sep 30 10:32:58 CEST 2005


Luc,

Luc Pardon wrote:
> 
> Tex Texin wrote:
> 
> > The registration for el-Latn more or less stipulates the need for
> > transliteration, mentions that they exist, with a link to a site that
> > collects transliteration systems. (Which btw, I think is a really bad idea
> > in the event the site goes away or completely changes its list of reference
> > materials.)
> 
>     You have a point here. I'm new to this and I was sure the list would
> tell me if I got it wrong. As it is, I can only hope the site stays up
> until RFC3066bis sends the current registry into oblivion.
> 
>    On the other hand, there is a URL in RFC3066 as well
> (http://www.iana.org/numbers.html) so I'm in good company <g>.

Yes, of course. Nothing I said was intended to address you personally or
your actions.
You were doing what most expected would be done. If I recall correctly, you
also asked first and were told to go ahead.


> > But it doesn't really nail down what it is.
> 
>     The two standards that I provided a reference to, ISO 843 and ELOT
> 743, do nail it down in every detail. So much in fact that you could
> transcribe them straight into computer code to do the transliteration
> for you.
> 
> > (It mentions a
> > standard, but doesn't say the tag is referring to that particular standard.
> 
>    It would have been very bad indeed if it did. There are several
> standards so there is no way a single subtag can refer to them all. And
> it would not be appropriate for an IANA-registered tag to prefer one
> over the other. The two standards that I gave are not the only ones.
> There is another one used by the American Library Association/Library
> of Congress. And the "US Board on Geographic Names"
> and the "Permanent Committee on Geographical Names for British Official
> Use" share yet another one. And as soon as you stride out of the realm
> of officialdom, there are many many more. All have their use, in
> different contexts.

Yes, the proposed registration made clear there was more than one.
I don't see why a subtag couldn't address one and other tags would address
the remainder.
I question the value of a tag encompassing multiple schemes for any
application that is actually going to process the text.
If we know the distinctions, and where they are very significant to the
processing of text, why not make the tag(s) more precise?

 
>    This is precisely why I think a "transliteration sub-subtag" could be
> useful (in theory) to further define the script-to-script mapping.

Agreed, as a possible alternative.
> 
>    But: RFC3066 says nowhere that a given tag should nail down the exact
> orthography of each and every word. Likewise, a script subtag should not
> be required nor expected to define an exact orthography.

Yes, not down to the level of every word. But most words would be common to
the variations.
Differences between transliteration schemes, though, do affect the spelling
of many, many words, since transliteration is an automated change.
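
To make that concrete, here is a minimal Python sketch. The two tables are
toy mappings invented for illustration; they are NOT the actual ISO 843 or
ELOT 743 tables, and the names SCHEME_A, SCHEME_B and transliterate are
hypothetical. The only point is that one differing rule changes the
spelling of every word that contains the affected letter.

```python
import unicodedata

# Toy tables, invented for illustration only (not ISO 843 or ELOT 743).
BASE = {"α": "a", "ε": "e", "μ": "m", "ρ": "r"}
SCHEME_A = {**BASE, "η": "i"}   # hypothetical scheme A renders eta as "i"
SCHEME_B = {**BASE, "η": "ī"}   # hypothetical scheme B renders eta as "ī"

def strip_accents(text):
    """Decompose the text and drop combining marks such as the tonos."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def transliterate(text, table):
    """Map each letter through the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in strip_accents(text.lower()))

word = "ημέρα"
print(transliterate(word, SCHEME_A))  # imera
print(transliterate(word, SCHEME_B))  # īmera
```

A reader given only the Latin-script output cannot tell which table
produced it unless the scheme is recorded somewhere.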


> 
>    The whole point of subtagging is that it supposedly gets more precise
> as you move from left to right. Script subtags do precisely that, they
> add information to the preceding tag.

Yes, but that is not an argument to make tags more general than they need to
be.
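
As a rough illustration of the left-to-right refinement being discussed,
the following Python sketch splits a tag into subtags and falls back from
the most specific form to the most general one. The function names
prefixes and best_match are hypothetical, and the fallback rule is an
assumption made for the example, not something RFC3066 prescribes.

```python
def prefixes(tag):
    """Return the tag and each shorter prefix, most specific first."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]

def best_match(tag, available):
    """Pick the most specific available tag that is a prefix of `tag`.

    Comparison is case-insensitive; returns None when nothing matches.
    """
    by_lower = {a.lower(): a for a in available}
    for candidate in prefixes(tag):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return None

print(prefixes("el-Latn"))                     # ['el-Latn', 'el']
print(best_match("el-Latn", ["en", "el"]))     # el
print(best_match("es-419", ["en", "el"]))      # None
```

Under this scheme a consumer that only knows "el" still gets something
for "el-Latn", but it loses exactly the information the extra subtag was
meant to carry.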


> > So we are no longer identifying a reference or a particular language, but
> > just the concept that there seems to be something like a language of this
> > persuasion. I guess we were asking for this with es-419. (Which I was also a
> > proponent of.)
> 
>    For the record, I looked at the application for es-419 and it does in
> fact mention two references that seem to describe Latin American
> Spanish.

Well, one of the references was just a short article describing the approach
taken to conform to Microsoft's idea of Latin American Spanish. It didn't
really define a language per se. I'll have to check the registration to
refresh my memory what the other reference was. But if you are defending
what you did, I reiterate I have no issue with your actions.

However, in both cases, el-Latn and es-419, I do have trouble looking at a
document and knowing when it conforms.
With es-419, I need a list of words and grammars that distinguish one
Spanish from another. If I don't know that Peru uses a different word in
some instances, then I can't tell if the document is universal, Peruvian, or
non-Peruvian.

Similarly, any Greek-looking or Greek-sounding document might be a valid
transcription, a different language from Greek, or gibberish, and I don't
know which tag to apply.


Mark Davis asked in one of the mails I saw go by how I would do this for
English. I have access to dictionaries and grammar books that give me tests
I can apply.

> > I am also not sure we should be registering transliterations.
> 
>    I am sure we should <g>.
> 
>    That is, I am sure we do need tags to label transliterations. Under
> RFC3066 rules that means registering them.
> 
>    The intro of RFC3066 gives some reasons why tagging has a purpose.
> Some of these, such as spell-checking and computer-synthesized speech,
> are difficult or impossible if you are not allowed to distinguish
> between - in this case - el and el-Latn.
> 
>    As mentioned in the el-Latn application, the W3C Web Content
> Accessibility Guidelines require "proper identification of natural
> language". This requirement applies also to short fragments of, say,
> French text embedded in English (as in "He went to a restaurant and
> ordered the plat du jour"). The last three words must be identified as
> French.
> 
>    Now, if I have a transliterated Greek word embedded in an English
> text, I can do three things:
> 
>     1) not label the word at all, i.e. it inherits the "en" label from
> the surrounding text.
>     2) label it as "el"
>     3) label it as "el-Latn"
> 
>   Think of the consequences for a text-to-speech synthesiser, and be
> sure to think of it from the perspective of a blind person, who has to
> rely only on what (s)he hears.
> 
>     1) If it is labeled (implicitly) as "en", the word would be uttered
> as if it were English, making it totally unrecognizable. Anybody who has
> difficulty imagining the effect should try downloading a (demo
> version of) a screenreader, set it to French, switch off their monitor,
> and have it read out an English page. I did, and it is enlightening.
> 
>     2) If the word is labeled as "el", the speech synthesizer would
> activate its Greek module and that would expect Greek script and
> promptly go nuts, just the same as the English module would throw a fit
> if you feed it an English word written in Greek script.
> 
>     3) The "el-Latn" script subtag is the only way out, i.e. it is the
> only way to make this document and its transliterated content
> accessible.
> 
>    So yes, I am sure we should register transliterations, at least under
> the current rules for language tagging.
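
The three labelling options above can be modelled with a small Python
sketch. Everything in it is an assumption made for illustration: the
VOICES table, the script_of heuristic and the outcome function are
hypothetical and do not describe any real screen reader or synthesizer.

```python
# Hypothetical voice modules: tag -> (language spoken, script expected).
VOICES = {
    "en": ("en", "Latn"),
    "el": ("el", "Grek"),
    "el-Latn": ("el", "Latn"),
}

def script_of(text):
    """Crude guess: 'Grek' if any character is in a Greek block, else 'Latn'."""
    for ch in text:
        if "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff":
            return "Grek"
    return "Latn"

def outcome(tag, actual_language, text):
    """Describe what the voice selected by `tag` would do with `text`."""
    voice = VOICES.get(tag)
    if voice is None:
        return "no voice"
    language, script = voice
    if script != script_of(text):
        return "unreadable"       # the module expects a different script
    if language != actual_language:
        return "mispronounced"    # right script, wrong pronunciation rules
    return "ok"

# A Greek word written in Latin script, embedded in English text.
for tag in ("en", "el", "el-Latn"):
    print(tag, outcome(tag, "el", "kalimera"))
```

In this toy model only the "el-Latn" label selects a voice that matches
both the language and the script of the fragment.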

Well, I quite understand why we need tags, but thanks.
I am in no way against registrations.
But we are registering languages for the purpose of identification and
selection on the internet and software.
So I am testing what is a language, what is useful for the purposes of
software, and when should it be registered or not.
I question transliteration because it is not another language but a mapping.
It is closer to an encoding than a writing style, isn't it?

The case of el-Latn seems to me to add very little value over el. If I have
the document I can tell it's in Latn script pretty fast. I would rather be
told the transliteration being used. Adding the script does give me some
value when I want to select by script and distinguish from Greek. But if
this is the extent of the value, and given the large amount of material for
every language written in Latn, I wonder why we discuss this single
instance. Bang them all in and be done with it and we can move on to
transliteration subtags. I appreciate the effort you went to, to create the
registration. I am questioning why we put you through it, why we kept up the
suspense of two weeks of waiting etc. for a minimal advantage.

More generally, it seems to me we are in a very weird space. We are adopting
policies based on a proposal that is not yet accepted and has had considerable
difficulty getting approval. More rational behavior would be to not change
policies until the laws are actually enacted. The process of acceptance is
supposed to vet all of the procedures and decisions, so we shouldn't have to
do this on the fly. 

And usually, in fairness to opposing viewpoints, registrations wouldn't be
made based on one proposal as it then puts alternatives at a disadvantage by
creating a legacy. Unfortunately, this practice also prevents proposals that
would do a better job of maintaining backward compatibility by creating new
values that are inconsistent with the past trend. 

This is quite bizarre to me and quite unnecessary. I do believe everyone is
working to solve problems they perceive to be important and working against
the difficulties of bringing different communities together in a reasonable
time frame. Nevertheless it seems quite odd.



 
>    By the way, tagging for accessibility is in my view the most valid
> reason for tagging. And in fact, here in Europe several countries
> already have laws that require it. (The US's Section 508 doesn't require
> language tagging.)

Yes, we agree we need tags for many reasons. We need tags that have
definitions we can understand and mutually agree on. We don't need a
collection of ambiguous identifiers.

> 
> > At least with a transliteration to sign languages, (I assume they are
> > considered  transliterations) I could see that the expressiveness of signing
> > would evolve and behave like a language of its own. With transliteration
> > from one script to another, I am not so sure. (But I am not a linguist.) I
> > guess I think of Greek transliterations as one way- Going from Greek to
> > Latin, and not that people will write new Greek materials in Latin script,
> > so that it evolves like a language on its own.
> > At least with some of the other languages that were written in different
> > scripts, although you could transliterate between them, people were also
> > using the script for the purpose of writing and expression.
> 
>    In the case of Greek, your assumption is not correct.
> 
>    As I indicated in the first example of the need for transliterated
> Greek, new materials, in the form of e-mails and other communications,
> are being written in it every single day. It is considered a somewhat
> controversial practice in some circles, though not in others.
> 
>    In fact, for communicating with Greeks living here in Belgium I
> _have_ to write transliterated Greek, more often than not. If I use Greek
> script they'll likely return the message saying "sorry, my computer
> can't render it". Particularly in work environments, users do not always
> have the administration rights to configure their computers themselves.
> On some Greek message boards, one person will post in Greek script, and
> another will reply in Latin script, depending on the technical
> infrastructure at their disposal.
> 
>    As an aside: One of the reasons that some people are vehemently
> opposed to Greeklish (Greek written "in English [script]") is the fear
> that it might eventually replace the Greek script altogether. There
> would be no such feelings if transliterated Greek was just a one-way
> automated transformation.

All OK.

> 
> > The registration indicated one of the two uses of transliteration was for
> > use by non-Greeks. This suggests to me it is not being used as a language
> > but simply an alternative notation system that is autogenerated. The users
> > are not writing and expressing themselves in the transliteration.
> >
> 
>    Well, yes and no. What this second case actually refers to is short
> fragments of Greek embedded in another language.
> 
>    Imagine a chapter in an English travel guide, writing about the local
> cuisine in Russia.
> 
>    Is the writer expressing himself in transliterated Russian ? No.
> 
>    Is there a need to label the transliterated names of the Russian
> dishes differently from the English text? Yes, if you want the reader to
> order that dish over there and you want him to get what he expects (and
> provided he has a Russian-Latin text-to-speech module <g>).
> 
>    Is it auto-generated? Definitely not.

I guess I don't see that the Russian terms transliterated/transcribed into
Latn weren't autogenerated.
Some system was used or the English reader won't be able to pronounce it or
recognize it.
The author followed some rules...

> 
> > I know that is not entirely true, and do not want to overstate the point,
> > but this kind of automated transliteration occurs between most languages and
> > scripts, but is not used as language. We shouldn't need to review, register,
> > and discuss all of the combinations.
> >
> > I guess we will need a tag for transliteration of hieroglyphics to Latin as
> > well...
> > We might need one for the Rebus puzzles in the newspaper too.
> 
>    Yes <g>. Not sure about the puzzles, but any document that writes
> about Ramses and Cleopatra and Nefertiti is in fact transliterating
> hieroglyphs.
 
> >
> > We should add zh-Latn, as Chinese is often written in Latin script as well.
> >
> > Maybe we should just stipulate that almost everything is transliterated in
> > Latin, and simply consider it available for all languages.
> 
>    Yes. As I understand it, that is precisely what RFC3066bis will do.
> 
>    In 2.2.3, 4th item, it says "Script subtags MUST NOT be registered
> using the process in Section 3.5 of this document".
> 
>    However, until that gets adopted, RFC3066 is still ruling, and under
> its rules (i.e. the current rules), tags like el-Latn MUST be registered
> with IANA before using them.
> 
>    Now, the question is, should we still be registering tags today that
> won't need registration tomorrow? That question has come up in the very
> beginning of the two-week el-Latn review period and the consensus was:
> yes, on a case by case basis and if there is a need/request.

Which seems rather pointless, given the effort required.
And damaging in the long run. We create software that embeds the registered
tags in it.
We also have to come up with natural language names for the tags so users
can recognize them.
And we come up with translations for the names for our localized software.

a) This is work, and b) a slowly but continually changing registry is bad for
industry; it makes for incompatibilities.
Until recently the registry added only very few tags per year and mostly
oddities that are not popularly used.
We are now adding (or replacing) tags more quickly (which was needed) but
adding one -Latn a week would cause incompatibilities.

Either bang them all in, or don't do any until 3066bis/ter/whatever appears
and then we can have some sense of versioning. A continually moving target
is a bad approach.
We seem to have gotten away from the sense of industry purpose and
responsibility here.
I don't think one-by-one registration of what could be a high volume is a
good policy for us.

 
>    As I was reading through the list's archives as well as in this
> thread, it occurred to me that part of the problems and confusion with
> all this stuff may arise from the fact that the tagging mechanism is
> being increasingly used to kill two birds with one stone. It seeks to
> classify two properties that are distinct from, and orthogonal to, each
> other.
> 
>    One property is the "language", i.e. a set of agreed-upon symbols
> (sounds, gestures, ...) used to communicate a thought between a human
> sender and a human receiver. If both sides use the same set of symbols,
> they can communicate (sometimes <g>).
> 
>    The other property is the orthography, i.e. the way of rendering a
> particular symbol. The same "symbol" can be rendered in different ways,
> e.g. by different (sequences of) characters. But the rendering can also
> be auditive (speech) or visual (sign language, or mouthing actors in
> silent movies). (I'm not a linguist, so be easy on me if I'm sloppy with
> terminology.)
> 
>    Take a document in some language, written just before some spelling
> reform, and rewrite it with the new spelling rules. The orthography
> changes, the language stays the same. Likewise, no matter whether you
> have a document read out or signed or transliterated into Braille or
> printed in a monospaced font, it's still the same "language", just
> rendered differently, different "orthography".
> 
>    If we want to tag only "languages", we should not be tagging
> orthography. But if we do allow orthography into the tagging system (as
> it seems we did, e.g. de-1996), we should allow transliteration in as
> well.
> 
>    In my view - but who am I ? - it would have been better to introduce
> an entirely separate tagging system for orthography. We do use different
> tags for other rendering mechanisms, like character set and font, why
> not YAT (yet another tag) ?

I agree. I think the approach we have taken with script is a bad idea and I
am afraid of how this will be extended to voice attributes as that becomes
more important, and more generally to locales.

Given the problems with our practices and methodology in defining tags, I
also have difficulty recommending that other standards groups look to 3066
successors for tagging. I would rather see an alternative mechanism put in
place that respects industry needs and doesn't change its procedures until
acceptance and vetting has occurred.

> 
>    In any case, we should not expect absolute accuracy from any tagging
> system, unless you're prepared to tag down to every single word in every
> single document.

I never expected or requested that. I don't need absolutes. I did ask,
however, what this group was now using as its minimum criteria for acceptance
and what constituted distinguishing one language (or variation) from
another.

Thanks for the comments, Luc.
 
>    Luc Pardon
>    Belgium

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex at XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------

