el-latn, ru-latn, and related possibilities

Tex Texin tex at yahoo-inc.com
Fri Oct 7 01:49:20 CEST 2005

Peter thanks.
I buy most of what you say.
OK, ru-Latn is useful even without specifying a transliteration scheme.

It seems wasteful, however, to specify both transliteration and script, since
script is implied by the transliteration scheme. Tag size is somewhat of a
concern for my users, but it probably isn't a compelling argument for this.

The argument about the importance of script presumes heavily that tags are
for written text, which is increasingly not true. Given that matching will no
longer be simple truncation anyway, placing script in a position that
affects existing software and tags just doesn't make sense to me. But that
argument is independent of my questions on transliteration.
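To make the compatibility point concrete: existing software commonly applies the simple prefix rule from the HTTP/1.1 Accept-Language header (RFC 2616), where a range matches a tag if, ignoring case, it equals the tag or equals a prefix of it that is followed by "-". A minimal Python sketch of that rule (the name `prefix_matches` is hypothetical, and the "*" wildcard is deliberately left out) shows that a range written as language-region stops matching once a script subtag sits between language and region.

```python
def prefix_matches(language_range: str, tag: str) -> bool:
    """Simple prefix match: the range equals the tag, or equals a prefix of
    the tag that is immediately followed by '-'. Comparison ignores case.
    The '*' wildcard is not handled in this sketch."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# A range keyed to language-region, as older content and software use it.
user_range = "zh-CN"

# It still matches the old-style tag, but not a tag with a script subtag
# inserted between language and region.
for tag in ["zh-CN", "zh-Hans-CN"]:
    print(tag, prefix_matches(user_range, tag))
# zh-CN True
# zh-Hans-CN False
```

A bare language range such as "zh" still matches "zh-Hans-CN"; the breakage only affects ranges that include a region.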

On whether transliteration or dialect should take precedence, I confess
ignorance and will follow your lead.
But I think your arguments support the same position I had, which is that
transliteration is more important than region/dialect, and so
zh-Latn-pinyin-cn is better than zh-Latn-cn-pinyin.
Am I misreading you? I assume dialect is related to region; perhaps you had
yet another subtag in mind.

My thought was that transliteration implies script, and some degree of
orthography, both of which are required to properly parse written materials
before one can consider dialect. It also seemed natural to me to modify
the script subtag, or even to supplant script with a transliteration subtag.
I would also think that, to some extent, if I know how to "render" the
transliteration, I may not care about dialect, since the pronunciation is
reflected in the transliteration. So for publishing purposes I might choose
a neutral or widely acceptable dialect and use it for multiple regions,
just as broadcasters in the US use a Midwestern dialect for national
broadcasts. Or a neutral pronunciation/transliteration for es-419... In that
case, I want the tag hierarchy to place transliteration above region,
and only if I get into region-specific language or dialect would I provide a
suitable subtag.
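The effect of the two orderings is easiest to see when a matcher falls back by chopping subtags off the end. Below is a minimal Python sketch, assuming a lookup-style matcher of the kind later specified in RFC 4647, which truncates one subtag at a time and also drops a single-character subtag left dangling at the end; `lookup_fallback` is a hypothetical name, and the tags are only the two candidates discussed here, not a claim about which form is valid. With transliteration ahead of region, the region is the first thing lost; with region first, the transliteration scheme is lost first.

```python
def lookup_fallback(tag: str) -> list[str]:
    """Return the sequence of tags tried when a lookup-style matcher
    progressively truncates subtags from the end of the tag."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # Don't leave a dangling single-character subtag (an extension or
        # private-use singleton such as "x") at the end after truncating.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


# The two orderings under discussion (illustrative only).
for tag in ["zh-Latn-pinyin-cn", "zh-Latn-cn-pinyin"]:
    print(" -> ".join(lookup_fallback(tag)))
# zh-Latn-pinyin-cn -> zh-Latn-pinyin -> zh-Latn -> zh
# zh-Latn-cn-pinyin -> zh-Latn-cn -> zh-Latn -> zh
```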

Tex Texin
Internationalization Architect, Yahoo! Inc.

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no 
> [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of 
> Peter Constable
> Sent: Wednesday, October 05, 2005 6:24 PM
> To: IETF Languages Discussion
> Subject: RE: el-latn, ru-latn, and related possibilities
> > From: Tex Texin [mailto:tex at yahoo-inc.com]
> > Guys, sorry to be the odd man out yet again, but we should first run
> > through all the use cases
> Certainly.
> > Text to voice is important for accessibility. Identification of the
> > transliteration scheme would be a prominent requirement and perhaps
> > therefore ru-Latn is not sufficient and should not be recommended as
> > adequate.
> Nobody said that ru-Latn would be adequate for all 
> applications. There may be *some* applications for which it's 
> adequate. I could certainly imagine someone looking for
> transliterated Russian content without particularly caring 
> what transliteration scheme was used (or even caring to know 
> it was considered "transliteration" rather than merely 
> Russian in Latin letters). For *that* scenario, ru-Latn would 
> certainly be adequate. But there's no doubt that in a 
> text-to-voice application something more specific would be 
> required -- and supportable in 3066bis.
> > Also, if we buy the argument that script was important enough to break
> > compatibility with lang-region, and to instead associate script with
> > language as lang-script-region, I would think we would want
> > transliteration to also be tied with script and not go after region.
> > 
> > Something like zh-hans-pinyin-cn rather than zh-hans-cn-pinyin.
> (I presume you meant zh-Latn-pinyin-cn / zh-Latn-cn-pinyin.)
> The reason for putting script as the second subtag was that 
> it would typically be far more important for a user to get 
> content in a particular script than in a particular dialect 
> or spelling variant. When you get down to the level of 
> selecting between one transliteration scheme over another, I 
> think the level of concern goes *way* down: 
> - Text in the wrong transliteration scheme will likely still 
> be legible (and even minimally understandable in 
> text-to-speech), while text in the wrong script will likely 
> be quite illegible.
> - It has already been questioned how widespread the need for 
> negotiation with respect to transliteration will be. For this issue you
> raise to be of any concern, we have to be looking at 
> preferences for a particular transliteration scheme *and 
> also* a particular dialect (regional spelling variant won't 
> be a factor for transliterations -- that would amount to a 
> new transliteration scheme). Moreover, it would only be a 
> significant concern if it was clear that most users in this 
> scenario would be far more concerned to get the dialect right 
> than to get the transliteration scheme right. IMO, we're 
> talking about highly hypothetical scenarios here, and it's 
> not possible to say that one is clearly more important than 
> the other. But let's suppose that there is
> *some* user scenario where people really need to get the 
> dialect right rather than the transliteration scheme. Here 
> we're surely talking about a specialized scenario, and the 
> Language Tag Matching spec that LTRU is preparing will 
> describe means that an application can use to achieve
> that end. But I really don't think it's a concern for the
> widespread implementations of left-prefix matching 
> algorithms, which were the main reason for putting script 
> before region.
> Peter Constable
