Tagging transliterations from a specific script

Thu Mar 17 03:21:43 CET 2011

Hello Michael,

Besides directionality issues (as others have explained), there is also 
the issue of search. Say somebody searches for Tatar in Arabic. They 
most probably do that because they read Tatar in Arabic, but not in 
Cyrillic, and not in Latin (of whatever transliteration). But a search 
for tt-Arab can easily return content labeled as tt-Arab-alalc92, 
because tt-Arab matches that tag as a prefix.
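
To make the search problem concrete, here is a minimal sketch in Python 
of prefix-based matching in the style of RFC 4647 basic filtering. The 
function name basic_filter and the document labels are made up for 
illustration; real search systems may match tags differently.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 4647 basic filtering.

    A range matches a tag if they are equal (ignoring case) or if the
    range is a prefix of the tag and the next character is "-".
    """
    rng = language_range.lower()
    t = tag.lower()
    if rng == "*":
        # The wildcard range matches every tag.
        return True
    return t == rng or t.startswith(rng + "-")


# Hypothetical content labels, for illustration only.
documents = {
    "doc1": "tt-Arab",
    "doc2": "tt-Arab-alalc92",
    "doc3": "tt-Cyrl",
    "doc4": "tt-Latn",
}

# A reader of Tatar in Arabic script searches with the range tt-Arab.
matches = [name for name, tag in documents.items()
           if basic_filter("tt-Arab", tag)]
print(matches)  # ['doc1', 'doc2']
```

Under these assumptions, doc2 comes back for a tt-Arab search even 
though its content is a transliteration and not in Arabic script.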

More abstractly, a tag such as tt-Arab-alalc92 would just be lying. It 
says the text is in Arabic script, but it's not. Humans get over this 
from context or 
even think it's clever or cute, but it's still a lie. alalc92 is a 
variant tag, and the idea of variant tags is to stay inside whatever 
prefix there is, but for tt-Arab-alalc92, alalc92 jumps out of Arab.

You are right that the data will tell, but one very important use of 
language tags is as metadata, so that one does not have to look at the 
data itself, which is much more efficient.

Regards,    Martin.

On 2011/03/17 7:57, Michael Everson wrote:
> On 16 Mar 2011, at 22:48, David Starner wrote:
>
>>> What is wrong with that? The use of -alalc92 in those contexts couldn't mean anything else.
>>
>> Except to a computer, which may look at a tag like tt-Arab-something and set the environment to RTL, or just tell the user that the text is in the Arabic script. Even humans who don't know the registry well would probably assume that was in Arabic script.
>
> No, it is the characters themselves which have RTL properties and which invoke such behavioural functionality. The language tag is not the place for the computer to determine that.
>
> Michael Everson * http://www.evertype.com/

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp