Tagging transliterations from a specific script
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Thu Mar 17 03:21:43 CET 2011
Besides directionality issues (as others have explained), there is also
the issue of search. Say somebody searches for Tatar in Arabic. They
most probably do that because they read Tatar in Arabic, but not in
Cyrillic, and not in Latin (of whatever transliteration). But when
searching with tt-Arab, you easily may get returned content labeled as
More abstractly, a tag such as tt-Arab-alalc92 would just be lying. It
says it's Arabic, but it's not. Humans get over this from context or
even think it's clever or cute, but it's still a lie. alalc92 is a
variant tag, and the idea of variant tags is to stay inside whatever
prefix there is, but for tt-Arab-alalc92, alalc92 jumps out of Arab.
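
The search problem can be illustrated with a minimal sketch, assuming the
search side uses prefix-based "basic filtering" in the style of RFC 4647.
The function name and the list of labels are hypothetical, chosen only for
illustration; real search systems may match differently.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Prefix match in the style of RFC 4647 basic filtering.

    The range matches a tag if both are equal, or if the range is a
    prefix of the tag and the next character in the tag is '-'.
    Comparison is case-insensitive.
    """
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


# Hypothetical content labels, for illustration only.
labels = ["tt-Arab", "tt-Cyrl", "tt-Latn", "tt-Arab-alalc92"]

# A reader of Tatar in Arabic script searches with the range "tt-Arab".
hits = [tag for tag in labels if basic_filter("tt-Arab", tag)]
print(hits)  # ['tt-Arab', 'tt-Arab-alalc92']
```

Under this assumption, content labeled tt-Arab-alalc92 is returned for a
tt-Arab query, because the matcher treats the variant as a refinement that
stays inside the tt-Arab prefix.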
You are right that the data will tell, but one very important use of
language tags is as metadata, so that one does not have to look at the data
itself, because that's much more efficient.
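
As a rough sketch of what "metadata only" processing might look like, the
following reads the script subtag from a tag without touching the content.
It is a simplification that assumes well-formed tags and ignores
grandfathered tags; the function name, the document ids, and the index are
hypothetical.

```python
from typing import Optional


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag (four ASCII letters) of a tag, if any.

    Simplified: assumes a well-formed tag, skips the primary language
    subtag, and stops at the first singleton, which starts an extension
    or private-use sequence.
    """
    for sub in tag.split("-")[1:]:
        if len(sub) == 1:
            break
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            return sub.title()
    return None


# Hypothetical index mapping document ids to their language tags.
index = {"doc1": "tt-Arab", "doc2": "tt-Cyrl", "doc3": "tt-Arab-alalc92"}

# Select "Arabic script" documents from the tags alone, without the data.
arab_docs = [doc for doc, tag in index.items() if script_subtag(tag) == "Arab"]
print(arab_docs)  # ['doc1', 'doc3']
```

A consumer working this way takes doc3 for Arabic-script content, and would
only find out otherwise by opening the data.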
On 2011/03/17 7:57, Michael Everson wrote:
> On 16 Mar 2011, at 22:48, David Starner wrote:
>>> What is wrong with that? The use of -alalc92 in those contexts couldn't mean anything else.
>> Except to a computer, which may look at a tag like tt-Arab-something and set the environment to RTL, or just tell the user that the text is in the Arabic script. Even humans who don't know the registry well would probably assume that was in Arabic script.
> No, it is the characters themselves which have RTL properties and which invoke such behavioural functionality. The language tag is not the place for the computer to determine that.
> Michael Everson * http://www.evertype.com/
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp