Script codes in RFC 3066

Thu Apr 10 05:10:45 CEST 2003

Martin Duerst wrote:
>    One particular concern I have is that once there is a productive
>    pattern, the assumption that all the slots have to be filled in
>    seems to spread in an uncontrolled way. I have seen numerous examples
>    of tags such as 'ja-jp', which in particular as far as language goes,
>    doesn't give more information than simply 'ja'. 

While true, the extra region subtag doesn't hurt, and it provides some
protection in case Japanese comes to be used heavily in another region.

>    Another point is that while something like az-latn/az-Cyrl is very
>    good for language negotiation (e.g. HTTP Accept-Language/
>    Content-Language headers), it is really enough to mark up the
>    actual text (e.g. with xml:lang) with 'az' only, because the
>    script is self-evident from the characters used.

Although true, why require the script to be examined to make decisions about
the thing being tagged?
I could be filtering the text, routing the text, or performing a number of
different operations based on the language-script...

For some documents, I might have to scan past quite a bit of header or other
meta information before I reach the actual content and can determine the
script.
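
A rough sketch of the routing case, to make the difference concrete. It
assumes the script shows up as a four-letter code in a later subtag, as in
the az-Latn example above, and it only knows Latin and Cyrillic. The
function names are made up for illustration and are not from any library.

```python
import unicodedata


def script_from_tag(tag):
    """Return the four-letter script subtag of a tag like 'az-Latn', or None.

    Assumption: any four-letter alphabetic subtag after the first one is
    a script code.
    """
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            return subtag.title()
    return None


def script_from_text(text):
    """Guess the script from the first letter found in the text.

    This is the fallback a script-bearing tag makes unnecessary. It only
    distinguishes Latin from Cyrillic.
    """
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("CYRILLIC"):
                return "Cyrl"
            if name.startswith("LATIN"):
                return "Latn"
    return None


def route(tag, text):
    """Prefer the tag; scan the content only when the tag has no script."""
    return script_from_tag(tag) or script_from_text(text)
```

With a tag of 'az-Cyrl' the router never has to look at the content. With
plain 'az' it has to scan, and a naive scan of something like
"Subject: Салам" hits the Latin header first and gets the wrong answer.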

I think a policy of being as specific as possible when tagging makes sense.

tex

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex at XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------