Script codes in RFC 3066

Peter_Constable at sil.org
Mon Apr 14 15:41:10 CEST 2003


Tex Texin wrote on 04/10/2003 03:03:23 PM:

> > I think a policy of being as specific as possible when tagging makes
> > sense.
>
> I strongly disagree. If we ever get to update RFC 3066, and this includes
> script tags, then we clearly need to say that script tags only should
> be used to indicate an unusual script, and only where the script is
> otherwise not easily derivable.

I agree in general with Tex that there are good reasons to tag content as
specifically as possible when cataloguing, since retrieval cannot make
finer-grained distinctions than those that were made in cataloguing.
(That's the general rule; for a particular repository, there may be limits
to the degree of granularity that is expected to be needed.)

At the same time, I agree with Martin insofar as I think that a distinction
can be made between expected and highly-marked cases. There's nothing to be
gained by tagging text as "en-Latn" or "ar-Arab", particularly given that
lots of text is already tagged without explicitly indicating the script.
Similarly, there may be plenty of instances in which country IDs are not
needed to tag content that represents the 99%-usage case.

Of course, this is in the context of "language" identification. When it
comes to locales, it seems commonplace to use both language and country
elements in identifiers, even when the country element doesn't distinguish
the locale from that of any other country. I wouldn't be surprised if that
were the ultimate reason for seeing things like ja-JP in use.




- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
