Language and script encoding standards

Sat Jul 22 21:28:36 CEST 2006

Apologies if I woke anyone up.  :-)  Both lists have been very quiet of 
late.

I think the important points that have emerged from this discussion on 
LTRU [1] are:

1.  There is a need to tag content that follows certain systems of 
transliteration or transcription.  While the original Greek text (or 
Cyrillic or Devanagari or whatever) will generally be more "authentic" 
in a way, transcriptions will continue to have their place in scholarly 
as well as popular material.  Sometimes it may be important to indicate 
the transcription scheme of tagged content; we are in no position to say 
this is not the case.

While we have defined a mechanism for variant subtags, it is somewhat 
vague (of necessity) and my impression is that the bar for acceptance of 
a variant subtag proposal on ietf-languages is fairly high.  Every 
discussion about a variant subtag proposal becomes a discussion of the 
deeper question of whether a variant is the appropriate vehicle for the 
purpose.  That question is reasonable to ask, but not if it draws too 
much attention away from the specific proposal at hand, and not if the 
bar for acceptance is disproportionately high compared to the criteria 
for encoding a language in ISO 639, a script in ISO 15924, or a country 
in ISO 3166.

Section 2.2.5 says, "Variant subtags are used to indicate additional, 
well-recognized variations that define a language or its dialects which 
are not covered by other available subtags," and we should reach an 
understanding on exactly what that means, so that when a real proposal 
comes along, we don't have to have that discussion again.
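
To make the position of a variant subtag concrete, here is a minimal 
Python sketch.  It assumes a simplified form of the RFC 3066bis syntax 
(language, optional script, optional region, then any variants) and 
deliberately ignores extended language subtags, extensions, private use, 
and grandfathered tags.  The name split_tag and the pattern are 
illustrative only and do not come from any draft.

```python
import re

# Simplified shape: language[-script][-region][-variant ...]
# Assumption: extlang, extensions, private use and grandfathered
# tags are out of scope for this sketch.
TAG_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


def split_tag(tag):
    """Split a tag into its subtags, or return None if it does not fit."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    # The variants group looks like "-1996-rozaj"; drop the empty piece.
    variants = [v.lower() for v in m.group("variants").split("-") if v]
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": variants,
    }


if __name__ == "__main__":
    for example in ("de-CH-1996", "sl-rozaj", "el-Latn"):
        print(example, split_tag(example))
```

Here "de-CH-1996" yields the region "CH" and the variant "1996", while 
"el-Latn" yields only a script subtag: it says the Greek content is in 
Latin script, but nothing about which transliteration scheme was used.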

2.  The Neel Smith document [2] shows quite a bit of misunderstanding of 
RFC 3066bis: tagging of character sets; use of code elements directly 
from ISO standards instead of the Registry (Smith would not have 
proposed using "grc" and "lat" otherwise); and conflation of local 
alphabetic variation with the ISO 15924 concept of "script."  More 
documents and papers that explain the RFC 3066bis mechanisms, such as 
those already written by Addison and Mark, will help.  I suspect more 
will be written once the documents are given RFC numbers and the 
mapping to "BCP 47" is moved.
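
As a rough illustration of the point about taking code elements from the 
ISO standards rather than the Registry, the Python sketch below checks a 
primary language subtag against a tiny hand-copied excerpt.  The two 
tables and the function name check_language_subtag are assumptions made 
for this example; a real check would read the full Registry.

```python
# Hand-copied excerpt of registered primary language subtags.
# Assumption: three entries are enough to show the idea.
REGISTRY_LANGUAGES = {
    "la": "Latin",
    "el": "Modern Greek (1453-)",
    "de": "German",
}

# ISO 639-2 three-letter codes for languages that also have a
# two-letter ISO 639-1 code; the Registry carries the two-letter form.
ISO_639_2_TO_REGISTRY = {
    "lat": "la",
    "gre": "el",
    "ell": "el",
    "ger": "de",
    "deu": "de",
}


def check_language_subtag(subtag):
    """Return (registered subtag or None, short explanation)."""
    s = subtag.lower()
    if s in REGISTRY_LANGUAGES:
        return s, "registered"
    if s in ISO_639_2_TO_REGISTRY:
        return ISO_639_2_TO_REGISTRY[s], "ISO 639-2 code; use the registered subtag"
    return None, "not in this excerpt"


if __name__ == "__main__":
    for candidate in ("lat", "la", "xx"):
        print(candidate, check_language_subtag(candidate))
```

Under these assumptions "lat" is reported as an ISO 639-2 code whose 
registered form is "la", which is the kind of slip that consulting the 
Registry first would catch.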

[1] http://www1.ietf.org/mail-archive/web/ltru/current/msg05087.html
[2] http://chs75.harvard.edu/projects/diginc/techpub/language-script

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/