Language and script encoding standards
Doug Ewell
dewell at adelphia.net
Sat Jul 22 21:28:36 CEST 2006
Apologies if I woke anyone up. :-) Both lists have been very quiet of
late.
I think the important points that have emerged from this discussion on
LTRU [1] are:
1. There is a need to tag content that follows certain systems of
transliteration or transcription. While the original Greek text (or
Cyrillic or Devanagari or whatever) will generally be more "authentic"
in a way, transcriptions will continue to have their place in scholarly
as well as popular material. Sometimes it may be important to indicate
the transcription scheme of tagged content; we are in no position to say
this is not the case.
While we have defined a mechanism for variant subtags, it is somewhat
vague (of necessity) and my impression is that the bar for acceptance of
a variant subtag proposal on ietf-languages is fairly high. Every
discussion about a variant subtag proposal becomes a discussion of the
deeper question of whether a variant is the appropriate vehicle for the
purpose. This is reasonable, but not if it draws too much attention
from the specific proposal at hand, and not if the bar for acceptance is
disproportionately high compared to the criteria for encoding a language
in ISO 639, a script in ISO 15924, or a country in ISO 3166.
Section 2.2.5 says, "Variant subtags are used to indicate additional,
well-recognized variations that define a language or its dialects which
are not covered by other available subtags," and we should reach an
understanding on exactly what that means, so that when a real proposal
comes along, we don't have to have that discussion again.
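For readers less familiar with where a variant subtag sits in a tag, the
following is a minimal Python sketch, not a conforming parser. It assumes
a simplified grammar of language, optional script, optional region, then
any number of variants, and it deliberately ignores extended language
subtags, extensions, private use, grandfathered tags, and any lookup in
the Registry. The function name split_tag and the sample tags are chosen
here purely for illustration.

```python
import re

# Simplified subtag shapes. Assumption: only language, script, region
# and variant subtags are handled; nothing is checked against the Registry.
LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
# A variant is 5 to 8 alphanumerics, or 4 characters starting with a digit.
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def split_tag(tag):
    """Split a tag into language, script, region and variant subtags."""
    parts = tag.split("-")
    if not LANGUAGE.fullmatch(parts[0]):
        raise ValueError("not a 2- or 3-letter language subtag: %r" % parts[0])
    result = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = parts[1:]
    # Script and region are optional and, if present, come in this order.
    if rest and SCRIPT.fullmatch(rest[0]):
        result["script"] = rest.pop(0).title()
    if rest and REGION.fullmatch(rest[0]):
        result["region"] = rest.pop(0).upper()
    # Everything left over must have the shape of a variant subtag.
    for subtag in rest:
        if not VARIANT.fullmatch(subtag):
            raise ValueError("unexpected subtag: %r" % subtag)
        result["variants"].append(subtag.lower())
    return result


if __name__ == "__main__":
    for example in ("de-CH-1901", "sl-rozaj", "sl-Latn-IT-nedis"):
        print(example, split_tag(example))
```

The sketch only checks the shape of each subtag. Whether a given variant
is actually registered, and whether a variant is the right vehicle for a
transliteration scheme at all, is exactly the question the code cannot
answer.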
2. The Neel Smith document [2] shows quite a bit of misunderstanding of
RFC 3066bis: tagging of character sets, use of code elements directly
from ISO standards instead of the Registry (Smith would not have
proposed using "grc" and "lat" otherwise), and conflation of local
alphabetic variation with the ISO 15924 concept of "script." More
documents and papers that explain the RFC 3066bis mechanisms, such as
those already written by Addison and Mark, will help. I suspect more
will be written after the documents are finally given RFC numbers and
the mapping to "BCP 47" is finally moved.
[1] http://www1.ietf.org/mail-archive/web/ltru/current/msg05087.html
[2] http://chs75.harvard.edu/projects/diginc/techpub/language-script
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/