ISO 639 and RFC 3066bis
Addison Phillips [wM]
aphillips at webmethods.com
Wed Jun 23 05:37:15 CEST 2004
I've added your comments to our issues list, which I'll post in the morning.
I think you mean "ISO 3166", not "ISO 639" for most if not all of what follows.
Some personal observations fresh from the stream of my consciousness are inter-linearly arranged below.
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
Internationalization is an architecture.
It is not a feature.
> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of John Cowan
> Sent: June 22, 2004 14:35
> To: ietf-languages at iana.org
> Subject: ISO 639 and RFC 3066bis
> I think the different types of code elements in the ISO 3166-1 code space
> ought to be straightened out with respect to their usability in
> RFC 3066bis.
I agree that we should close off all the loopholes, most of which are minor.
> My authority for what follows is the official page
> -lists/iso_3166-1_decoding_table.html .
> Officially assigned code elements are no problem: RFC 3066bis can
> use them.
> User-assigned code elements (AA, QM-QZ, XA-XZ, ZZ) are also no problem:
> they are private-use.
> Exceptionally reserved code elements (AC Ascension Island, CP Clipperton
> Island, DG Diego Garcia, EA Ceuta and Melilla, EU the European Union,
> FX metropolitan France, GG Guernsey, IC Canary Islands, IM Isle of Man,
> JE Jersey, TA Tristan da Cunha, UK United Kingdom) don't have clear
> status in RFC 3066bis (or any predecessor). UK is problematic because
> it is a synonym for GB, but language varieties like en-TA and
> es-IC represent
> sensible notions. IMHO, RFC 3066bis should say which of these
> can be used.
Okay. That sounds reasonable. I can add an instruction for the conversion of the registry. IMHO, these should be special cases.
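As a rough illustration of the two categories discussed so far, the sketch below classifies an alpha-2 region code using only the lists quoted above. The names `is_user_assigned` and `classify_region` and the returned labels are hypothetical, not part of any draft. Whether the exceptionally reserved codes end up usable in RFC 3066bis is exactly the open question, so the code only reports the category and does not decide validity.

```python
# Exceptionally reserved ISO 3166-1 codes, as listed in the quoted message.
EXCEPTIONALLY_RESERVED = {
    "AC", "CP", "DG", "EA", "EU", "FX", "GG", "IC", "IM", "JE", "TA", "UK",
}


def is_user_assigned(code: str) -> bool:
    """True for AA, QM-QZ, XA-XZ, and ZZ (the private-use ranges quoted above)."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        return False
    c = code.upper()
    return (
        c in ("AA", "ZZ")
        or (c[0] == "Q" and "M" <= c[1] <= "Z")
        or c[0] == "X"
    )


def classify_region(code: str) -> str:
    """Return a hypothetical category label for an alpha-2 region code."""
    if is_user_assigned(code):
        return "user-assigned"
    if code.upper() in EXCEPTIONALLY_RESERVED:
        return "exceptionally-reserved"
    # Everything else needs a lookup in the official ISO 3166-1 table,
    # which this sketch deliberately does not embed.
    return "other"
```

Treating the exceptionally reserved codes as an explicit set keeps them as special cases, so each one (UK versus TA, for instance) can later be given its own rule.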
> Transitionally reserved code elements (BU Burma, NT Neutral Zone [whose?],
The Star Trek Neutral Zone, of course, for use with the Klingon language tag. Really you should stay in and watch more TV.... ;-)
> SF Finland, SU U.S.S.R., TP East Timor, YU Yugoslavia, ZR Zaire)
> are officially
> available in RFC 3066bis. I'm a little troubled that sites allowing
> Finland-Swedish (sv-fi) must be prepared to accept the bizarre
> sv-sf as well,
> given that sf has been deprecated since 1995 and may be
> reassigned at any time,
> in which case we would be in the silly situation that both the
> well-known fi
> and the unknown sf would be usable, but the new sf would not.
The problem here is the slippery slope. Do we obsolete data that uses these defunct names or regions just because it isn't likely that we'll create more of it? I'm in favor of deprecating the old codes, but banning them seems suspicious. If CS was a bad decision, why are SU, YU, BU, TP, etc. good enough to allow in? The obvious problem here is that, given an alpha2 namespace for ISO 3166 to work with and a reasonable desire for mnemonicity (is that a word??), it won't be that long before we have a healthy list of UN M49 numbers resulting from reassignments.
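The difference between deprecating and banning can be sketched as a single switch. This is only an illustration under stated assumptions: the function name `check_region`, the `ban_deprecated` flag, and the returned labels are hypothetical, and the set contains just the transitionally reserved codes named in the quoted message.

```python
import warnings

# Transitionally reserved codes, as listed in the quoted message.
TRANSITIONALLY_RESERVED = {"BU", "NT", "SF", "SU", "TP", "YU", "ZR"}


def check_region(code: str, ban_deprecated: bool = False) -> str:
    """Report whether a region code is deprecated.

    With ban_deprecated=False (deprecation), old data keeps working and
    the caller only gets a warning. With ban_deprecated=True (banning),
    the same code is rejected outright.
    """
    c = code.upper()
    if c in TRANSITIONALLY_RESERVED:
        if ban_deprecated:
            raise ValueError(f"region code {c} is no longer allowed")
        warnings.warn(
            f"region code {c} is deprecated", DeprecationWarning, stacklevel=2
        )
        return "deprecated"
    return "not-deprecated"
```

Under the deprecation policy a tag such as sv-sf still gets through with a warning, while under the banning policy existing data that carries it becomes invalid.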
> Indeterminately reserved code elements (quite a few) are really
> not part of
> ISO 639 at all, but are reserved to avoid collisions with systems that are
> meant to be upward-compatible extensions of ISO 639. They are
> not supposed
> to be used except in those particular systems. RFC 3066bis
> should exclude them.
I assume that reserved codes that aren't assigned are codes that are not assigned (and thus banned).
> Code elements not used at the present stage are WIPO codes for various
> transnational intellectual property associations. They are reserved and
> unused in ISO 639, but this may change in future. They have nothing to
> do with languages anyhow.
Okay. See indeterminately reserved codes.
> Unassigned code elements are obviously not usable.
> An unrelated point: the ABNF spec says that ALPHA and DIGIT and '-' mean
> the octets whose ASCII codes are alphabetics, digits, or hyphens. We need
> a note saying we are talking about characters, not octets.
Yes, we need to clarify that we mean tags as abstract, character string entities, not specific octets in a specific encoding. Presumably EBCDIC language tags are okay too....
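To make the characters-versus-octets point concrete, here is a minimal sketch. It assumes Python's built-in `cp037` codec as one example of an EBCDIC code page, and the helper name `is_tag_charset_ok` is hypothetical. It checks only the character repertoire (ASCII letters, digits, hyphen), not the full tag grammar.

```python
import string

# The characters the ABNF is meant to allow, taken as abstract characters.
ALLOWED = set(string.ascii_letters + string.digits + "-")


def is_tag_charset_ok(tag: str) -> bool:
    """True if a decoded tag uses only ASCII letters, digits, and hyphens."""
    return len(tag) > 0 and all(ch in ALLOWED for ch in tag)


# The same abstract tag serialized in two encodings: the octets differ,
# but the character strings are identical once decoded.
ascii_octets = "sv-FI".encode("ascii")
ebcdic_octets = "sv-FI".encode("cp037")

same_tag = ascii_octets.decode("ascii") == ebcdic_octets.decode("cp037")
```

Validation runs on the decoded string, so the check is the same regardless of which encoding carried the tag.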
> John Cowan cowan at ccil.org www.reutershealth.com www.ccil.org/~cowan
> I come from under the hill, and under the hills and over the
> hills my paths
> led. And through the air. I am he that walks unseen. I am the
> clue-finder, the web-cutter, the stinging fly. I was chosen for the lucky
> number. --Bilbo
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no