ISO 639 and RFC 3066bis
cowan at ccil.org
Tue Jun 22 23:35:25 CEST 2004
I think the different types of code elements in the ISO 3166-1 code space
ought to be straightened out with respect to their usability in RFC 3066bis.
My authority for what follows is the official page
Officially assigned code elements are no problem: RFC 3066bis can use them.
User-assigned code elements (AA, QM-QZ, XA-XZ, ZZ) are also no problem:
they are private-use.
Exceptionally reserved code elements (AC Ascension Island, CP Clipperton
Island, DG Diego Garcia, EA Ceuta and Melilla, EU the European Union,
FX metropolitan France, GG Guernsey, IC Canary Islands, IM Isle of Man,
JE Jersey, TA Tristan da Cunha, UK United Kingdom) don't have clear
status in RFC 3066bis (or any predecessor). UK is problematic because
it is a synonym for GB, but language varieties like en-TA and es-IC represent
sensible notions. IMHO, RFC 3066bis should say which of these can be used.
Transitionally reserved code elements (BU Burma, NT Neutral Zone [whose?],
SF Finland, SU U.S.S.R., TP East Timor, YU Yugoslavia, ZR Zaire) are officially
available in RFC 3066bis. I'm a little troubled that sites allowing
Finland-Swedish (sv-fi) must be prepared to accept the bizarre sv-sf as well,
given that sf has been deprecated since 1995 and may be reassigned at any time,
in which case we would be in the silly situation that both the well-known fi
and the unknown sf would be usable, but the new sf would not.
Indeterminately reserved code elements (quite a few) are really not part of
ISO 3166 at all, but are reserved to avoid collisions with systems that are
meant to be upward-compatible extensions of ISO 3166. They are not supposed
to be used except in those particular systems. RFC 3066bis should exclude them.
Code elements not used at the present stage are WIPO codes for various
transnational intellectual property associations. They are reserved and
unused in ISO 3166, but this may change in future. They have nothing to
do with languages anyhow.
Unassigned code elements are obviously not usable.
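To make the categories above concrete, here is a minimal Python sketch that
sorts a two-letter region code into the three categories whose members are
listed in this message. The function name classify_region and the set names
are hypothetical, chosen for illustration only. Officially assigned,
indeterminately reserved, not-used, and unassigned code elements are not
enumerated above, so the sketch lumps them together as "other" rather than
guessing at their contents.

```python
import string

# Code lists copied from this message; they reflect the state of the
# ISO 3166-1 code space as described here, not a current registry.
USER_ASSIGNED = (
    {"AA", "ZZ"}
    | {"Q" + chr(c) for c in range(ord("M"), ord("Z") + 1)}  # QM-QZ
    | {"X" + c for c in string.ascii_uppercase}              # XA-XZ
)
EXCEPTIONALLY_RESERVED = {
    "AC", "CP", "DG", "EA", "EU", "FX", "GG", "IC", "IM", "JE", "TA", "UK",
}
TRANSITIONALLY_RESERVED = {"BU", "NT", "SF", "SU", "TP", "YU", "ZR"}


def classify_region(code: str) -> str:
    """Return the category of a two-letter region code (hypothetical helper)."""
    # Validate before uppercasing so non-ASCII letters cannot slip through.
    if len(code) != 2 or any(c not in string.ascii_letters for c in code):
        raise ValueError(f"not a two-letter ASCII code: {code!r}")
    code = code.upper()
    if code in USER_ASSIGNED:
        return "user-assigned"
    if code in EXCEPTIONALLY_RESERVED:
        return "exceptionally reserved"
    if code in TRANSITIONALLY_RESERVED:
        return "transitionally reserved"
    # Assigned, indeterminately reserved, not used, or unassigned:
    # this message does not list them, so they are not distinguished.
    return "other"
```

With these lists, the region subtags of es-IC and sv-sf fall into the
exceptionally and transitionally reserved categories respectively, which are
exactly the cases where RFC 3066bis needs an explicit rule.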
An unrelated point: the ABNF spec says that ALPHA and DIGIT and '-' mean
the octets whose ASCII codes are alphabetics, digits, or hyphens. We need
a note saying we are talking about characters, not octets.
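The character/octet distinction can be illustrated with a short Python
sketch. The helper is_tag_characters is hypothetical, and it checks only the
character repertoire (ASCII letters, digits, hyphen), not the full tag
grammar. The point is that the same tag has different octets in different
encodings, so a rule phrased in terms of octets only holds for ASCII-compatible
encodings.

```python
import re

# Matches strings made only of ASCII letters, ASCII digits, and '-'.
# This operates on characters (str), not on encoded octets (bytes).
TAG_CHARS = re.compile(r"[A-Za-z0-9-]+")


def is_tag_characters(tag: str) -> bool:
    """Check the character repertoire of a tag (hypothetical helper)."""
    return TAG_CHARS.fullmatch(tag) is not None


tag = "sv-FI"
ascii_octets = tag.encode("ascii")      # one octet per character
utf16_octets = tag.encode("utf-16-be")  # two octets per character

# The octet sequences differ, but both decode to the same characters,
# and it is the characters that the grammar should constrain.
same_characters = utf16_octets.decode("utf-16-be") == ascii_octets.decode("ascii")
```

An octet-level reading would reject the UTF-16 form because of its zero
octets, even though it spells the same tag.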
John Cowan cowan at ccil.org www.reutershealth.com www.ccil.org/~cowan
I come from under the hill, and under the hills and over the hills my paths
led. And through the air. I am he that walks unseen. I am the clue-finder,
the web-cutter, the stinging fly. I was chosen for the lucky number. --Bilbo