RFC3066 bis: use of ISO 639-1

Sat Dec 6 00:08:29 CET 2003

One concern over the past few years has been that ISO 639 two-letter
codes take precedence, and there are languages that now have a
three-letter code but no two-letter code, though a two-letter code could
potentially be added in the future.

E.g. today Hawaiian data is tagged as "haw" because ISO provides a
three-letter ID but not a two-letter ID, and there is no two-letter ID
because Hawaiian doesn't satisfy the criteria for ISO 639-1; but it's
possible that some day Hawaiian might satisfy those criteria and a
two-letter ID might be added. If that were to happen, then all of a
sudden all existing data tagged with "haw" would become incorrectly
tagged.

To avoid this, the relevant ISO committee was asked to make a commitment
that no two-letter ID would ever be added for languages that already had
a three-letter ID but not a two-letter ID. That's really not a good
approach, IMO, for a couple of reasons:

- It is a reasonable progression for a language, as it develops, to come
into scope for ISO 639-2, and then possibly later to come into scope for
ISO 639-1.

- There's nothing to guarantee that a future ISO committee won't choose
to do what the existing committee was asked not to do.

For that reason, I suggest that the use of ISO 639-1 two-letter codes be
frozen to precisely those that exist today. That would mean revising
section 2.2 to say that 2-character primary subtags are limited to a
list, which could be provided in an appendix.
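
As a rough sketch of what such a rule would mean for an implementation,
assuming the appendix list is available as a fixed set (the set below is
only a small illustrative excerpt, and the names FROZEN_TWO_LETTER and
primary_subtag are made up for this example, not taken from the draft):

```python
# Illustrative excerpt only. A real appendix would list every ISO 639-1
# code that existed at the freeze date.
FROZEN_TWO_LETTER = frozenset({"en", "fr", "de", "ja", "zh"})


def primary_subtag(alpha3, alpha2=None):
    """Choose the primary language subtag under the proposed freeze.

    alpha3: the ISO 639-2 three-letter code for the language.
    alpha2: the ISO 639-1 two-letter code, if ISO currently assigns one.
    """
    # A two-letter code is used only if it is on the frozen list.
    if alpha2 is not None and alpha2.lower() in FROZEN_TWO_LETTER:
        return alpha2.lower()
    # Otherwise the three-letter code stays in use, even if ISO later
    # adds a two-letter code for the language.
    return alpha3.lower()
```

With this rule, primary_subtag("eng", "en") yields "en", and
primary_subtag("haw") yields "haw". If ISO were later to assign Hawaiian
a two-letter code (say "xx", a placeholder invented here), then
primary_subtag("haw", "xx") would still yield "haw", so existing data
would stay correctly tagged.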

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division