More comments on draft-03

Addison Phillips [wM] aphillips at
Sun Jun 13 03:14:07 CEST 2004

Hi Doug,

Thanks for the comments. Notes follow.


Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: Doug Ewell [mailto:dewell at]
> Sent: June 12, 2004 14:47
> To: ietf-languages at
> Cc: Addison Phillips [wM]; Mark Davis
> Subject: More comments on draft-03
> Here are some more comments on draft-phillips-language-03, plus at least
> one comment that applies to the current delta between draft-03 and
> draft-04.  They aren't in any particular order, and technical and
> editorial comments are mixed recklessly.
> 1.  The note on "superseded" says:
> "Spelled 'superceded' as 'superseded', in deference to its Latin roots
> (J.Cowan)"
> Not to beat this too far into the ground, but "superceded" is a
> MISSPELLING, plain and simple, even if it is a very common one that some
> dictionaries have listed as a variant.  Latin roots have nothing to do
> with it.  The comment should simply read:
> "Corrected spelling of 'superseded'."

Whatever. A little humor never USED to hurt anything... I'll try to behave responsibly in the transient notes in the future :-).
> 2.  The text mentions a registered tag "i-hakka" in a few places, as
> something that would be grandfathered (section 2.2).  There is no such
> registered tag; the one that is registered is "i-hak".  Furthermore,
> "i-hak" is deprecated in favor of "zh-hakka", another registered tag.

Good catch. The tag is indeed zh-hakka. My bad.
> If the intent is to use a deprecated tag as an example, to point out that
> all tags, even deprecated ones, will be grandfathered, then that should
> perhaps be spelled out a little more clearly to avoid confusion.  (But
> "i-hakka" is an error in any case.)

Some of the examples will need to use an i- tag such as i-enochian, and some can use zh-hakka. I'll go through them individually and triage them.
> If the use of a deprecated tag was unintentional and might be seen as
> distracting from the main point about the grandfathering mechanism, then
> perhaps another example such as "i-enochian" should be cited instead.
> 3.  Section 2.2 says, in relation to the extended language subtags:
> "In a future revision or update of this document, 'zh-min-nan' might
> represent the subdialect 'nan' of the Chinese dialect 'min'."
> "zh-min-nan" is already registered.  I would think this would be
> grandfathered as an RFC 3066 legacy tag, and would therefore be a
> confusing example of an extended language subtag.

I wanted to give a real example of how an ISO 639-3 (or similar) extended language subtag would look. This would be a good example if ISO 639-3 were finished and 'min' and 'nan' were actually codes in it. Perhaps a 'fake' example would be better. I think Peter got the gist of it, but I agree; I was also worried about using this as an example.
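To make the issue in points 2 and 3 concrete, here is a minimal Python sketch of a processor that checks a tag against a table of grandfathered whole-tag registrations before doing any subtag-by-subtag parsing. The table holds only the tags mentioned in this thread, and the names `GRANDFATHERED` and `classify_tag` are hypothetical, not anything the draft defines.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table of whole-tag registrations carried over from RFC 3066.
# Value: preferred replacement tag, or None if the tag is not deprecated.
# Only tags mentioned in this thread are listed; a real table would come
# from the IANA registry.
GRANDFATHERED: Dict[str, Optional[str]] = {
    "i-enochian": None,
    "i-hak": "zh-hakka",  # deprecated in favor of zh-hakka
    "zh-hakka": None,
    "zh-min-nan": None,
}


def classify_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return ("grandfathered", preferred) or ("generative", None).

    Grandfathered tags are matched as a whole (case-insensitively), so
    "zh-min-nan" is never split into an extended language subtag.
    """
    key = tag.lower()
    if key in GRANDFATHERED:
        return "grandfathered", GRANDFATHERED[key]
    return "generative", None
```

For example, `classify_tag("I-HAK")` returns `("grandfathered", "zh-hakka")`. Because the whole-tag lookup runs first in this sketch, "zh-min-nan" would never be read as language "zh" plus extended subtags, which is why it makes a confusing illustration of the extended language mechanism.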

> 4.  The text makes several references to both "ISO 3166" (with a space)
> and "ISO3166" (without a space), and similar for the other ISO
> standards.  Likewise, both "RFC 3066" and "RFC3066" are seen (and even
> "rfc3066" in at least one spot).  These should be consistent,
> capitalized and with the space.

I'll canonicalize the references.
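A single regular-expression pass could handle most of this canonicalization. A minimal Python sketch, assuming only the "RFC" and "ISO" prefixes need fixing; `canonicalize_refs` is a hypothetical name, and the output should still be reviewed by hand, since the pattern would also rewrite strings such as a file name like "rfc3066.txt".

```python
import re

# Matches "RFC" or "ISO" (any case), an optional single space, then digits.
# Suffixes such as "bis" or "-3" are left untouched.
_REF = re.compile(r"\b(RFC|ISO) ?(\d+)", re.IGNORECASE)


def canonicalize_refs(text: str) -> str:
    """Rewrite "rfc3066", "RFC3066", "ISO3166" as "RFC 3066", "ISO 3166"."""
    return _REF.sub(lambda m: f"{m.group(1).upper()} {m.group(2)}", text)
```

For example, `canonicalize_refs("see rfc3066 and ISO3166")` returns `"see RFC 3066 and ISO 3166"`, while already-correct forms such as "RFC 3066bis" and "ISO 639-3" pass through unchanged.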
> 5.  I agree with John Cowan that if "-x-" is reserved to introduce
> private-use subtags, it seems unnecessary to define private-use subtags
> as beginning with the letter "x" as well.  If these are two different
> mechanisms, I can't see the difference, and thus I think the text needs
> clarification; and if it's the same mechanism, then the "starting with
> x" restriction seems unnecessary.

Mark and I disagree. There is a difference between a private-use variant and a private-use subtag: with the former, you know what role the subtag fills in the tag. The sections on the language, script, and region subtags actually contain text preferring the private-use codes in the underlying ISO standards over extensions for exactly this purpose. Consider:

en-QM vs. en-x-myRegion

Processors can assign "QM" to the region field, whereas "myRegion" is an opaque code. That's why there are two mechanisms.
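The distinction can be illustrated with a deliberately simplified Python parser. It assumes a reduced grammar (language, optional script, optional region, optional "x-" private-use sequence; variants and extensions are omitted) rather than the draft's actual ABNF, and `parse_tag`, `ParsedTag`, and `is_private_use_region` are hypothetical names. It shows that "QM" lands in the region field, where a processor can recognize it as one of the ISO 3166-1 user-assigned codes, while "myRegion" after "x-" is just an opaque string.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified subtag shapes (an assumption, not the draft's full ABNF).
LANGUAGE = re.compile(r"^[a-z]{2,3}$")
SCRIPT = re.compile(r"^[a-z]{4}$")
REGION = re.compile(r"^(?:[a-z]{2}|[0-9]{3})$")


def is_private_use_region(code: str) -> bool:
    """ISO 3166-1 user-assigned alpha-2 codes: AA, QM-QZ, XA-XZ, ZZ."""
    c = code.upper()
    if not re.fullmatch(r"[A-Z]{2}", c):
        return False
    return c in ("AA", "ZZ") or (c[0] == "Q" and c[1] >= "M") or c[0] == "X"


@dataclass
class ParsedTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    private_use: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> ParsedTag:
    # Tags are case-insensitive, so work in lowercase.
    subtags = tag.lower().split("-")
    if not LANGUAGE.match(subtags[0]):
        raise ValueError(f"bad language subtag in {tag!r}")
    parsed = ParsedTag(language=subtags[0])
    i = 1
    # Fields are positional: a subtag's shape and position give its role.
    if i < len(subtags) and SCRIPT.match(subtags[i]):
        parsed.script = subtags[i].title()
        i += 1
    if i < len(subtags) and REGION.match(subtags[i]):
        parsed.region = subtags[i].upper()
        i += 1
    # Everything after "x" is private use: opaque to a generic processor.
    if i < len(subtags) and subtags[i] == "x":
        parsed.private_use = subtags[i + 1:]
        i = len(subtags)
    if i != len(subtags):
        raise ValueError(f"unsupported subtag {subtags[i]!r} in {tag!r}")
    return parsed
```

With this sketch, `parse_tag("en-QM").region` is `"QM"` and `is_private_use_region("QM")` is true, whereas `parse_tag("en-x-myRegion")` leaves `region` empty and puts `["myregion"]` in `private_use`, with nothing telling the processor that it names a region.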
> 7.  I disagree with Harald Tveit Alvestrand that the successor to RFC
> 3066 should be delayed until the ISO 639/RA JAC comes up with an exact
> definition of how it allocates codes in various parts.  This is indeed
> Somebody Else's Problem, the JAC's.  The need to update RFC 3066 to
> incorporate script codes and to fix other issues should not wait,
> possibly years, for the JAC to provide this detail about their mechanism.

I don't believe Harald is actually suggesting that we wait. He used the terminology "RFC 3066ter" to indicate a follow-on (third version) to "RFC 3066bis", which is this one.
> 8.  The policy to allow all deprecated ISO codes, no matter how long ago
> they were deprecated, worries me.  I understand the desire to maintain
> "CS" for Czechoslovakia, and for that matter "TP" for East Timor (before
> it was recoded as "TL"), but if we want to allow these codes, as well
> as older relics like "FX" for Metropolitan France, "SK" for Sikkim,
> and "DY" for Dahomey, we will need to provide a reference to ISO
> 3166-3, Country Codes That Used To Exist But Don't Any More, which isn't
> available both publicly AND officially on the Internet.  As far as
> language codes are concerned, I know the ISO 639 Change Notice list
> includes "iw", "in", and "ji", but should these really be the places to
> look for allowable RFC 3066bis codes?

The existing RFC 3066 and existing practice actually allow deprecated codes in tags. Although we recognize the problem of maintaining a tiny alpha-2 namespace in an imperfect world, we also have to point out that content (which is where many language tags live) has a long shelf life. The need to keep language tags stable suggests that we have chosen the right course: as of a certain date in the near future, the ISO xxx standards will be treated as stabilized by RFC 3066bis.

We may have to burden IANA with more registry work. Mark and I will have to consider some text here.
> I know the RFC can't include lists of all allowable codes, but the list
> of deprecated codes *as of the release of RFC 3066bis* is small and
> fixed -- about 40 country codes and 4 language codes.  I'd like to see
> the list of deprecated codes maintained in an IETF registry of some
> sort, similar to the list of registered subtags.  Otherwise it will be
> very difficult to conform with point 7(b) of Section 2.3 without having
> to look in obscure places and/or purchase ISO 3166-3.
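The kind of frozen list proposed here could be as simple as a table keyed by subtag type. A minimal Python sketch, assuming a hypothetical registry shape (deprecated code mapped to its preferred replacement); the entries are a small illustrative subset, not the full list of roughly 40 country codes and 4 language codes, and `DEPRECATED` and `preferred_code` are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical frozen registry of deprecated ISO codes, keyed by subtag type.
# Value: the preferred replacement code. Illustrative subset only.
DEPRECATED: Dict[str, Dict[str, str]] = {
    "language": {
        "iw": "he",  # Hebrew
        "in": "id",  # Indonesian
        "ji": "yi",  # Yiddish
    },
    "region": {
        "TP": "TL",  # East Timor, recoded as TL
        "DY": "BJ",  # Dahomey, now Benin
        "FX": "FR",  # Metropolitan France
    },
}


def preferred_code(kind: str, code: str) -> Optional[str]:
    """Return the replacement for a deprecated code, or None if not deprecated."""
    # Language codes are conventionally lowercase, region codes uppercase.
    key = code.lower() if kind == "language" else code.upper()
    return DEPRECATED[kind].get(key)
```

A validator built on such a table could accept legacy content tagged "iw" or "en-TP" and, where useful, suggest "he" or "en-TL", without anyone needing a copy of ISO 3166-3.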
> 9.  The section about registering subtags states:
> "Note: The purpose of the 'published description' is intended as an aid
> to people trying to verify whether a language is registered, or what
> language a particular subtag refers to."
> I think this should be augmented with an explicit statement that the
> "published description" requirement is *not* intended to exclude tiny
> minority languages or dialects, or those without standard orthographies.
> There seems to be quite a bit of misunderstanding about this (one of
> the Linguasphere papers dragged it out again to show that ISO 639 is
> inadequate to represent spoken-only languages).

Agreed. Will put in something appropriate.

> -Doug Ewell
>  Fullerton, California
