draft-05: editorial comments (2)

Peter Constable petercon at microsoft.com
Fri Aug 27 07:10:47 CEST 2004

Some additional comments (all editorial, I think):

Throughout: The reference to ISO 639 is old; the current standard is ISO

Section 2.1: In the discussion of casing conventions, it might be
helpful to mention the convention for ISO 15924 as well; that way, the
three that are most relevant are all mentioned, rather than just two of
the three.
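For illustration, the three casing conventions could be sketched as a small Python function. format_case is a hypothetical helper, not something from the draft. It assumes the draft's positional rules: after the primary subtag, a 4-letter alphabetic subtag is an ISO 15924 script (titlecase), a 2-letter one is an ISO 3166 region (uppercase), and everything else is lowercase, including anything after a singleton. Since tags are compared case-insensitively, this only affects presentation.

```python
def format_case(tag: str) -> str:
    """Apply the customary casing conventions to a language tag (sketch)."""
    subtags = tag.split("-")
    result = [subtags[0].lower()]  # ISO 639 language: lowercase
    # A one-character primary subtag ("x" or "i") means no positional
    # interpretation applies, so the rest stays lowercase too.
    after_singleton = len(subtags[0]) == 1
    for sub in subtags[1:]:
        if len(sub) == 1:
            after_singleton = True
        if after_singleton:
            result.append(sub.lower())
        elif len(sub) == 4 and sub.isalpha():
            result.append(sub.capitalize())  # ISO 15924 script: titlecase
        elif len(sub) == 2 and sub.isalpha():
            result.append(sub.upper())  # ISO 3166 region: uppercase
        else:
            result.append(sub.lower())
    return "-".join(result)


print(format_case("DE-latn-ch"))  # de-Latn-CH
```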

Section 2.2: "Note that registered subtags can only appear in specific
positions in a tag." The term "registered subtag" isn't defined anywhere
that I have noticed, and while I know what is in mind in this particular
case, there is some ambiguity between a subtag that's in the registry
because it was requested via the registration process versus any subtag
listed in the registry, period. This ambiguity was introduced by the
decision to list all possible subtags in the registry; when making that
change, I wonder if you also looked through the doc, esp. sections 2 and
3, to check for potential ambiguity.

Section 2.2, "All 2-character subtags in the IANA registry were
defined...": why "were"? They still *are* defined in the source
standard, and the semantics *are* defined in the source standard. Same
comment applies to the following bullet points. Even for the 4th bullet,
regarding subtags of 4 - 8 chars, these *are* defined in the registry,
not *were*. It's especially problematic to use the past tense in
conjunction with the non-past of "subsequently made" later on. (That
wording in RFC 3066 -- speaking as its contributor -- was specifically
intended to be open-ended into the future, not just from time of
publication of ISO 639 to time of publication of RFC 3066.)
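To make the present-tense reading concrete: the standard that defines a primary subtag follows from its form, regardless of when the code was assigned. Here is a minimal sketch, assuming the length rules in the 2.2 bullets. primary_source is a hypothetical name.

```python
from typing import Optional


def primary_source(subtag: str) -> Optional[str]:
    """Name where a primary subtag's semantics are defined (sketch)."""
    s = subtag.lower()
    if s == "x":
        return "private use"
    if s == "i":
        return "IANA (grandfathered)"
    if not (s.isascii() and s.isalpha()):
        return None  # not a valid primary subtag under these assumptions
    if len(s) == 2:
        return "ISO 639-1"
    if len(s) == 3:
        return "ISO 639-2"
    if 4 <= len(s) <= 8:
        return "IANA registry"
    return None


for s in ("de", "haw", "x"):
    print(s, primary_source(s))
```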

Section 2.2: In the same set of bullets,

"The single character subtag "x" as the primary subtag indicates that
the whole language tag consists of private-use subtags." 

I would have worded it as follows:

"The single character subtag "x" as the primary subtag indicates that
the whole language tag is to be treated as privately defined."

For instance, a tag of 'x-afnoric-FR' happens to include a subtag (FR)
that isn't privately defined; the point of the 'x-' is simply that
parsers shouldn't assume *anything* about any of the content. As it is
now, the wording can give the impression that 'x-qaa-QM-Qaaa' would be
valid, but that 'x-afnoric', 'x-qaa-FR', etc. would not be valid.
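Here is a sketch of the reading I have in mind, where a parser sets aside everything after 'x' without interpreting it. split_private_use is a hypothetical helper, and the sketch assumes that an 'x' subtag always introduces the private-use portion of a tag.

```python
from typing import List, Tuple


def split_private_use(tag: str) -> Tuple[List[str], List[str]]:
    """Split a tag into (interpretable subtags, opaque private-use subtags)."""
    subtags = tag.split("-")
    for i, sub in enumerate(subtags):
        if sub.lower() == "x":
            # Nothing after "x" is interpreted, even if it looks like
            # a registered code such as "FR".
            return subtags[:i], subtags[i + 1:]
    return subtags, []


print(split_private_use("x-afnoric-FR"))  # ([], ['afnoric', 'FR'])
```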

Section 2.2:

"At present all languages that have both kinds of 3-character code also
are assigned a 2-character code and hopefully future assignments of this
nature will not arise."

I think the "hopefully" clause can be worded more strongly:

"At present all languages that have both kinds of 3-character code also
are assigned a 2-character code; it is not expected that future
assignments of this nature will arise."

Section 2.2:

Note: In order to avoid versioning difficulties in applications such as
those experienced in RFC 1766[8], the ISO 639 Registration Authority
Joint Advisory Committee (RA-JAC) has agreed on the following policy

"After the publication of ISO/DIS 639-1 as an International Standard, no
new 2-letter code shall be added to ISO 639-1 unless a 3-letter code is
also added at the same time to ISO 639-2. In addition, no language with
a 3-letter code available at the time of publication of ISO 639-1 which
at that time had no 2-letter code shall be subsequently given a 2-letter
code.

This will ensure that, for example, a user who implements "haw"
(Hawaiian), which currently has no 2-character code, will not find his
or her data invalidated by eventual addition of a 2-character code for
that language."

The comment about versioning difficulties experienced in RFC 1766 isn't
accurate: RFC 1766 did not support alpha-3 IDs from ISO 639-2, so the
issue being referred to could not have come up. The issue only arose
with the advent of RFC 3066.

I've suggested before that a better way to deal with this is simply to
limit the alpha-2 IDs to a fixed set, and since the reference to ISO
639-1 has now been made indirect (valid subtags are taken from the
registry, and the content of the registry is controlled), this is now
possible, and very simple. For instance, if the ISO 639/RA-JAC suddenly
changed their mind and added 'ha' for Hawaiian (for example), then the
maintainer of this registry could simply choose not to add 'ha' to the

If a reference to JAC policy is going to be quoted, it should be quoted
from a published source, with the reference included. I would quote the
following statement from this page:

A language code already in ISO 639-2 at the point of freezing ISO 639-1
shall not later be added to ISO 639-1. This is to ensure consistency in
usage over time, since users are directed in Internet applications to
employ the alpha-3 code when an alpha-2 code for that language is not

Section 2.2: 'One of the grandfathered IANA registrations is
"i-enochian"...' Wouldn't the same apply to "i-mingo"? The statement
regarding enochian on its own struck me as a bit odd as it simply left
me wondering why this one item was being singled out.

Section 2.2: 'Registration of extended language subtags and non-standard
use MUST NOT be permitted.' Surely the appropriate wording here is
'...is not permitted.' Using "MUST NOT" suggests that the choice
regarding what kinds of registrations are permitted is potentially up to
users of the spec.

Section 2.2: "Example: In a future revision or update of this document,
the tag 'zh-min-nan'..." Since we have realized that 'min' is assigned
to an Indonesian language, this particular combination would be
semantically anomalous if recast as a non-grandfathered tag. Perhaps
'zh-gan' would be a better example.

BTW, I just noticed that example tags are enclosed sometimes in single
quotation marks but sometimes in double quotation marks. A consistent
convention should be used throughout.

Section 2.2: 'ISO 15924[2]--"Codes for the representation of the names
of scripts": alpha-4 script codes' -- the punctuation here is weird: the
colon looks like it should be inside the quotation marks. Suggested

'"Codes for the representation of the names of scripts" (alpha-4 script

Section 2.2: "All 4-character subtags were defined according to ISO
15924..." Same issue as above: use "are" rather than "were".

Section 2.2: 'All 2-character subtags following the primary subtag were
defined...' Same were > are issue. And again in subsequent bullets.

Section 2.2: 'ISO 3166[4]--"Codes for the representation of names of
countries and their subdivisions - Part 1: Country codes"--alpha-2
country codes or assignments...' -- change to:

'ISO 3166[4], "Codes for the representation of names of countries and
their subdivisions - Part 1: Country codes" (alpha-2 country codes), or
assignments...'

Section 2.2 "...for Statistical Use[5] or assignments made thereto by
the governing standards body." Here the term used is "standards";
elsewhere it is "standardization". Please harmonize.

Section 2.2: "'de-Latn-CH' represents German written using Latin script
for Switzerland" (and subsequent examples): The wording "...for [country
X]" is both odd and not particularly meaningful. I think better would be
"...as used in [country X]". In addition, the first example could
further elaborate by adding, "(for example, using Switzerland-specific
vocabulary or spelling.)"

Section 2.2: "(Note: another way of saying this is that all subtags
following the singleton and before another singleton are part of the
extension. Thus in the tag "fr-a-Latn", the subtag 'Latn' does not
represent the ISO 15924 script code for Latin script.)"

First, I'd move the closing parenthesis to "...part of the extension.)
Thus, in the tag..."

Secondly, 'Latn' *could* represent the 15924 code for Latin if the
document specifying the "a" singleton so specified. So, I'd add the
following wording at the end: '...the subtag "Latn" does not represent
the ISO 15924 script code for Latin script unless that usage and its
interpretation in relation to the overall tag are specified by the
document that specifies the use of the singleton subtag "a".'
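The scoping rule in the note (subtags belong to the preceding singleton until the next singleton) could be sketched as follows. parse_extensions is hypothetical. As a simplification, it assumes that an 'x' ends extension parsing because everything after it is private use. It keeps each extension's subtags verbatim and leaves their meaning to the document that defines the singleton.

```python
from typing import Dict, List


def parse_extensions(tag: str) -> Dict[str, List[str]]:
    """Group subtags under the singleton that precedes them (sketch)."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        return {}  # "x-..." or "i-..." tags: no extensions to parse
    extensions: Dict[str, List[str]] = {}
    current = None
    for sub in subtags[1:]:
        low = sub.lower()
        if low == "x":
            break  # assumption: the rest of the tag is private use
        if len(sub) == 1:
            current = low
            extensions[current] = []
        elif current is not None:
            # Kept verbatim: interpretation belongs to the singleton's spec.
            extensions[current].append(sub)
    return extensions


print(parse_extensions("fr-a-Latn"))  # {'a': ['Latn']}
```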

BTW, don't you need to limit possible singletons to exclude "y" or "z"?

Section 2.2: 'Use or standardization of the private use subtags is by
private agreement...' There's an oxymoron here. Delete "or
standardization".

Section 2.4.3: 'None of the subtags in the language tag has a canonical
mapping...' So far, "canonical mapping" has not been defined. It is
mentioned in 3.1, but nowhere is it explained what the semantic
relationship is, or that the canonical mapping can be used to derive a
semantically-equivalent canonical tag from a tag containing a subtag
with a canonical mapping. This needs to be explained in 2.4.3.
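Here is a sketch of the kind of explanation I have in mind for 2.4.3. A canonical mapping lets an implementation derive a semantically equivalent canonical tag by substituting the mapped subtag. The table is illustrative only, using the ISO 639 code changes from iw to he, in to id, and ji to yi. For simplicity, the sketch looks up only the primary subtag. canonicalize and CANONICAL are hypothetical names.

```python
# Illustrative table only: deprecated ISO 639 alpha-2 codes and their
# current replacements. A real table would come from the registry.
CANONICAL = {"iw": "he", "in": "id", "ji": "yi"}


def canonicalize(tag: str) -> str:
    """Replace the primary subtag with its canonical mapping, if any (sketch)."""
    subtags = tag.split("-")
    # Simplification: only the primary language subtag is looked up.
    subtags[0] = CANONICAL.get(subtags[0].lower(), subtags[0])
    return "-".join(subtags)


print(canonicalize("iw-IL"))  # he-IL
```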

Section 2.5: I'm satisfied with this discussion of private use.

Those are my comments on section 2.

Peter Constable
