More comments on latest draft
dewell at adelphia.net
Mon Jul 19 07:58:06 CEST 2004
Here are some additional comments on the current draft. Most of this
applies to both draft-04 and the editor's copy of draft-05, available at
All of these comments assume the process of updating RFC 3066 is still
on track and not in danger of being derailed.
Section 2.2.3 says:
"An implementation that claims to be validating MUST:
"Specify the particular registry date for which the implementation
performs validation of subtags.
"Check that either the tag is a grandfathered tag, or that all language,
script, region, and variant subtags consist of valid codes for use in
language tags according to the IANA registry as of the particular date
specified by the implementation."
If validating implementations need to identify the date of the registry,
the registry file itself will need to have a standard header -- or
something else easily recognizable and machine-readable -- that
identifies it. The alternative would be for implementations to scan the
entire registry for the latest date of any tag it contains, which is
very inefficient. Can Mark or Addison suggest such a header or other
Section 2.2.3 continues:
"If the processor generates tags, it MUST do so in canonical form,
including any supported extensions, as defined in Section 2.4.3."
So, if a user wants to generate a tag using a validating processor, say
for Indonesian as spoken in Timor-Leste (just a random example ;-) she
is perfectly free to specify "in-TP", but the processor will convert
this to "id-TL" before generating the output. Is that correct?
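
A minimal sketch of the conversion asked about above, assuming a simple language-region tag. The alias tables hold only the two mappings from this example (a real processor would load them from the registry), and the function name canonicalize is hypothetical, not taken from the draft:

```python
# Illustrative alias tables only: deprecated code -> canonical subtag.
# A real processor would derive these from the IANA registry.
LANGUAGE_ALIASES = {"in": "id"}   # Indonesian
REGION_ALIASES = {"TP": "TL"}     # Timor-Leste


def canonicalize(tag):
    """Canonicalize a simple 'language' or 'language-region' tag.

    Hypothetical helper: scripts, variants, extensions and
    grandfathered tags are deliberately not handled here.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    result = [LANGUAGE_ALIASES.get(language, language)]
    if len(subtags) > 1:
        region = subtags[1].upper()
        result.append(REGION_ALIASES.get(region, region))
    return "-".join(result)


print(canonicalize("in-TP"))
```

Under these assumptions the user's input "in-TP" is accepted, but the tag that gets generated is "id-TL".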
Section 3.1 says:
"If ISO 3166 were to assign the code 'QX' to represent the value "Isle
of Man" (represented in the IANA registry by the UN M49 code '833'),
'833' remains the canonical subtag and 'QX' would be assigned '833' as
an alias. This prevents tags that are in canonical form from becoming
Although this text has been around for a while, I suggest using "IM"
instead of "QX" as a more realistic example. "IM" is Exceptionally
Reserved by ISO 3166/MA and would be the obvious choice if they ever do
decide to assign a code to the Isle of Man. "QX" is in the
user-assigned block and it is pretty safe to assume it will never be
assigned, even by the supposedly reassignment-happy 3166/MA.
More from section 3.1:
"Stability provisions apply to grandfathered tags with this exception:
should all of the subtags in a grandfathered tag become valid subtags in
the IANA registry, then the grandfathered tag MUST be deprecated."
This still doesn't clearly answer what to do about existing "sgn-XX"
tags, where all subtags are ALREADY valid in the registry but still need
to be identified as to their originally registered identity.
I've added a "redundant" section to the prototype registry, where they
are registered as not needing to be registered :-) but I don't know if
that meets the need Peter Constable expressed about identifying "sgn-US"
unambiguously as American Sign Language instead of just any old sign
language used in the U.S.
Section 3.2 contains an example of the registry:
"# registered variants...
"variant; boont; Boontling; 2004-06-28; ; en #boont variant of English"
I am guessing that self-explanatory comments like this would not
appear in the registry. I haven't included any such comments except for
the "redundant" tags.
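
For reference, a record in the format quoted above can be split mechanically. This is only a sketch: the field names (type, subtag, description, date, canonical, prefix) are assumptions inferred from the example line, and parse_record is a hypothetical helper rather than anything defined in the draft:

```python
from datetime import date


def parse_record(line):
    """Parse one semicolon-separated registry record.

    Returns None for blank lines and comment-only lines.
    Field names are assumed from the example in Section 3.2.
    """
    # Everything after '#' is treated as a trailing comment.
    body, _, comment = line.partition("#")
    if not body.strip():
        return None
    fields = [field.strip() for field in body.split(";")]
    fields += [""] * (6 - len(fields))  # pad short records
    kind, subtag, description, added, canonical, prefix = fields[:6]
    return {
        "type": kind,
        "subtag": subtag,
        "description": description,
        "date": date.fromisoformat(added),
        "canonical": canonical,
        "prefix": prefix,
        "comment": comment.strip(),
    }


record = parse_record(
    "variant; boont; Boontling; 2004-06-28; ; en #boont variant of English"
)
print(record["subtag"], record["date"], record["prefix"])
```

Since the trailing comment is separated from the fields, a parser like this works whether or not such comments end up in the real registry.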
Section 3.2 continues:
"# The following codes were registered as complete tags, but can now be
"# composed of registered subtags and do not require registration.
"redundant; en-boont; Boontling; 2003-02-14; ; # see variant boont..."
At least something is now said about what to do with these tags. But I
imagine there will probably be some criticism about including records
for these in the registry.
Regardless, I have updated the prototype registry accordingly. The
comments have been changed to be a bit more complete and hopefully more
Appendix C of draft-05 says:
"As of January 2003, if a code exists in the associated ISO standard and
it is not deprecated or withdrawn as of that date, then it has
Thanks to Mark and Addison for changing the magic date back to January
2003 in draft-05, making CD and TL the canonical subtags for those
regions once again. I have updated the prototype to reflect this.