draft-phillips-langtags-08, process, specifications, "stability", and extensions

Mark Davis mark.davis at jtcsv.com
Thu Jan 6 18:02:58 CET 2005

> 3066) that go beyond the patterns "ll(-CC)" and "lll(-CC)". If we stick
> with RFC 3066, we will have no way of writing forward-compatible
> processors that will be able to do very useful matching.

I want to reinforce what Peter has said. In RFC 3066 we have already
registered language tags like zh-Hans and zh-Hant. Nobody can parse out the
script in the language tag because RFC 3066 does not provide for
identification of the pieces. During the development of 3066bis, we have
been holding off on registering all of the country variants of these,
because we didn't want them to be redundant with the generated codes in
3066bis. If we don't get 3066bis, then we will end up needing to register
the combinations zh-Hans-CN, zh-Hant-CN, zh-Hans-HK, zh-Hant-HK, zh-Hans-MO,
zh-Hant-MO, zh-Hans-SG, zh-Hant-SG, zh-Hans-TW, zh-Hant-TW. And zh is just
one example. There are many languages that can be written in different
scripts, where it is important as a matter of practice to be able to
distinguish the script as well as the country.
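As an illustration of what identifiable pieces buy us, here is a minimal Python sketch that splits the language-script-country prefix of a tag into named parts. It assumes a simplified subset of the 3066bis structure (a 2- or 3-letter language subtag, an optional 4-letter script subtag, and an optional 2-letter or 3-digit region subtag) and ignores anything after that; the names `parse_prefix`, `TagPrefix`, and `ZH_COMBINATIONS` are hypothetical. It also shows that the ten zh combinations listed above are simply two scripts times five countries once the pieces can be generated rather than registered one by one.

```python
import itertools
import re
from typing import NamedTuple, Optional

# Simplified subset of the 3066bis structure (an assumption for this sketch):
# language = 2-3 letters, script = 4 letters, region = 2 letters or 3 digits.
# Variants and extensions after the region are ignored.
_PREFIX = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?=-|$)"
)


class TagPrefix(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_prefix(tag: str) -> Optional[TagPrefix]:
    """Split a tag into named language/script/region pieces.

    Returns None if the tag does not start with a 2-3 letter language subtag
    (e.g. RFC 3066 tags such as i-klingon)."""
    m = _PREFIX.match(tag)
    if m is None:
        return None
    script, region = m["script"], m["region"]
    return TagPrefix(
        m["language"].lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


# With identifiable pieces, the ten zh script/country combinations are just
# two scripts times five regions, instead of ten separate registrations.
ZH_COMBINATIONS = [
    f"zh-{script}-{region}"
    for script, region in itertools.product(("Hans", "Hant"),
                                            ("CN", "HK", "MO", "SG", "TW"))
]

if __name__ == "__main__":
    print(parse_prefix("zh-Hans-HK"))
    # TagPrefix(language='zh', script='Hans', region='HK')
    print(len(ZH_COMBINATIONS))  # 10
```

Case is normalized on output only for readability; the sketch does not check the subtags against the ISO 639, ISO 15924, or ISO 3166 code lists.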

There are very good reasons to have the script code before the country code,
because differences by script swamp differences by country. Suppose that you
are composing a web page by pulling together different pieces of data, and
your target is Chinese simplified for Hong Kong. For one of those data
sources, there is not an exact match. Given a choice between a data source
in Chinese simplified, or a data source in Chinese Hong Kong (but
traditional), you really want to pick the Chinese simplified. That is
why the script subtag comes second (zh-Hans-HK): the common process of
truncation then falls back to zh-Hans rather than zh-HK, which is the right
result.
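The truncation fallback can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a matching algorithm defined by the draft: the hypothetical `lookup` helper drops the rightmost subtag until the tag matches one of the available sources, comparing case-insensitively. With the script placed before the country, truncating zh-Hans-HK reaches the simplified source first; the hypothetical country-before-script order zh-HK-Hans (not a valid tag under the draft) would instead fall back to the Hong Kong source.

```python
from typing import Iterable, Optional


def lookup(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the first available tag reached by truncating `requested`
    from the right, one subtag at a time; None if nothing matches."""
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.split("-")
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()  # drop the rightmost (least significant) subtag
    return None


# Two data sources, neither an exact match for the target:
# Chinese simplified, and Chinese for Hong Kong (traditional).
sources = ["zh-Hans", "zh-HK"]

if __name__ == "__main__":
    # Script before country: truncation reaches the simplified source first.
    print(lookup("zh-Hans-HK", sources))  # zh-Hans
    # Hypothetical country-before-script order: truncation picks zh-HK.
    print(lookup("zh-HK-Hans", sources))  # zh-HK
```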

This is similar to the reason why the language code comes before the country
code. If we had the order CH-fr, then we could end up mixing French and
German in the same page, because we would fall back (for one of the data
sources) from CH-fr to CH, which could be German.


----- Original Message ----- 
From: "Peter Constable" <petercon at microsoft.com>
To: <ietf-languages at alvestrand.no>; <ietf at ietf.org>
Sent: Thursday, January 06, 2005 07:42
Subject: RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of ned.freed at mrochek.com

> Again, your pejorative dismissal of other people's concerns does not
> mean your position is valid...

> Parsing almost never is. But simply parsing these tags is not, and
> has been, the issue.

I think you guys are in violent agreement over country codes within a
tag, and that the debate over interpreting the wording of RFC 3066
serves no purpose.

I think the intent of Mark's dismissal has been to refute
perceived-invalid objections, in which case we need to consider that the
line between perceived-invalid and truly-invalid has been blurred simply
by the volume of discussion (the noise factor). There have been some
invalid objections that bear some similarity to comments Ned has made as
he has tried to make his point. (For example, Bruce Lilly has claimed
back-compatibility problems that do not exist, on the incorrect premise
that RFC 3066 does not permit ISO 3166 country codes except as second
subtags, or does not permit second subtags that are not country codes; at
the moment I forget whether it was one, the other, or both.)

But Ned's concerns are legitimate, I think. I'd say they are not
necessarily blocking issues for this draft, because I think a possible
outcome of discussion is to characterize them as concerns about
outstanding issues that need to be solved rather than as concerns over
the draft itself; but I do think they are valid concerns that deserve
discussion.

In a nutshell, Ned was elaborating on a comment from Dave Singer that,
once we have parsed a pair of tags and identified all the pieces, it's
not a trivial matter to decide in every case how the two tags compare,
and that there are factors that would exist if the draft were approved
that didn't exist under RFC 3066.

Again, I think this is a question that deserves discussion. I don't see
it as a problem with the proposed draft itself. It is a problem that
doesn't exist in RFC 3066, but that is only
because RFC 3066 left us with bigger problems: it doesn't give us any
way to identify pieces that we would be encountering in registered tags
(apart from hard-coded tables compiled from versions of the registry
that pre-exist a given implementation).

RFC 3066 permits tags that have all kinds of internal structures. That
is a problem as it will never allow us to derive much useful information
from a tag with any confidence -- only the ISO 639 language category and
in some cases a country category. I predict that in the future we will
be seeing a significant number of tags (whether sanctioned without
registration by a successor to RFC 3066 or as tags registered under RFC
3066) that go beyond the patterns "ll(-CC)" and "lll(-CC)". If we stick
with RFC 3066, we will have no way of writing forward-compatible
processors that will be able to do very useful matching.
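To make the limitation concrete, here is a hedged Python sketch of what a processor written only against RFC 3066 can reliably extract: the ISO 639 language code, plus the ISO 3166 country code when the tag has the shape "ll(-CC)" or "lll(-CC)". Any other tag, including registered tags such as zh-Hans, is opaque to it unless it carries a hard-coded copy of the registry. The function name `rfc3066_interpret` is hypothetical, and the sketch checks only the shape of the subtags, not membership in the ISO 639 or ISO 3166 code lists.

```python
import re
from typing import Optional, Tuple

# The only shapes an RFC 3066 processor can interpret without the registry:
# a 2- or 3-letter language code, optionally followed by a 2-letter country code.
_INTERPRETABLE = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?$")


def rfc3066_interpret(tag: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (language, country) for ll(-CC) / lll(-CC) tags, else None.

    None does not mean the tag is invalid, only that its internal
    structure cannot be known from RFC 3066 alone."""
    m = _INTERPRETABLE.match(tag)
    if m is None:
        return None
    language, country = m.groups()
    return language.lower(), country.upper() if country else None


if __name__ == "__main__":
    print(rfc3066_interpret("fr-CH"))    # ('fr', 'CH')
    print(rfc3066_interpret("zh-Hans"))  # None (registered, structure unknown)
```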

What this draft does is impose some order on all the other patterns
within tags that are permitted, and tell us what the different pieces
must be. As a result, we have more named pieces to deal with, and we are
presented with the question that Ned raised: "Now we have more named
pieces than we did before; what do we do with them?" That is a problem
that will need to be addressed. But I don't think it's a reason to
oppose the draft, since opposing the draft (or at least opposing any
revision that introduces a richer internal structure) leaves us in a
situation that must be characterized either as a worse problem or as
turning our backs on increased functionality to meet real user needs.

Peter Constable
Ietf-languages mailing list
Ietf-languages at alvestrand.no
