RFC 3066bis: Philosophical objection (harsh)

Sun Dec 28 19:55:38 CET 2003

Hi Harald,

Thanks for your comments back. I've been waiting for the -02 draft to be
posted. I don't know why it isn't posted yet, since Internet-Drafts
submitted after ours have been posted. Some of your concerns in your
original email and below have been, I think, addressed in that draft.

I don't think that we're the only two talking. Others have weighed in here
and there--six or eight folks. And there was a quite lengthy discussion this
past summer (you'll no doubt recall) in which the inclusion of ISO 15924 was
thoroughly debated. There have even been quite a number of comments since
then along the lines of "Since we're all waiting for a new RFC..."

Well, Internet-Drafts don't write themselves, so Mark and I wrote this one
to collect all of that discussion together. As a result, I think there may
be relative agreement on at least the core of our proposal, and few people
will write in unsolicited to say "okay, I agree". There is also the holiday
season to contend with...

I have compressed your comments and removed most of mine in my response
below. This is a personal response and not necessarily indicative of Mark's
opinion.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: Harald Tveit Alvestrand [mailto:harald at alvestrand.no]
> Sent: Saturday, 27 December 2003 10:40
> To: aphillips at webmethods.com; ietf-languages at alvestrand.no
> Subject: RE: RFC 3066bis: Philosophical objection (harsh)
>
> >
> > Let's start with whole tag vs. subtag registration. Whole tag
> > registration works well when there are only a very few exceptional
> > tags expected or when atomic tags completely cover the needs of the
> > users.
>
> I would put this slightly differently: The whole tag registration works
> well when the one who wants to start using a tag (usually the tag
> generator) is willing to put work into the tag (registering) before
> starting to use it.
> Subtag registration works well when the recipient is willing to
> handle tag combinations where the recipient has no idea what they
> "mean", and is satisfied with making educated guesses based on the
> identity of the subtag components.

Whole tag registration doesn't work well for recipients, as evidenced by the
fact that few implementations support registered whole tags. My experience
is that such tags are more difficult to program for, since a parser by
itself is easier to write than a parser combined with a lookup table.

In any case, I disagree with your characterization of the difference between
tags that the recipient knows the meaning of and tags where the recipient
has no idea. The set of tags that an implementation recognizes and can do
something meaningful with is already smaller than the set of all possible
tags. Our proposal makes it easier for implementations to assign value or
meaning to the unrecognized set of tags. It also makes the structure of the
tags far more rigorous, making registered values of any sort (whole-tag or
subtag) more regular and thus easier to process.
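
As a rough illustration of the "parser by itself" point, here is a minimal
Python sketch that assigns a role to each subtag from its shape alone, with
no lookup table. The shape rules are assumptions made for the example (a
4-letter alphabetic subtag is treated as a script, a 2-letter alphabetic
subtag as a country, anything else as "other"), and the name parse_tag is
hypothetical; the draft's actual grammar may differ.

```python
import re

# Assumed well-formedness rule: each subtag is 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def parse_tag(tag):
    """Split a language tag into roles using only the shape of each subtag."""
    subtags = tag.split("-")
    if not all(SUBTAG.fullmatch(s) for s in subtags):
        raise ValueError(f"malformed tag: {tag!r}")
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "other": []}
    for s in subtags[1:]:
        nothing_later = result["region"] is None and not result["other"]
        if (len(s) == 4 and s.isalpha()
                and result["script"] is None and nothing_later):
            result["script"] = s.title()    # e.g. "Latn", "Hans"
        elif len(s) == 2 and s.isalpha() and nothing_later:
            result["region"] = s.upper()    # e.g. "US", "CN"
        else:
            result["other"].append(s.lower())  # e.g. "1996"
    return result
```

With these rules, a tag the implementation has never seen, such as
'zh-hans-CN' or 'de-AT-1996', still yields a language, and possibly a script
and a country, that the recipient can act on.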
>
> All implementations must be able to handle tags that they have not seen
> before. The maintenance of the table is a problem - but if that is the
> core problem, we should look at solving this problem (such as by
> defining a fixed format for the table that can be downloaded from IANA,
> for instance).
>
> But if generative tags are used, the two German orthographic variations
> wouldn't have caused eight "registrations" - they would have generated a
> near-infinite number of variations, most of which would be meaningless
> (no-CN-1905?)

Meaningless tags are with us already. They don't seem to cause a lot of
problems because, in practice, no one uses these legal but ridiculous codes.
In our draft, the onus is still on the tag generator (as termed above) to do
the work of registration. What is removed is the need to register many
variations and the need to convince the community to register multiple
levels of tags, if they are needed. This doesn't remove the need to convince
the community, provide supporting documentation, and so on. It does increase
the likelihood that a registrant will be able to register all of the tag
variations that they feel they need without "registration fatigue" setting
in.

It also will lead, I think, to the registered values actually being
implemented (at least on a rudimentary level, as supported "unrecognized
stuff") in browsers, XML parsers, Web servers, mail readers, and so forth. I
think that's an attraction: registered values actually can be used.
>
> > 4. "Silly subtag generation" should not be an issue. It has always been
> > possible to create 'silly' tags or at least tags with dubious
> > meaning with the generative mechanism. 'es-AQ', 'sv-CO', et cetera.
>
> Yes, and at times I think that the inclusion of the ISO 639 generative
> mechanism in RFC 1766 was a mistake, exactly for this reason.

It's here and it works. Let's not worry so much about Klingon for the
Neutral Zone or Norwegian for Chile. I think we should concentrate on having
a model that provides the right level of granularity and structure for the
job. Going with a table-driven mechanism instead of a generative mechanism
would, I think, be a step backwards.
>
> > The description of the registry in the draft is designed to capture
> > the meaningful uses that a subtag can be put to, without limiting the
> > subtag's use in the generative mechanism. Implementations might limit
> > registered subtags to their informative uses.
>
> But if there is no whole-tag registration, what is the hard rule
> that draws the distinction between "informative" and "non-informative"
> uses? If there is a rule, we're really back with whole-tag registration.

There is no rule. I wrote "might" to indicate what an implementor might
decide to do. I actually think that very few implementations, if any, will
provide the ability to construct the user's own tag from subtags at random.
Instead they will allow users to choose from a list of predetermined tags or
to type in their favorite tag. Implementations that build more complex
mechanisms that do allow for generation of any combination of subtags might
therefore also be willing to commit to the extra effort of providing for the
informative uses of each registered subtag, or of restricting each subtag to
those uses.

Again, the question is where the burden should lie. With whole tag
registration, the burden is on the recipient to have an up-to-date table of
values and to deal with cases in which unexpected values are received (what
to do?). With subtags, the burden is on the sender to choose, but choose
wisely, the tag that best describes the content. The recipient can then
assign as much meaning as possible to the value, which may not be enough
(from the sender's perspective), but may be better than nothing at all (the
current result).
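
One way a recipient might "assign as much meaning as possible" is to drop
subtags from the right until it reaches a tag it knows. The Python sketch
below assumes that truncation strategy and a hypothetical set of known tags;
neither the function name best_known nor the strategy itself is taken from
the draft.

```python
def best_known(tag, known):
    """Return the longest prefix of `tag`, cut at subtag boundaries,
    that appears in `known` (compared case-insensitively), or None."""
    known_lower = {k.lower(): k for k in known}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in known_lower:
            return known_lower[candidate]
        subtags.pop()  # discard the most specific remaining subtag
    return None
```

A recipient that knows only 'de', 'en', and 'en-US' would treat 'de-1996' as
'de': less than the sender intended, but better than rejecting the tag.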
>
> > which basically say: "Use the most exact tag that you can, but no more
> > exact than is strictly necessary", which effectively says "use en-US,
> > not en-Latn-US". More guidance here might be provided...
>
> Saying something "effectively" often proves in practice to be not saying
> anything at all - people who do not understand the field will make the
> wrong choice unless the guidelines are 100% clear.

This was previously discussed at length (during the big blast of messages
earlier in the year). Most people felt that script codes belong before
country codes in tags. Strict tag matching compatibility between the draft
and RFC 3066 would require moving the script code after the country code, at
the expense of moving it out of its best semantic position.
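
The compatibility cost can be seen with RFC 3066's matching rule, under
which a language range matches a tag if the two are equal or if the range is
a prefix of the tag followed by "-". The Python sketch below implements that
rule; the function name range_matches is made up for the example, and
'en-US-Latn' stands for the alternative ordering that was discussed, not for
anything in the draft.

```python
def range_matches(language_range, tag):
    """RFC 3066 style matching: exact match, or prefix followed by '-'.
    The special range '*' matches any tag."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


# Script before country: an existing 'en-US' range no longer matches.
print(range_matches("en-US", "en-Latn-US"))   # False
# Script after country: the same range still matches.
print(range_matches("en-US", "en-US-Latn"))   # True
# A bare 'en' range matches either ordering.
print(range_matches("en", "en-Latn-US"))      # True
```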

> Whole-tag registration limits the number of people who can make
> mistakes to those who try to register tags. Subtag registration pushes
> the ability to make mistakes out to the implementors and users.

Already we have registrations that make tag choices unclear: 'zh' or
'zh-hans'? 'de' or 'de-1996'?

Another way of looking at subtag registration is that it changes the
structure of the registry, not of implementations. Yes, it allows silly tags
to 'escape' into
the wild, but there has always been a burden on the user to choose the tag
wisely. With subtags at least the impact of a 'mistake' may be muted by the
fact that the recipient may look inside the tag (in a strictly mandated way)
to find information about the language.
>
> >
> > Extensions in general. We have contemplated adding rules to make
> > extensions default ignorable, but that seems overly limiting, at least
> > for a first pass. The extension mechanism we propose provides a way to
> > pass language-related metadata in a more structured manner, and even in
> > a combinatorial manner (using two extension regimes together). Yes, this
> > is more complex than the current system and we could just stick with
> > "value" subtags for extensions. But we felt that key/value provided a
> > powerful mechanism that could address some of the additional needs of
> > specialized communities without disturbing the base tags at all.
>
> It is a very powerful mechanism. It is also completely useless for open
> interchange without a registration mechanism or similar way to
> discuss the "meaning" of the extension.
>
> So I still don't understand this one:
> - If you want it for private exchange, why is it appropriate to use
> language tags, which are designed for open exchange?

Private extensions allow for very fine-grained tagging, possibly of interest
to a small circle of users, while preserving the possibility of general
exchange. For example, a Web site that discussed details of dialect
variation might use extensions to label side-by-side examples, which might
then be styled differently using CSS, whereas the browser itself would only
need/see the base language tag to know what rendering rules to apply.
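
To make the browser case concrete, the Python sketch below separates a base
tag from a private extension. It assumes, purely for illustration, that
private extensions are introduced by a single-letter 'x' subtag after the
base tag; the extension syntax in the draft may differ, and the tag
'de-CH-x-zurich' and the name split_private are invented for the example.

```python
def split_private(tag):
    """Return (base_tag, extension_subtags), assuming an 'x' subtag
    after the first position marks the start of a private extension."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered[1:]:
        i = lowered.index("x", 1)  # first 'x' after the primary subtag
        return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []
```

A general-purpose recipient would use only the first element (here 'de-CH')
to pick rendering rules, while a site-specific stylesheet or script could
key off the extension subtags.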

> - If you want it for standardized exchange, why don't you describe the
> registration work?

Because I think there may be several such extensions. I have one in mind,
but others will have different needs. Perhaps what is needed is a registry
for a 'namespace' for such extensions. That's a good idea for a future
draft...
>
> > Undefined Extensions. I envision that external groups with interest in
> > using the extension mechanism will define the keys and values. It just
> > didn't seem to make sense to me to saddle IANA with registering those
> > values. A separate registry for extensions or extension namespaces could
> > be created. I suppose we could add one...
>
> If external groups use it, they will either have to set up a registry or
> live with the risk of clashing definitions. Registries are cheap.

Agreed.
>
> >
> > I look forward to submitting draft-02 and to your comments on that
> > version.
>
> I will definitely comment. And I do hope that other people on the
> list will make their opinions known.

Me too.