RFC 3066bis: Philosophical objection (harsh)

Sat Dec 27 19:40:12 CET 2003

Addison,
it seems that we are the only two people talking, which is worrisome to me.

--On 16 December 2003 16:28 -0800 "Addison Phillips [wM]" 
<aphillips at webmethods.com> wrote:

> Hi Harald,
>
> I'm glad to see you've carefully examined our proposal. Your message
> brings up a number of points which require lengthy explanation.
>
> A new draft (draft-phillips-langtags-02.txt) will be sent in later today
> or tomorrow. This one should be substantially easier on the eyes, as it
> was made using Marshall Rose's XML DTD. It includes corrections to the
> non-substantive problems, such as the filename, ABNF, etc.
>
> Let's start with whole tag vs. subtag registration. Whole tag registration
> works well when there are only a very few exceptional tags expected or
> when atomic tags completely cover the needs of the users.

I would put this slightly differently: The whole tag registration works 
well when the one who wants to start using a tag (usually the tag 
generator) is willing to put work into the tag (registering) before 
starting to use it.
Subtag registration works well when the recipient is willing to handle tag 
combinations where the recipient has no idea what they "mean", and is 
satisfied with making educated guesses based on the identity of the subtag 
components.

> My objections
> to whole tag registration, which I think justify going to subtags, are:
>
> 1. It is hard to implement the registry as it sits because each tag is a
> 'holistic' value in an exceptions table. The most common implementations
> don't support registered values because it is very hard to maintain such a
> table.

All implementations must be able to handle tags that they have not seen 
before. The maintenance of the table is a problem, but if that is the core 
problem, we should solve it directly, for instance by defining a fixed 
format for the table that can be downloaded from IANA.
>
> 2. The pattern of registrations follows a subtag structure which could
> benefit from the generative mechanism, given suitable guidelines (which we
> have sought to provide).
>
> 3. Whole tag registration requires many tags to be registered when the
> 'subtag' being added must cover a wide range of situations. Consider
> German orthographic variation, which has just two subtags and eight
> registrations. Consider if we were to build out the Chinese
> zh-hans/zh-hant set of registrations we would need more like 18
> registrations. Then there was Prof. Steenwijk's set of subtags for
> Resian. Here is one small language that might have 20 or 30 registrations
> (and only five subtags). If he can document these (and I suspect he can),
> then I don't see how we avoid a huge flock of exceptions like this. The
> registration regime practically requires that only a small number of tags
> be registered, leaving certain obvious tags "illegal".

But if generative tags are used, the two German orthographic variations 
wouldn't have caused eight "registrations" - they would have generated a 
near-infinite number of variations, most of which would be meaningless 
(no-CN-1905?)
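
To make the combinatorics concrete, here is a rough sketch in Python. The 
language and region lists are a tiny illustrative sample chosen for this 
example, not the real ISO lists, and the helper name generate is made up 
here. The eight German registrations are, if memory serves, de, de-AT, 
de-CH and de-DE, each with 1901 and 1996.

```python
from itertools import product

# Assumption: small illustrative samples, not the full ISO 639 / ISO 3166 lists.
languages = ["de", "no", "en"]
regions = [None, "DE", "AT", "CH", "CN"]   # None means "no region subtag"
variants = ["1901", "1996"]                # the two German orthography subtags


def generate(languages, regions, variants):
    """Yield every language[-region]-variant combination (hypothetical helper)."""
    for lang, region, variant in product(languages, regions, variants):
        parts = [lang] + ([region] if region else []) + [variant]
        yield "-".join(parts)


# Whole-tag registration: only the combinations somebody bothered to register.
german_registered = list(generate(["de"], [None, "DE", "AT", "CH"], variants))

# Generative use: every combination is a syntactically valid tag.
all_generated = list(generate(languages, regions, variants))

print(len(german_registered))          # size of the registered German set
print(len(all_generated))              # grows as languages x regions x variants
print("no-CN-1901" in all_generated)   # a tag nobody asked for
```

Even with three languages and four regions the generated set is several 
times larger than the registered one; with the full code lists it grows 
as the product of the list sizes.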

> 4. "Silly subtag generation" should not be an issue. It has always been
> possible to create 'silly' tags or at least tags with dubious meaning with
> the generative mechanism. 'es-AQ', 'sv-CO', et cetera.

Yes, and at times I think that the inclusion of the ISO 639 generative 
mechanism in RFC 1766 was a mistake, exactly for this reason.

> The description of
> the registry in the draft is designed to capture the meaningful uses that
> a subtag can be put to, without limiting the subtag's use in the
> generative mechanism. Implementations might limit registered subtags to
> their informative uses.

But if there is no whole-tag registration, what is the hard rule that draws 
the distinction between "informative" and "non-informative" uses?
If there is a rule, we're really back with whole-tag registration.

> The draft does limit registered subtags in a significant way: you can't
> register a script or region code, only a variant or base language code
> (and it discourages base language codes). This effectively limits where
> and how registered subtags can appear in a tag and prevents random
> sequences from being generated. Users must still choose appropriate tags,
> but then they must do so even today.
>
> The results are very easy to parse/match/process, even without a current
> table of registered values. This should free implementations to provide
> better support for the registered values, while simplifying the number and
> type of registered values that must be handled.
>
> So I feel that going to subtags is actually a minor change (a policy
> change on the use of registered values, which are subtags in structure,
> if not in name) that provides for a simpler, more powerful way to use the
> registry to everyone's benefit.
>
> ---
>
> With regard to your other comments:
>
> Matching and Script. The text is careful not to match en-Latn-US to en-US.
> I'm not sure that's a good thing. If there is a valid use for script tags
> beyond the very narrow group of current registrations then the script
> codes must be put into the infrastructure. Mark and I preserved the strict
> right-to-left matching of RFC3066 and kept matching compatibility over
> semantic compatibility. This has some consequences, such as en-Latn-US and
> en-US not matching. At the same time, we have added to the matching rules,
> which basically say: "Use the most exact tag that you can, but no more
> exact than is strictly necessary", which effectively says "use en-US, not
> en-Latn-US". More guidance here might be provided...

Saying something "effectively" often proves in practice to be not saying 
anything at all - people who do not understand the field will make the 
wrong choice unless the guidelines are 100% clear.
Whole-tag registration limits the number of people who can make mistakes to 
those who try to register tags. Subtag registration pushes the ability to 
make mistakes out to the implementors and users.
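
For anyone following along, this is roughly what the RFC 3066 matching 
rule looks like in code: a language-range matches a tag if it equals the 
tag, or equals a prefix of the tag that is immediately followed by "-". 
The function name range_matches is invented for this sketch, and the 
lowercasing reflects the fact that tags are case-insensitive.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 3066 section 2.5 (sketch only)."""
    r = language_range.lower()
    t = tag.lower()
    # "*" matches anything; otherwise require equality or a prefix
    # that ends exactly at a subtag boundary.
    return r == "*" or t == r or t.startswith(r + "-")


print(range_matches("en", "en-US"))          # prefix at a subtag boundary
print(range_matches("en", "en-Latn-US"))     # "en" is still a prefix
print(range_matches("en-US", "en-Latn-US"))  # script subtag breaks the prefix
print(range_matches("en", "eng"))            # not a subtag boundary
```

A user asking for en-US gets nothing back from content tagged en-Latn-US, 
which is exactly the kind of mistake the tagger has to know to avoid.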

> Year. The productive and/or non-productive use of years was experimental
> and was based on the German example, plus past proposals to register other
> values. We have removed this feature from draft-02 altogether. Note that
> this effectively prohibits the use of year subtags with the '####' pattern
> (since a registered variant must be five characters long and start with an
> alpha value).

The German example provided a case where someone needed it, and I think it 
was a valid usage.
It would probably be registered as "YNNNN", then, if anyone wants more of 
it.
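
A quick sketch of the shape rule as Addison describes it above, assuming 
"five characters long and start with an alpha value" means exactly five 
alphanumerics with a leading letter. The draft text may differ (it may 
allow longer subtags), and the names VARIANT and is_registrable_variant 
are made up for this example.

```python
import re

# Assumption: exactly five characters, first one a letter, rest alphanumeric.
VARIANT = re.compile(r"[A-Za-z][A-Za-z0-9]{4}")


def is_registrable_variant(subtag: str) -> bool:
    """Check a subtag against the assumed draft-02 variant shape."""
    return VARIANT.fullmatch(subtag) is not None


print(is_registrable_variant("1901"))   # bare '####' year: too short, starts with a digit
print(is_registrable_variant("Y1901"))  # a "YNNNN"-style workaround fits the shape
```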

> Key-Value Pairs. With regard to key-value pairs, the separator characters
> like equals were chosen for symmetry with various other protocols. We were
> aware of the potential collision of equals with that character's use in
> Accept-Language, and based on your and others' objections, draft-02
> replaces EQUAL SIGN with FULL STOP (dot).
>
> Extensions in general. We have contemplated adding rules to make
> extensions default ignorable, but that seems overly limiting, at least
> for a first pass. The extension mechanism we propose provides a way to
> pass language-related metadata in a more structured manner, and even in a
> combinatorial manner (using two extension regimes together). Yes, this is
> more complex than the current system and we could just stick with "value"
> subtags for extensions. But we felt that key/value provided a powerful
> mechanism that could address some of the additional needs of specialized
> communities without disturbing the base tags at all.

It is a very powerful mechanism. It is also completely useless for open 
interchange without a registration mechanism or similar way to discuss the 
"meaning" of the extension.

So I still don't understand this one:
- If you want it for private exchange, why is it appropriate to use 
language tags, which are designed for open exchange?
- If you want it for standardized exchange, why don't you describe the 
registration work?

> Undefined Extensions. I envision that external groups with interest in
> using the extension mechanism will define the keys and values. It just
> didn't seem to make sense to me to saddle IANA with registering those
> values. A separate registry for extensions or extension namespaces could
> be created. I suppose we could add one...

If external groups use it, they will either have to set up a registry or 
live with the risk of clashing definitions. Registries are cheap.

> In particular, if we add the -x- separator, then users could presumably
> create private use variants after that separator with whatever value they
> desired. It seemed to me like a good idea to provide for some form of
> structure in the extensions and that 'keys' might at least define some
> form of namespace and reduce the likelihood of collision (as well as
> following good practice in labelling data).
>
> I look forward to submitting draft-02 and to your comments on that
> version.

I will definitely comment. And I do hope that other people on the list will 
make their opinions known.

                   Harald