Request: Language Code "de-DE-1996" / RFC 3066

J.Wilkes jwilkes@metabit.com
Fri, 26 Apr 2002 17:39:39 -0100


> On 04/25/2002 09:53:07 PM Torsten Bronger wrote:

>>  The very important
>> "en-GB"--"en-US" thing supports this assumption.  Most implementers
>> (including myself) realise that by some sort of "longest match".

Unfortunately, yes. It's easy to implement this way, but it is unclean and not valid. It 
would be valid if the tags were defined this way (first fragment language, second 
country, ...), but they are not. Not exactly.
With the current definition, the only way I know to interpret these tags is with a 
lookup table that maps all known (or needed) tags to an internal structure. Which 
is not *that* complicated.
A fallback function that strips the last fragment from the string and retries with the 
remaining part until it matches will work, but I doubt its validity. 
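
To illustrate the two approaches just described, here is a minimal sketch in 
Python. The table contents, the shape of the "internal structure" and the function 
names are my own assumptions for illustration; nothing here is prescribed by 
RFC 3066 beyond the fact that tags are compared case-insensitively.

```python
# Hypothetical lookup table: known tag -> internal structure.
# The entries are illustrative only, not a list of registered tags.
KNOWN_TAGS = {
    "de": {"language": "de"},
    "de-de": {"language": "de", "country": "DE"},
    "en-gb": {"language": "en", "country": "GB"},
    "en-us": {"language": "en", "country": "US"},
}


def lookup(tag):
    """Exact table lookup; returns None for an unknown tag."""
    return KNOWN_TAGS.get(tag.lower())


def lookup_with_fallback(tag):
    """Drop the last fragment and retry until the remainder is in the table."""
    parts = tag.lower().split("-")
    while parts:
        entry = KNOWN_TAGS.get("-".join(parts))
        if entry is not None:
            return entry
        parts.pop()  # remove the last fragment and try the shorter tag
    return None
```

With this table, lookup("de-DE-1996") finds nothing, while 
lookup_with_fallback("de-DE-1996") falls back to the "de-DE" entry and 
lookup_with_fallback("de-1996-DE") falls back all the way to "de". Whether such a 
result is a valid interpretation of the original tag is exactly what I doubt.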

On 26 Apr 2002 at 0:33, Peter_Constable@sil.org wrote:
> 
> My concern is that, when we extend mechanisms to gain new functionality, if
> we make individual decisions in terms of the limitations of existing
> implementations (that are limited in number and weren't designed with that
> extended functionality in mind), then we may saddle ourselves with
> complications that we regret in the long term.

I agree here. Unfortunately, we have only one single tag in which we try to package 
several pieces of information; so the way we do this is important.

[...]
> If most of us feel that maintaining the fullest compatibility in this
> regard with current software is the more important concern, though, then
> perhaps we must do what we must do. But perhaps we can consider this
> question: since the 1996 change already creates issues that require (human
> or software) processes to be revised, is the behaviour in relation to these
> existing software implementations that would result from having subtags
> that are probably better in a theoretical sense just one symptom that needs
> to be addressed?

I would prefer having a clean, mature specification. In the current scope, we have to 
deal with a legacy system which got extended, and has to be extended again.
If we had separate key - value pairs for each category, that would be better; here I 
agree with Torsten (although that is certainly not an innovation of XML, a format I 
rather dislike).

We have to package the information in one single string of characters, 
and we have existing valid tags which should not be invalidated.
Any format definition more precise than "look it up in this table..." should 
accommodate that.


RFC 3066 states in 2.2:
"There are no rules apart from the syntactic ones for the third and subsequent 
subtags"
In the same paragraph, definitions for the first and second subtag are given.

I ignore the behaviour of example software implementations, which may be faulty or 
not adhering to RFC 3066. Implementations have to follow standards, not vice versa. 
Otherwise there would be no sense in having standards.

From the existing standard, I conclude it would be advisable to use the de-CH-1901 
variant, and allow for de-1901 as well.
Reasons:
- country codes in the second subtag are already defined.
- since UND should not be used unless the application forces it (2.2.5), we 
cannot simply allow de-UND-1901, so a separate de-1901 is needed.

The standard can be changed, of course.

And from my postings so far, you may remember I personally favour the de-1901-AT 
variant. Still, I make the above consideration.

> Keep in mind that *either way*, processes that match on initial substrings
> are going to face problems: if we adopt "de-1901-DE", then existing
> processes would fail to match for "de-DE", which we have said is needed.
> But if we adopt "de-DE-1901", then processes that operate only in terms of
> initial substrings will fail to match on "de-1901", which we have also said
> is important (and may before long be the more important). In other words,
> such processes are eventually going to fail us, either way.

I agree.
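
To make the failure concrete, here is a minimal sketch of matching on initial 
substrings. The function name is hypothetical, and the matching rule (the request 
equals the tag, or is a prefix of it ending at a "-" boundary) is my reading of 
what such processes typically do, not something taken from a particular 
implementation.

```python
def prefix_matches(requested, tag):
    """True if `requested` equals `tag` or is an initial substring of it
    that ends exactly at a '-' boundary (case-insensitive)."""
    requested, tag = requested.lower(), tag.lower()
    return tag == requested or tag.startswith(requested + "-")


# Whichever order is adopted, one of the two desired matches is lost:
candidates = ["de-DE-1901", "de-1901-DE"]
for wanted in ["de-DE", "de-1901"]:
    for tag in candidates:
        print(wanted, "vs", tag, "->", prefix_matches(wanted, tag))
```

A request for "de-DE" matches "de-DE-1901" but not "de-1901-DE", and a request 
for "de-1901" matches "de-1901-DE" but not "de-DE-1901".
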
Alas, the current standard is unclear either way: one cannot rely on the order alone, 
nor is the subtag value alone sufficient. All in all, RFC 3066 and its predecessors 
are inconvenient from a software implementation point of view. The structural 
requirements in this RFC are merely restrictions and do not help interpretation; to get 
a clean interpretation, a full LUT (look-up table) is needed (and has to be maintained).
As Peter Constable writes:
[...]
> the tag should not be assumed to mean anything until it is registered 
> -- it might never be registered. 
[...]

Is this the appropriate forum for discussing a revision of that RFC? I was under the 
impression that it is not, and that it is just for registering subtags. If it is, I'd be 
glad to contribute to a revision of RFC 3066.

> If the result of this dialog were that we came to a consensus
> to adopt "de-1901-DE" instead, then "de-DE-1901" would remain undefined.

That is the orthogonality problem we encounter here. 

It is desirable to have a language tagging method that allows for all the aspects 
discussed. For example, "language=de;year=1901;country=AT" would do, defining 
the order of the fragments separated by ";" as irrelevant. 
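
As a rough sketch of what I mean by an order-independent format: the keys 
("language", "year", "country") and the ";" and "=" separators are taken from my 
example above; the function name and the error handling are assumptions made only 
for this illustration.

```python
def parse_fields(text):
    """Parse 'key=value;key=value' into a dict.

    The result does not depend on the order of the fragments.
    """
    fields = {}
    for fragment in text.split(";"):
        key, sep, value = fragment.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed fragment: {fragment!r}")
        if key in fields:
            raise ValueError(f"duplicate key: {key!r}")
        fields[key] = value
    return fields


a = parse_fields("language=de;year=1901;country=AT")
b = parse_fields("country=AT;language=de;year=1901")
assert a == b  # same information, different fragment order
```

Each category is addressed by its key, so no fragment position carries any 
meaning and no lookup table of complete tags is needed to tell a year from a 
country.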

In the current system, we just have to set some order. At random, by preference, 
and in consensus.

If anyone wants to go into detail about my example, or a revision of RFC 3066, 
please start a different topic or direct me to a place/mailing list more appropriate 
for that. I'd like to keep this discussion here a bit on topic.

I think it's better to get tags like de-AT-1901 or de-1901-AT registered soon and 
revise the standard separately, so these tags can be put to use already.


> But I have a hunch that, if you (and industry as a whole) are serious about
> i18n / L10n / etc., then eventually you'll be revising your implementation
> in ways that could allow de-1901 and de-1901-DE to do likewise.

I think it would be wise to have a different format / standard, if the industry is serious 
about i18n/L10n etc.

> I hope you understand I'm not meaning to pick on your proposals in
> particular. They just happened to be what came along first to stretch the
> envelope in relation to some of the issues I've been exploring with
> long-term concerns in mind. I realise my comments and the resulting
> discussion are probably resulting in some delay in registration of tags for
> the distinctions you want, but hopefully that delay won't be protracted,
> and hopefully it will

While I am all in favour of discussing things until the result can be called mature, 
only human beings should be allowed to take 18 years for that.

From my point of view, RFC 3066 is messed up, and it is already in use. It needs an 
extension or a parallel standard, but I don't see how the problems we are talking 
about can be *solved* within the given limits of RFC 3066. We can try to find a 
workable solution for now, and take up the process of designing a different / 
additional format in parallel.

Johannes Wilkes 
-- 
metabit * software and networks * heterogenous,distributed,generative  
Fon:(+49)228/242488-0 * Fax: (+49)228/242488-7
address: Kurfürsten-11 * D-53115 Bonn * Germany