Request: Language Code "de-DE-1996"

Torsten Bronger bronger@physik.rwth-aachen.de
Fri, 26 Apr 2002 04:53:07 +0200


Halloechen!

Peter_Constable@sil.org wrote:

 > On 04/24/2002 06:28:40 PM "J.Wilkes" wrote:
 >
 >> [...]
 >
 > If I understand you, it seems that de-1901/de-1996 is overall the more
 > important distinction than de-DE/de-CH/etc. -- is that what you're intending
 > to convey? If so, that perhaps suggests de-1901-DE/de-1901-CH/etc over
 > de-DE-1901/etc.

German speakers probably want to tag the orthography variant more
often than the country variant.  Those who know about subtags and tag
their country will almost certainly want to give the orthography
subtag, too.  The reverse is less certain.

I didn't think of de-1901-DE because I was influenced by XML and
related technologies, which is certainly not the worst influence to
have.  I must object to de-1901-DE etc. on purely practical grounds
(although in principle there may be very good reasons for them).

The XML specification and RFC 3066 suggest that the language code may
be immediately followed by a country code.  The very important
"en-GB"/"en-US" distinction supports this assumption.  Most
implementors (including myself) handle this with some sort of "longest
match".  Because we fear that subtags may follow that our programs
can't cope with, we try to match the *beginning* of the tag, i.e. we
try to match "de-DE" and, if that fails, "de".  That way we get as
much information as we can.

Consequently, most applications would interpret "de-1901-AT" as plain
"de", even if they recognise "de-AT".  And matching an "-AT" at an
arbitrary position is dangerous.
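
A minimal sketch of the "longest match" approach described above, in
Python.  The function name best_match and the set KNOWN are made up
for illustration; real applications will differ in detail.  It only
shows why a country subtag that does not directly follow the language
code gets lost.

```python
def best_match(tag, known):
    """Return the longest leading part of `tag` that is in `known`.

    The tag is split on "-" and trailing subtags are dropped one at a
    time until a known tag remains.  Comparison ignores case.
    Returns None if not even the primary language code is known.
    """
    known_lower = {k.lower(): k for k in known}
    parts = tag.lower().split("-")
    for end in range(len(parts), 0, -1):
        candidate = "-".join(parts[:end])
        if candidate in known_lower:
            return known_lower[candidate]
    return None


# Hypothetical list of tags that an application understands.
KNOWN = {"de", "de-DE", "de-AT", "de-CH"}

print(best_match("de-DE-1901", KNOWN))  # de-DE
print(best_match("de-1901-AT", KNOWN))  # de
```

With "de-DE-1901" the country survives; with "de-1901-AT" only "de"
is left, although the application knows "de-AT".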

Additionally, in XSLT, the most important system for working with XML
documents, the function lang() is used to test the current
language.  It also does some sort of "starts-with" comparison.
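
As a rough illustration, the comparison rule of lang() can be
imitated in a few lines of Python.  The function lang_matches is my
own stand-in, not part of any XSLT library; it only mirrors the rule
that the xml:lang value must equal the argument, or equal it once a
suffix starting with "-" is ignored, case-insensitively.

```python
def lang_matches(xml_lang, wanted):
    """Imitate the matching rule of XPath's lang() function.

    True if `xml_lang` equals `wanted`, or starts with `wanted`
    followed by "-".  Comparison ignores case.
    """
    actual = xml_lang.lower()
    prefix = wanted.lower()
    return actual == prefix or actual.startswith(prefix + "-")


print(lang_matches("de-DE-1901", "de-DE"))  # True
print(lang_matches("de-1901-AT", "de-AT"))  # False
print(lang_matches("de-1901-AT", "de"))     # True
```

So a stylesheet that tests for "de-AT" would silently miss a document
tagged "de-1901-AT".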

Another example: Mozilla, the alpha version of Netscape (well, sort
of), can show the language of an HTML document via right mouse
click -- "Properties".  "de" yields "German", "de-DE" yields "German
(Germany)" and "de-DE-1901" yields "German (Germany, 1901)", which I
found delightful; it shows how intuitive agreement can work.  :-)

However, "de-1901-AT" is shown as a plain "German (1901-AT)" and is
thus misinterpreted.  I strongly suspect that this is not the only
program that behaves this way.

To sum up, I don't think the advantages of de-1901-DE are big
enough to justify departing from common practice.


One more thing: de-1901 and de-DE-1901 trigger different behaviour
in my program, but this point may have been cleared up already ...

Tschoe,
Torsten.