xx-XX-nnnn vs. xx-nnnn in Chinese and German

Torsten Bronger bronger@physik.rwth-aachen.de
Wed, 13 Feb 2002 21:57:58 +0100

On Tuesday, 12 February 2002 21:00 you wrote:
> > >Taking that further, we have proposals
> > > for
> > >
> > >        de-DE-1996
> > >        de-AT-1996
> > >        de-CH-1996
> > >
> > >Should we also have a proposal for
> > >
> > >        de-1996?
> > >
> > >If so, how would that differ from
> > >
> > >        de-DE-1996
> > >        de-AT-1996
> > >        de-CH-1996
> > >
> > >already proposed?
> I believe it isn't unreasonable to leave off the country id.
> [...]
> In contrast, there may very well be texts which are differentiated by
> country - using certain phrases and words that are only used in that
> country, for example.  In those cases it makes sense to use the country [...]

I need de-AT/DE for the mapping to LaTeX identifiers.  LaTeX has to
distinguish, because it generates some text.  E.g. the date:  "Januar" in
Germany, "Jänner" in Austria.  So if I write a letter in XML which is
converted to LaTeX which then puts in the date -- the country of origin
is essential.

> Then comes the problem of what to do when the country is known, but there is
> nothing specific to that country in the text.  Is it better to tag it with
> the country id, or to leave off the country id so that the text can be
> better categorized as more generic?

In this context: RFC 3066 says that these tags should be interpreted
as "one token".  I take this to mean that software should either
understand the whole tag or nothing.  Is this a good approach?  If
"fallbacks" were allowed, I'd see no problem with "overtagging" texts.

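To make the "fallback" idea concrete, here is a minimal sketch in Python. It assumes that a fallback simply means dropping subtags from the right until a known tag is found; RFC 3066 does not prescribe this, and the names `fallback_chain` and `lookup` as well as the identifiers in the example table are only illustrative.

```python
def fallback_chain(tag):
    """Return the tag itself, followed by ever shorter prefixes of it."""
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup(tag, known):
    """Return the value for the most specific known prefix of tag, or None."""
    for candidate in fallback_chain(tag):
        if candidate in known:
            return known[candidate]
    return None


# Hypothetical mapping from language tags to identifiers; an assumption
# made for this example, not taken from any real package.
identifiers = {"de": "german-generic", "de-AT": "german-austria"}

print(fallback_chain("de-AT-1996"))       # most specific tag first
print(lookup("de-AT-1996", identifiers))  # falls back to the "de-AT" entry
```

With such a scheme an "overtagged" text would still be usable by software that only knows the shorter tags.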
> > Good question? How would it differ? What kind of entity is de-1996
> > supposed to denote? *That* is the problem. What kind of category is it
> > supposed to denote? We don't have any answer for that. But that is the
> > kind of approach we have taken up to now: assigning tags when we think we
> > need some kind of distinction without any thought to what kind of
> > entities it is that we are trying to distinguish. I for one would not
> > support a registration of de-1996 until such questions are answered.
> This is prudent.  It can make the registration process a bit more complicated,
> but in the end, the clarification is useful for all of us trying to figure
> out what it is we should be doing.

Mmmh... what's wrong with the "canonical" approach?

Tag         Language   Subform   Orthography
de           German       ?           ?
de-DE        German    Germany        ?
de-AT        German    Austria        ?
de-DE-1996   German    Germany      "new"
de-AT-1996   German    Austria      "new"
de-DE-1901   German    Germany      "old"
de-AT-1901   German    Austria      "old"
de-1996      German       ?         "new"
de-1901      German       ?         "old"

"?" means: Dear software/reader, try to find it out, or use your
           default.  That may sound a little bit arbitrary, but
           someone who can't say more about their language than
           just "de" can't expect more.
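
The table can also be read as a small decoding rule. The following Python sketch is one possible reading, not a definitive implementation: the function name `describe` and the three lookup dictionaries are assumptions made for the example, and only the subtags discussed in this thread are covered. Anything the tag does not state comes back as "?".

```python
# Lookup tables limited to the subtags discussed here (an assumption).
LANGUAGES = {"de": "German"}
COUNTRIES = {"DE": "Germany", "AT": "Austria", "CH": "Switzerland"}
ORTHOGRAPHIES = {"1996": "new", "1901": "old"}


def describe(tag):
    """Split a tag such as 'de-AT-1996' into the three table columns."""
    parts = tag.split("-")
    result = {
        "language": LANGUAGES.get(parts[0].lower(), "?"),
        "subform": "?",
        "orthography": "?",
    }
    for subtag in parts[1:]:
        if subtag.upper() in COUNTRIES:
            result["subform"] = COUNTRIES[subtag.upper()]
        elif subtag in ORTHOGRAPHIES:
            result["orthography"] = ORTHOGRAPHIES[subtag]
    return result


for tag in ("de", "de-AT", "de-DE-1996", "de-1901"):
    print(tag, describe(tag))
```

Because the country and the orthography subtags are looked up independently, "de-1996" and "de-AT-1996" differ only in whether the subform stays "?".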