xx-XX-nnnn vs. xx-nnnn in Chinese and German

A. Vine andrea.vine@Sun.COM
Wed, 13 Feb 2002 13:35:01 -0800

Torsten Bronger wrote:
> I need de-AT/DE for the mapping on LaTeX identifiers.  LaTeX has to
> distinguish, because it generates some text.  E.g. the date:  "Januar" in
> Germany, "Jänner" in Austria.  So if I write a letter in XML which is
> converted to LaTeX which then puts in the date -- the country of origin
> is essential.

Right.  There are important contexts for using the country id.
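
To make the LaTeX case concrete, here is a rough sketch of a whole-tag lookup in Python.  The table and the function name babel_option are made up for illustration.  The option names german, austrian, ngerman and naustrian are the ones I believe babel uses, and which tag should map to which option is my assumption, not something taken from your setup.  Bare "de-AT" and "de-DE" are left out on purpose, since which orthography they should default to is exactly the open question.

```python
# Hypothetical table: RFC 3066 tag -> babel option (mapping is an assumption).
BABEL_OPTIONS = {
    "de-de-1901": "german",
    "de-at-1901": "austrian",
    "de-de-1996": "ngerman",
    "de-at-1996": "naustrian",
}


def babel_option(tag):
    """Return the babel option for a tag, or None if the tag is unknown.

    The tag is treated as one token: it matches as a whole or not at all.
    Tags are compared case-insensitively.
    """
    return BABEL_OPTIONS.get(tag.lower())


if __name__ == "__main__":
    for tag in ("de-AT-1996", "de-DE-1901", "de"):
        print(tag, "->", babel_option(tag))
```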

> > Then comes the problem of what to do when the country is known, but there's
> > nothing specific to that country in the text.  Is it better to tag it with
> > the country id, or to leave off the country id so that the text can be
> > better categorized as more generic?
> In this context: The RFC 3066 says that these tags should be interpreted
> as "one token".  I understand this so that a software should
> understand the whole tag or nothing.  Is this a good approach?  If
> "fallbacks" were allowed, I'd see no problem with "overtagging" texts.

One token is needed because sometimes systems will do a simple match.  But what
that tag is used for can be quite diverse.  It might be used to determine which
font to select, which spell-checker to use, which date format to append, or what
category to put the data in.  In the last case, I was thinking about some sort
of information portal, which categorizes the texts to determine what might be
relevant to a user.  So an Austrian would get a list of texts with the de and
de-date tags as well as all de-AT texts.
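
A minimal sketch of that portal filter, in Python.  The names split_tag and relevant_for are hypothetical, and the rule itself is only my reading of the example: a text is relevant if its language matches and it either has no country id or has the reader's country id.  The date subtag is ignored for this purpose.

```python
def split_tag(tag):
    """Split a tag such as 'de-AT-1996' into (language, country, other)."""
    parts = tag.split("-")
    language = parts[0].lower()
    country = None
    other = []
    for part in parts[1:]:
        # Assumption: a two-letter second-level subtag is a country id.
        if len(part) == 2 and part.isalpha():
            country = part.upper()
        else:
            other.append(part)
    return language, country, other


def relevant_for(reader_tag, text_tag):
    """True if a text tagged text_tag should be offered to the reader."""
    r_lang, r_country, _ = split_tag(reader_tag)
    t_lang, t_country, _ = split_tag(text_tag)
    if r_lang != t_lang:
        return False
    # Generic texts (no country id) are relevant to every reader.
    return t_country is None or t_country == r_country


if __name__ == "__main__":
    texts = ["de", "de-1996", "de-AT", "de-AT-1901", "de-DE", "en"]
    print([t for t in texts if relevant_for("de-AT", t)])
```

With a filter like this, a text that is tagged de-DE only because its author happens to be in Germany would never reach the Austrian reader, which is the cost of overtagging I had in mind.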

It's just a consideration, not anything I propose to limit.

> Mmmh... what's wrong with the "canonical" approach?

As long as it's specified as below, I think it would be clear enough to work.

>             Language   Subform   Orthography
> de           German       ?           ?
> de-DE        German    Germany        ?
> de-AT        German    Austria        ?
> de-DE-1996   German    Germany      "new"
> de-AT-1996   German    Austria      "new"
> de-DE-1901   German    Germany      "old"
> de-AT-1901   German    Austria      "old"
> de-1996      German       ?         "new"
> de-1901      German       ?         "old"
> "?" means: Dear software/reader, try to find it out, or use your
>            default.  That may sound a little bit arbitrary, but
>            someone who can't say more about their language than
>            just "de" can't expect more.
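
For what it's worth, the "?" rule could be sketched along these lines in Python.  The lookup tables and the function name interpret are hypothetical, and the defaults argument stands in for whatever default a given piece of software chooses.

```python
LANGUAGES = {"de": "German"}
SUBFORMS = {"DE": "Germany", "AT": "Austria"}
ORTHOGRAPHIES = {"1901": "old", "1996": "new"}


def interpret(tag, defaults=None):
    """Return language, subform and orthography for a tag.

    Anything the tag does not state is taken from defaults if given,
    otherwise it stays "?" (meaning: unknown, try to find it out).
    """
    defaults = defaults or {}
    parts = tag.split("-")
    result = {
        "language": LANGUAGES.get(parts[0].lower(), "?"),
        "subform": defaults.get("subform", "?"),
        "orthography": defaults.get("orthography", "?"),
    }
    for part in parts[1:]:
        if part.upper() in SUBFORMS:
            result["subform"] = SUBFORMS[part.upper()]
        elif part in ORTHOGRAPHIES:
            result["orthography"] = ORTHOGRAPHIES[part]
    return result


if __name__ == "__main__":
    print(interpret("de-AT-1996"))
    print(interpret("de", defaults={"subform": "Austria"}))
```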
