xx-XX-nnnn vs. xx-nnnn in Chinese and German
Torsten Bronger
bronger@physik.rwth-aachen.de
Wed, 13 Feb 2002 21:57:58 +0100
On Tuesday, 12 February 2002 21:00 you wrote:
> > >Taking that further, we have proposals
> > > for
> > >
> > > de-DE-1996
> > > de-AT-1996
> > > de-CH-1996
> > >
> > >Should we also have a proposal for
> > >
> > > de-1996?
> > >
> > >If so, how would that differ from
> > >
> > > de-DE-1996
> > > de-AT-1996
> > > de-CH-1996
> > >
> > >already proposed?
>
> I believe it isn't unreasonable to leave off the country id.
> [...]
>
> In contrast, there may very well be texts which are differentiated by
> country - using certain phrases and words that are only used in that
> country, for example. In those cases it makes sense to use the country id.
I need de-AT/DE for the mapping to LaTeX identifiers. LaTeX has to
distinguish them because it generates some text itself, e.g. the date:
"Januar" in Germany, "Jänner" in Austria. So if I write a letter in XML
that is converted to LaTeX, which then inserts the date, the country of
origin is essential.
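
To make the mapping concrete, here is a rough Python sketch. The table and the function name babel_option are only an illustration, not my actual converter; the values are babel language options (german, ngerman, austrian, naustrian), and assigning them to these four tags is my own assumption.

```python
# Hypothetical mapping from proposed language tags to LaTeX babel options.
# Which option belongs to which tag is an assumption made for illustration.
BABEL_OPTION = {
    "de-DE-1901": "german",     # Germany, old orthography
    "de-DE-1996": "ngerman",    # Germany, new orthography
    "de-AT-1901": "austrian",   # Austria, old orthography
    "de-AT-1996": "naustrian",  # Austria, new orthography
}

# Language tags are case-insensitive, so compare in lower case.
_NORMALIZED = {tag.lower(): option for tag, option in BABEL_OPTION.items()}


def babel_option(tag: str) -> str:
    """Return the babel option for an exact (case-insensitive) tag match."""
    try:
        return _NORMALIZED[tag.lower()]
    except KeyError:
        raise KeyError(f"no LaTeX mapping for language tag {tag!r}") from None


print(babel_option("de-AT-1996"))  # naustrian
```

With the Austrian option selected, the date that LaTeX generates uses "Jänner" instead of "Januar", which is exactly why the country part of the tag matters here.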
> Then comes the problem of what to do when the country is known, but there's
> nothing specific to that country in the text. Is it better to tag it with
> the country id, or to leave off the country id so that the text can be
> better categorized as more generic?
In this context: RFC 3066 says that these tags should be interpreted
as "one token". I take this to mean that software should
understand the whole tag or nothing. Is this a good approach? If
"fallbacks" were allowed, I'd see no problem with "overtagging" texts.
> > Good question? How would it differ? What kind of entity is de-1996
> > supposed to denote? *That* is the problem. What kind of category is it
> > supposed to denote? We don't have any answer for that. But that is the
> > kind of approach we have taken to now: assigning tags when we think we
> > need some kind of distinction without any thought to what kind of
> > entities it is that we are trying to distinguish. I for one would not
> > support a registration of de-1996 until such questions are answered.
>
> This is prudent. It can make the registration process a bit more complex,
> but in the end, the clarification is useful for all of us trying to figure
> out what it is we should be doing.
Mmmh... what's wrong with the "canonical" approach?
Tag          Language  Subform  Orthography
de           German    ?        ?
de-DE        German    Germany  ?
de-AT        German    Austria  ?
de-DE-1996   German    Germany  "new"
de-AT-1996   German    Austria  "new"
de-DE-1901   German    Germany  "old"
de-AT-1901   German    Austria  "old"
de-1996      German    ?        "new"
de-1901      German    ?        "old"
"?" means: Dear software/reader, try to find it out, or use your
default. That may sound a little bit arbitrary, but
someone who can't say more about their language than
just "de" can't expect more.
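
As a rough Python sketch of how software might read the table above: every missing part stays None (the "?") until the caller supplies a default. The names GermanTag, interpret and with_defaults are invented for this example, and I assume that 1901 denotes the old and 1996 the new orthography.

```python
from typing import NamedTuple, Optional

# Assumption: 1901 is the old orthography, 1996 the new one.
ORTHOGRAPHY = {"1901": "old", "1996": "new"}


class GermanTag(NamedTuple):
    language: str
    region: Optional[str]       # None corresponds to "?" in the table
    orthography: Optional[str]  # None corresponds to "?" in the table


def interpret(tag: str) -> GermanTag:
    """Split a tag like de-AT-1996 into language, region and orthography."""
    language, *rest = tag.split("-")
    region = None
    orthography = None
    for subtag in rest:
        if subtag in ORTHOGRAPHY:
            orthography = ORTHOGRAPHY[subtag]
        elif len(subtag) == 2 and subtag.isalpha():
            region = subtag.upper()
        else:
            raise ValueError(f"unexpected subtag {subtag!r} in {tag!r}")
    return GermanTag(language.lower(), region, orthography)


def with_defaults(parsed: GermanTag, region: str, orthography: str) -> GermanTag:
    """Fill every "?" with the defaults chosen by the software or reader."""
    return GermanTag(
        parsed.language,
        parsed.region or region,
        parsed.orthography or orthography,
    )


print(interpret("de-1996"))
print(with_defaults(interpret("de"), region="DE", orthography="new"))
```

Which defaults to pass to with_defaults is left to the software or the reader, as described above.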
Bye,
Torsten.