xx-XX-nnnn vs. xx-nnnn in Chinese and German

Torsten Bronger bronger@physik.rwth-aachen.de
Thu, 14 Feb 2002 22:39:20 +0100

Peter_Constable@sil.org wrote:

 > On 02/13/2002 07:24:28 PM Torsten Bronger wrote:
 >>On Thursday, 14 February 2002 00:12 you wrote:
 >>>At 21:57 +0100 2002-02-13, Torsten Bronger wrote:
 >>>>I need de-AT/DE for the mapping on LaTeX identifiers.  LaTeX has to
 >>>>distinguish, because it generates some text.  E.g. the date:  "Januar"
 >>>>in Germany, "Jänner" in Austria.  So if I write a letter in XML which is
 >>>>converted to LaTeX which then puts in the date -- the country of
 >>>>is essential.
 >>>The spelling of January in date formats is a locale issue not a
 >>>language tagging issue.
 >>Granted, but it's convenient to include this into the language tag.
 > It may be convenient, but it may perhaps also be confusing to have
 > distinct tags where there isn't a clearly documented basis for distinction
 > or any indication of the kinds of IT applications in which the distinction
 > may be appropriate.

I have too little experience with standardisation to fully follow that,
I'm afraid.  Sorry if I am oversimplifying, but: the documented
basis for distinction is the normative guide in which the respective
language is described, isn't it?  And the distinction is appropriate
wherever invalid output would otherwise be generated.

I'll give an example for what I mean.

I begin an XML letter with
   <letter xml:lang="de-AT">
and the XML-processor inserts a "31. Jänner 2002", whereas
   <letter xml:lang="de-DE">
yields "31. Januar 2002" in the first line of the PostScript output.
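
A minimal sketch of what such a processor might do internally, in Python.
The function name format_letter_date and the override table are
hypothetical and only illustrate the idea; the only regional difference
modelled here is the January spelling from the example above.

```python
from datetime import date

# Standard German month names, indexed 0..11.
MONTHS_DE = [
    "Januar", "Februar", "März", "April", "Mai", "Juni",
    "Juli", "August", "September", "Oktober", "November", "Dezember",
]

# Hypothetical per-tag overrides: month number -> regional spelling.
MONTH_OVERRIDES = {
    "de-AT": {1: "Jänner"},
}


def format_letter_date(day, lang):
    """Return the date line of a letter for the given xml:lang value."""
    overrides = MONTH_OVERRIDES.get(lang, {})
    month = overrides.get(day.month, MONTHS_DE[day.month - 1])
    return f"{day.day}. {month} {day.year}"


print(format_letter_date(date(2002, 1, 31), "de-AT"))
print(format_letter_date(date(2002, 1, 31), "de-DE"))
```

The two calls reproduce the two date lines above; any tag without an
entry in the override table falls back to the standard German names.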

If the processor also created an envelope according to the postal
rules of the Federal Republic of Germany (e.g. with a barcode), *this* would be a real
locale issue, handled perhaps by a configuration file.  But the date is
part of the letter and depends on the language, may it be generated or

Another example: A text in French must be printed with tiny spaces
before every "!" and "?".  In German and French there is no extra
space after a sentence.  Etc.
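
The French spacing rule could be sketched like this in Python.  The
function name is hypothetical, and using U+202F (narrow no-break space)
for the "tiny space" is my assumption; a LaTeX back end would emit its
own thin-space command instead.

```python
import re

NARROW_NBSP = "\u202f"  # assumed stand-in for the "tiny space"


def apply_punctuation_spacing(text, lang):
    """Put a thin space before runs of "!" and "?" in French text only."""
    primary = lang.lower().split("-")[0]
    if primary != "fr":
        return text  # other languages are left untouched in this sketch
    # Drop any ordinary whitespace before the mark(s), then add the thin space.
    return re.sub(r"\s*([!?]+)", NARROW_NBSP + r"\1", text)


print(apply_punctuation_spacing("Quoi ?", "fr-FR"))
print(apply_punctuation_spacing("Was?", "de-DE"))
```

Only the primary language subtag is inspected, so "fr", "fr-FR" and
"fr-CA" all get the same treatment here.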

All these things are implicit behaviour of the processing software and
depend on a precise language tag.

To sum up: "de-DE-1996" means "German using the new orthography".  What
the software makes of this information is entirely its own
business.  It can ignore it completely, or use it to produce a decent
result.  But in my opinion the message itself is totally unambiguous.
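
To illustrate "ignore it, or use it": a sketch, in Python, of a lookup
that uses as much of the tag as it understands and drops subtags from
the right otherwise.  The table, the function name and the choice of
default for plain "de" are all assumptions of mine; the values are meant
as babel-style LaTeX language identifiers.

```python
# Hypothetical mapping from language tags (lower-cased) to LaTeX identifiers.
LATEX_IDENTIFIERS = {
    "de-de-1996": "ngerman",
    "de-at-1996": "naustrian",
    "de-de-1901": "german",
    "de-at-1901": "austrian",
    "de": "ngerman",  # arbitrary default chosen for this sketch
}


def lookup_identifier(tag, table=LATEX_IDENTIFIERS):
    """Find the most specific entry by truncating the tag from the right."""
    parts = tag.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in table:
            return table[candidate]
        parts.pop()  # ignore the last subtag and try again
    return None  # nothing known about this language at all


print(lookup_identifier("de-AT-1996"))
print(lookup_identifier("de-CH-1996"))
```

A processor that knows nothing about "CH" or "1996" still gets a usable
answer from the "de" entry, while one that does know them loses nothing.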