xx-XX-nnnn vs. xx-nnnn in Chinese and German

A. Vine andrea.vine@sun.com
Tue, 12 Feb 2002 12:00:48 -0800

Peter, et al,

This is a good conversation.  My thoughts:

Peter_Constable@sil.org wrote:
> On 02/11/2002 03:24:19 PM ietf-languages-admin wrote:
> >I take your point, Peter. However, the Chinese character
> >simplification reform was pushed forward in one country (CN), though it
> >was also adopted in SG, I think, and naturally in MO and HK too, as they
> >are now part of CN; it is currently not much used in TW.
> The problem is that simplified characters can potentially be used in any
> of these countries, as can traditional characters. 

Not to mention the fact that the writing system starts to evolve separately in
the regions where it is used.  Maybe this is less of an issue as handwriting
starts to be replaced by typesetting on the computer, but even font designers
start creating separate glyphs.  As the glyphs and their associated meanings
diverge, a different character emerges.  And the country may dictate the
particular font to be used.  So, all these issues should be taken into account.

> I have heard mention of
> both traditional and simplified characters being used in CN. If that's
> the case, what can we take zh-CN to mean? And if we take it to mean
> "Chinese" language written with simplified characters, then what of texts
> from that country written in traditional characters? And what do people do
> with data sets that use simplified characters but with content that
> relates specifically to some other context? Tags "zh-CN" and "zh-TW" are
> already creating problems for people, i.e. users in industry are saying,
> "That doesn't work for me." I'm not surprised that a muddled construct
> creates problems.
> >So my question is: are you saying that there should be legitimately
> >
> >        zh-1962 (if that was the date) as well as or instead of
> >        zh-cn-1962?
> I haven't suggested that, nor am I about to. Personally, I support the
> suggestion of having ISO 15924 tags to distinguish traditional from
> simplified Chinese characters, as I mentioned earlier.

And then the trick is, as has been discussed ad nauseam on the IDN list, what is
traditional and what is simplified.  Of course, this is much easier to determine
for a large block of text than for a name.

> >Taking that further, we have proposals for
> >
> >        de-DE-1996
> >        de-AT-1996
> >        de-CH-1996
> >
> >Should we also have a proposal for
> >
> >        de-1996?
> >
> >If so, how would that differ from
> >
> >        de-DE-1996
> >        de-AT-1996
> >        de-CH-1996
> >
> >already proposed?

I believe it isn't unreasonable to leave off the country id.  Often, when
identifying language, the country of origin isn't known.  I am certain there are
English texts from the UK, Australia, New Zealand, India, and South Africa that
would be impossible to identify as originating from any one of those countries. 
I'm fairly certain the same is true for many other languages which are used in
more than one country.  So one might know the text is in German using the
spelling reforms of 1996, but have no idea if the text is Swiss or Austrian or
from Germany.  It's still useful to tag the text as de-1996, and possibly
politically incorrect to use de-DE-1996 as a default (or any of the other
country ids).
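
For what it's worth, here is a minimal Python sketch of why the choice between
de-1996 and de-DE-1996 matters in practice. It assumes the prefix-matching rule
for language ranges described in RFC 3066 (a range matches a tag if it equals
the tag, or is a prefix of it followed by "-", compared case-insensitively).
The helper name `matches` and the sample tag list are illustrative only. Under
that rule a request for "de" finds every variant, but a request for "de-1996"
does not find text tagged de-DE-1996, so defaulting to a country id would hide
the text from anyone asking specifically for de-1996.

```python
# Sketch of RFC 3066-style prefix matching of a language range against tags.
# The helper name `matches` and the sample tags are illustrative assumptions.

def matches(language_range: str, tag: str) -> bool:
    """True if the range equals the tag or is a prefix of it ending at a '-'.

    Language tags are case-insensitive, so both sides are lowercased first.
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


TAGS = ["de-DE-1996", "de-AT-1996", "de-CH-1996", "de-1996"]

if __name__ == "__main__":
    # Show which of the sample tags each language range would select.
    for rng in ["de", "de-1996", "de-DE"]:
        selected = [t for t in TAGS if matches(rng, t)]
        print(rng, "->", selected)
```

Run as-is, "de" selects all four tags, "de-1996" selects only de-1996, and
"de-DE" selects only de-DE-1996.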

In contrast, there may very well be texts which are differentiated by country:
they use certain phrases and words found only in that country, for example.
In those cases it makes sense to use the country id.

Then comes the problem of what to do when the country is known, but there's
nothing specific to that country in the text.  Is it better to tag it with the
country id, or to leave off the country id so that the text can be better
categorized as more generic?

> Good question. How would it differ? What kind of entity is de-1996
> supposed to denote? *That* is the problem. What kind of category is it
> supposed to denote? We don't have any answer for that. But that is the
> kind of approach we have taken up to now: assigning tags when we think we
> need some kind of distinction without any thought to what kind of entities
> it is that we are trying to distinguish. I for one would not support a
> registration of de-1996 until such questions are answered.

This is prudent.  It can make the registration process a bit more complex, but
in the end, the clarification is useful for all of us trying to figure out what
it is we should be doing.

iPlanet i18n architect
Sun Microsystems