Request: Language Code "de-DE-1996"

Martin Duerst
Wed, 24 Apr 2002 12:08:30 +0900

At 17:59 02/04/23 -0500, wrote:

>On 04/23/2002 05:17:12 PM John Cowan wrote:

> >Yet it is possible to create two pairs of texts:

>Speaking hypothetically, there are a lot of potential distinctions that we
>*could* make, but many for which there is not a whole lot of real need.

This is not a hypothetical situation.

> >one pair is clearly
> >different in vocabulary but ambiguous as to orthography, the other
> >pair is clearly different in orthography but ambiguous as to vocabulary.
> >The second pair is fairly trivial, because the vocabulary differences
> >are not that large; the first pair might require more effort to
> >construct.

I'm not sure about that. The main problem is that you need somebody
who really knows all the details of the differences. Or you can take
somebody who doesn't know them well, and let them create a text with
mixed orthography!

>And my suggestion was that tags distinguishing the first pair are what
>probably are not needed: indicate a vocabulary distinction while remaining
>ambiguous regarding orthography. It seems to me in most reasonably likely
>scenarios, if people are creating a data set that follows certain criteria
>with regard to vocabulary, then they will also be assuming certain criteria
>with regard to orthography.

This assumes too much good intention, which is not what happens in
practice. Most of the time, people do not intentionally create something
according to a specific orthography. Also, once a text has been created,
it may not show all of the intentions behind it. Think about the
differences between en-US and en-GB, both in terms of vocabulary and of
orthography. There are tons of emails going over the Internet in which
the distinction cannot be made in terms of vocabulary OR in terms of
orthography. The situation is very similar for German.
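
As a rough illustration only (a Python sketch; the marker word lists are
tiny, hypothetical samples rather than a real lexicon, and the function
names are invented for this example), a text can be checked separately
for orthography markers and for vocabulary markers. Many short texts
come out undetermined on one or both axes:

```python
import re

# Hypothetical sample markers; real differences are far more numerous.
US_SPELLING = {"color", "center", "favorite"}
GB_SPELLING = {"colour", "centre", "favourite"}
US_VOCAB = {"gasoline", "diaper", "sidewalk"}
GB_VOCAB = {"petrol", "nappy", "lorry"}


def classify(text, us_markers, gb_markers):
    """Return 'en-US', 'en-GB', 'mixed', or 'undetermined' for one axis."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    has_us = bool(words & us_markers)
    has_gb = bool(words & gb_markers)
    if has_us and has_gb:
        return "mixed"
    if has_us:
        return "en-US"
    if has_gb:
        return "en-GB"
    return "undetermined"


def describe(text):
    """Classify orthography and vocabulary independently of each other."""
    return {
        "orthography": classify(text, US_SPELLING, GB_SPELLING),
        "vocabulary": classify(text, US_VOCAB, GB_VOCAB),
    }


if __name__ == "__main__":
    for sample in (
        "See you at the meeting tomorrow.",
        "The lorry needs petrol.",
        "My favourite color is blue.",
    ):
        print(sample, "->", describe(sample))
```

The two axes are deliberately kept independent: a text that is clearly
marked for vocabulary can still carry no evidence at all about its
orthography, and the other way round.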

Regards,    Martin.