Request: Language Code "de-DE-1996"

Martin Duerst
Wed, 24 Apr 2002 08:20:50 +0900

Hello Peter,

I understand that it's important to define clearly what a tag
can stand for, and what not, but I don't think your comments
help too much. The main point is that people want to tag
data for many different purposes, and they only tag data
if they see a purpose, and mostly only tag the data for
that purpose.

At 17:01 02/04/23 -0500, wrote:

>My assumption: I gather from what you're saying, then, that people probably
>wouldn't want to tag specifically orthography distinctions that are based
>on country (i.e. orthography but not vocabulary), but they would want to
>distinguish spellings according to the 1901 and 1996 conventions, and they
>would want to distinguish data sets that use country-specific vocabulary
>(and which will also follow either the 1901 or 1996 conventions).

People will want to do what they want, and that can differ a great
deal depending on the application. For applications
such as spell-checking, they would most probably want to distinguish
1901/1996, and the difference between Swiss orthography and the rest.
For a web application (in particular for public administration,...),
they would very much want to distinguish the vocabulary, because in
the legal/public administration sector (another example would be food),
there are huge differences between the three countries. Nobody would
worry about the 1901/1996 difference, because they would just read
over it. (Many people don't notice the differences when reading.)

>If that is the case, then it would seem to me that what we need are tags
>where de-1901 and de-1996 tell us what spellings are used, but don't
>distinguish with regard to vocabulary, and where de-1901-xx and de-1996-xx
>distinguish both vocabulary and spelling.

This is another proposal. In some cases, it may be a bit better than
the one from Torsten; in others it's less appropriate. The difference
is probably not big enough to spend too much time on discussing which
one is better. And we definitely should avoid having more than two

>I suggest that we don't need to distinguish vocabulary without reference to
>spelling: any given set of data that has country-specific vocabulary is
>going to follow one orthographic convention or the other.

This is wrong. First, in many cases, there are no differences between
the orthographies. Second, there may be texts with mixed orthographies
(the new one has received mixed acceptance). Third, there are cases
where the users see no point in caring which one it is. And we
can't and shouldn't force them to care more than they are ready to.

>That means that
>what we don't actually need is de-DE, de-CH, etc. Of course, there is
>surely existing data that is tagged this way. I would think it appropriate
>to (a) discourage new use of these sequences,

I don't think this is appropriate.

>and (b) treat existing uses
>as equivalent to de-1901-DE, de-1901-CH, etc.

We have had the discussion before. It has been proposed that de-DE,...
should refer to the current one. Now you propose it should be the
old one. I guess the best thing is to say that it refers to no
particular one. That's how the registrations are currently worded.

>Is it really likely
>that one will have data from which it can be determined that the spelling
>follows 1901 conventions but it can't be determined which particular
>country's spelling conventions were used?
>Make it this: "Is it really likely that one will have data from which it
>can be determined that the vocabulary was for country X but it can't be
>determined whether 1901 or 1996 spelling conventions were being applied?"

Both are likely. It depends on the topic. And what's even more likely
is that people, for a given application, are interested in only one
aspect or the other. That's their choice, not ours. I think this last
point is the most important. We can only help people tag their data;
we cannot force them to.
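
To make the "only one or the other aspect" point concrete, here is a
minimal sketch of how an application might look at just the part of a
tag it cares about. Everything in it is an assumption made for
illustration: the names GermanTag, parse_tag and matches are
hypothetical, and the parser accepts the year and country subtags in
either order only because both orders have come up in this thread. It
is not a statement of how any registration or matching rule is defined.

```python
from typing import NamedTuple, Optional


class GermanTag(NamedTuple):
    """Illustrative breakdown of a tag such as 'de-DE-1996' (hypothetical type)."""
    language: str
    orthography: Optional[str]  # "1901", "1996", or None if the tag does not say
    region: Optional[str]       # e.g. "DE", "AT", "CH", or None if the tag does not say


def parse_tag(tag: str) -> GermanTag:
    """Split a tag into language, orthography and region, in either subtag order."""
    parts = tag.split("-")
    orthography = None
    region = None
    for part in parts[1:]:
        if part in ("1901", "1996"):
            orthography = part
        elif len(part) == 2 and part.isalpha():
            region = part.upper()
    return GermanTag(parts[0].lower(), orthography, region)


def matches(tag: str, orthography: Optional[str] = None,
            region: Optional[str] = None) -> bool:
    """Return True if the tag satisfies every aspect the caller asked about.

    An aspect left as None means "this application does not care".
    """
    parsed = parse_tag(tag)
    if orthography is not None and parsed.orthography != orthography:
        return False
    if region is not None and parsed.region != region:
        return False
    return True


tags = ["de-DE", "de-CH-1996", "de-1901", "de-AT-1901"]

# A public-administration site cares about the country, not the spelling reform.
swiss = [t for t in tags if matches(t, region="CH")]

# A spell-checker cares about the spelling reform, not the country.
old_spelling = [t for t in tags if matches(t, orthography="1901")]
```

With the sample list above, swiss contains only "de-CH-1996" and
old_spelling contains "de-1901" and "de-AT-1901". Note that "de-DE"
is selected by neither orthography filter, since it says nothing about
which orthography is used.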

Regards,   Martin.