Request: Language Code "de-DE-1996"
Tue, 23 Apr 2002 17:01:56 -0500

On 04/23/2002 08:44:57 PM "J.Wilkes" wrote:

>The German spoken (and written) in Germany, Austria and Switzerland
>differs not primarily in orthography, but in the words assigned to the
>Austrian "Obers" is German "Sahne" (for "cream"), e.g.
>Of course not all words are different, but enough to require different
>And yes, there are some orthographical differences as well...

My assumption: I gather from what you're saying, then, that people probably
wouldn't want to specifically tag orthographic distinctions that are based
on country (i.e. orthography but not vocabulary), but they would want to
distinguish spellings according to the 1901 and 1996 conventions, and they
would want to distinguish data sets that use country-specific vocabulary
(and which will also follow either the 1901 or 1996 conventions). If that
is the case, then it would seem to me that what we need are

    de-1901, de-1996, de-1901-xx, de-1996-xx

where de-1901 and de-1996 tell us what spellings are used, but don't
distinguish with regard to vocabulary, and where de-1901-xx and de-1996-xx
distinguish both vocabulary and spelling.

I suggest that we don't need to distinguish vocabulary without reference to
spelling: any given set of data that has country-specific vocabulary is
going to follow one orthographic convention or the other. That means that
what we don't actually need is de-DE, de-CH, etc. Of course, there is
surely existing data that is tagged this way. I would think it appropriate
to (a) discourage new use of these sequences, and (b) treat existing uses
as equivalent to de-1901-DE, de-1901-CH, etc. Either that, or if we want to
allow on-going use of de-DE, etc., explicitly state that these are
considered synonymous with de-1901-DE, etc. and distinct from de-1996-DE, etc.

This is all based on the assumption made above. I realise that I may still not
fully grasp the details.
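
To make option (b) concrete, here is a minimal sketch of how existing tags
might be treated as equivalent to their 1901 forms. It only illustrates the
suggestion above; the function name and the list of country codes are
hypothetical, and nothing here is part of any registration.

```python
# Hypothetical illustration: treat a legacy "de-XX" tag as "de-1901-XX".
# The set of country codes is an assumption made for this sketch.
LEGACY_COUNTRIES = {"DE", "AT", "CH"}


def normalize_legacy_tag(tag: str) -> str:
    """Return the suggested equivalent of a legacy German tag.

    "de-DE" becomes "de-1901-DE". Tags that already carry an orthography
    subtag, or that are not of the legacy form, are returned unchanged.
    """
    parts = tag.split("-")
    is_legacy = (
        len(parts) == 2
        and parts[0].lower() == "de"
        and parts[1].upper() in LEGACY_COUNTRIES
    )
    if is_legacy:
        return f"de-1901-{parts[1].upper()}"
    return tag
```

Under this sketch, de-DE and de-1901-DE would compare as the same thing after
normalization, while de-1996-DE stays distinct.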

>> If there *are* orthographic differences between the various countries,
>> it's fairly clear what kind of object and what specific instance of that
>> kind of object something like de-DE-1901 is intended to denote: German
>> spelled in Germany following conventions defined in 1901 (but not as
>> spelled in Germany using other conventions, and not as spelled in some
>> other country). But it is *not* clear what kind of object de-1901 is, let
>> alone the identity of the specific instance. I question the usefulness of
>> such ambiguous tags.
>If I encountered such a tag without having participated in this
>discussion, I'd think de-1901 denotes a pretty generalized variant of
>German, following conventions
>in 1901.

I.e. pretty generic vocabulary, but 1901 spelling. Yes?

>When checking whether a given text should receive this tag, I would a)
>for the 1901 orthography, and b) look for spelling or words that are
>specific to one of
>three countries, and not common. If I would encounter such words, I would
>specific subtag instead, but if not, I'd leave it at de-1901. de-1901
>would, in
>example, denote that this text can be understood in Germany, Austria and
>Switzerland all the same, without further adaption.

That fits precisely with what I describe above based on what I was assuming
(see above), which seems to me to confirm that my assumption was based on a
correct understanding of the sociolinguistic situation you were describing.
In the process you describe, precisely what you would *not*
end up ever specifying is vocabulary that is specific to one country yet
without specifying orthography.
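
The tagging procedure described above can be sketched roughly as follows.
This is only an illustration under the assumptions already stated: the
function name and its parameters are hypothetical, and deciding which
orthography or which country's vocabulary a text uses is left to the person
doing the tagging.

```python
from typing import Optional


def choose_tag(orthography: str, country: Optional[str] = None) -> str:
    """Pick a tag for a German text.

    orthography: "1901" or "1996", as determined from the spelling.
    country: a two-letter code if country-specific vocabulary or spelling
             was found, otherwise None.
    """
    if orthography not in ("1901", "1996"):
        raise ValueError("orthography must be '1901' or '1996'")
    tag = f"de-{orthography}"
    if country is not None:
        # Country-specific words were found, so add the country subtag.
        tag += f"-{country.upper()}"
    return tag
```

Because the orthography argument is required, this sketch can produce de-1901
or de-1996-AT but never a bare de-AT, which is the point made above.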

Andrea: based on this input, I'd turn around this question from my earlier
response to you:

"Is it really likely that one will have data from which it can be determined
that the spelling follows 1901 conventions but it can't be determined which
particular country's spelling conventions were used?"

Make it this: "Is it really likely that one will have data from which it
can be determined that the vocabulary was for country X but it can't be
determined whether 1901 or 1996 spelling conventions were being applied?"

- Peter

