Request: Language Code "de-DE-1996"
Wed, 24 Apr 2002 10:43:05 -0500

On 04/23/2002 06:34:12 PM "A. Vine" wrote:

>Sorry, I just don't follow this logic.  The vocabulary of a region/country and
>the orthography rules seem pretty orthogonal to me.

The choices may be orthogonal, but when an author cares about the choice of
vocabulary, I'm saying that (s)he generally will also care about the choice
of orthography (though not vice versa in general). E.g. if you're
localising software for some particular region and have to make some
vocabulary choices, aren't you forced to also make orthographic choices? I
don't see how it can be otherwise.

>My prior point was this:
>We won't have 2 tags, one for language (however it may be defined), and a
>one for orthography (however it may be defined), or for that matter, a third tag
>for script/writing system, not in the near future.
>Content-language: de
>Content-orthography: 1996      {maybe someday}
>Content-script: Latin

I'm not suggesting that we do. In my paper, I propose a model in which
writing system is a derivative notion of individual language, and
orthography is a derivative notion of writing system. So, you don't ever
specify an orthography without specifying (whether implicitly or
explicitly) a particular writing system, and a particular language. These
distinct notions are not orthogonal.
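
As a rough illustration of that dependency chain (a sketch only; the class and field names below are invented for this example and are not taken from the paper), each level can be modelled so that it cannot be constructed without the level above it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    code: str  # e.g. "de"


@dataclass(frozen=True)
class WritingSystem:
    language: Language  # a writing system always belongs to one language
    script: str         # e.g. "Latin"


@dataclass(frozen=True)
class Orthography:
    writing_system: WritingSystem  # an orthography always belongs to one writing system
    name: str                      # e.g. "1996"

    def describe(self) -> str:
        ws = self.writing_system
        return f"{ws.language.code} / {ws.script} / {self.name}"


# Hypothetical instance for the tag under discussion.
german = Language("de")
german_latin = WritingSystem(german, "Latin")
reform_1996 = Orthography(german_latin, "1996")
print(reform_1996.describe())
```

Because `Orthography` requires a `WritingSystem`, which in turn requires a `Language`, there is no way to state an orthography on its own, which is the sense in which the notions are not orthogonal.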

>So all are contained in one tag,

Certainly the things related to ID of the individual language, the
particular writing system and the particular orthographic conventions are.

>I doubt anyone will move backend tags to the front,
>so a simple tag of "scouse" is unlikely.

I'm having to interpret "backend tags". I understand you to mean that you
don't expect anyone to take non-initial sub-tags and use them as initial
sub-tags (or as the only sub-tag).

>The convention among taggers and tag
>readers is to back up the language tag, so de-DE backs to de, and de-DE-1996 is
>likely to become de-1996

That's the step that surprised me.
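
To make the surprise concrete, here is a minimal sketch of the "back up the tag" convention, assuming it means dropping sub-tags from the right one at a time. The function name `truncation_chain` is made up for this illustration and is not part of any tagging library:

```python
from typing import List


def truncation_chain(tag: str) -> List[str]:
    """Return the tag followed by each shorter tag obtained by
    dropping one trailing sub-tag at a time."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


chain = truncation_chain("de-DE-1996")
print(chain)               # the tag itself, then de-DE, then de
print("de-1996" in chain)  # de-1996 is never produced by right-truncation
```

Under that assumption, de-DE-1996 backs up to de-DE and then to de. Arriving at de-1996 would require removing a middle sub-tag, which is a different operation.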

>So, for example, there is the tag "de", which means all that has been
>by anyone is that the text is in German, with no info regarding which German or
>which orthography.  Then there's de-DE, which is German in Germany, with no info
>on the orthography.  Then there is de-1996, which is German but no info
>one, but a definitive determination that it is in the 1996 orthography.
>into the de-DE-1996, de-AT-1996, de-CH-1996 tags,  I assume (and those
>with the orthography rules for both or all of the German versions can confirm or
>deny) that there is a good amount of commonality, to the point where one
>create German text which is clearly in the new orthography but is still of
>indeterminate German.

You're convincing me of the need for vocabulary and orthography to be kept
independent. In the model I proposed in the paper, I assumed that
vocabulary choices would always imply orthography choices. I still believe
that, at the point of authoring, that is true, but I can see that there may
be situations in which tags must be added to existing data and it can be
determined that certain vocabulary choices were made but not that
particular orthographic choices were made. Martin has also pointed out that
someone retrieving data may care about vocabulary but not orthography (vice
versa I had already assumed).

So, the model I proposed needs to be revised: it needs to be a little more
complex and a little less restrictive where the relationship between
vocabulary and orthography distinctions is concerned.
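
One way such a less restrictive relationship might look is sketched below: region (standing in for vocabulary) and orthography are treated as independent, optional fields, and a request matches content whenever every field the request specifies agrees. The parsing rule here (a two-letter sub-tag is a region, a four-digit sub-tag is an orthography) is an assumption made for this illustration only, and the names `Tag`, `parse` and `matches` are hypothetical:

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    language: str
    region: Optional[str] = None       # stands in for vocabulary choices
    orthography: Optional[str] = None  # e.g. "1996"


def parse(tag: str) -> Tag:
    """Split a tag into language, region and orthography.

    Assumption for this sketch: a 2-letter sub-tag is a region and a
    4-digit sub-tag is an orthography; anything else is ignored.
    """
    parts = tag.split("-")
    region = None
    orthography = None
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            region = part.upper()
        elif len(part) == 4 and part.isdigit():
            orthography = part
    return Tag(parts[0].lower(), region, orthography)


def matches(request: str, content: str) -> bool:
    """True if every field specified in the request agrees with the content."""
    want, have = parse(request), parse(content)
    if want.language != have.language:
        return False
    for field in ("region", "orthography"):
        wanted = getattr(want, field)
        if wanted is not None and wanted != getattr(have, field):
            return False
    return True


print(matches("de-1996", "de-DE-1996"))  # orthography only
print(matches("de-DE", "de-DE-1996"))    # vocabulary only
print(matches("de-AT", "de-DE-1996"))    # wrong region
```

With this shape, someone retrieving data can ask for vocabulary without orthography or the reverse, and existing data can be tagged with whichever of the two is actually known.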

>It all could be clearer, it's true, but with the mechanism and the
>"languages" and "territories", I think this might be the best we can do.

Well, the traditional mechanism of "languages" and "countries" is clearly
not adequate, as evidenced by the registration request that sparked this
discussion.

>It's probably insufficient for linguists and scholars, but plenty sufficient for the
>average user.

I'm not particularly after something needed by linguists and scholars. I'm
trying to sort out what industry as a whole needs, which I guess equates as
much as anything can to the average user.

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485