Request: Language Code "de-DE-1996"

J.Wilkes
Wed, 24 Apr 2002 18:43:33 -0100


On 24 Apr 2002 at 10:20, Peter_Constable@sil.org wrote:

[...]
> From what I've been led to understand
> at this point, it seems that de-DE and de-1901-DE (or de-DE-1901) are
> probably synonymous, and more generally that if tags with -1901 and -1996
> are registered, then all of the de-XX tags will not be making any
> distinctions not otherwise made by other tags.

Defaults are not synonyms. They may act as a link to some existing value,
but this relationship is unidirectional. (Finally something that I feel 
competent about! ;^))
 
de-DE means: German language, as spoken in Germany, orthography unknown.
de-1901-DE means the same, but with the 1901 orthography instead.

I agree with what Andrea Vine wrote:
orthography and regional vocabulary are pretty orthogonal.
References: <3CC5EF74.6A055E06@sun.com>

But in the current system, we have only *one* value to contain language, 
regional vocabulary, and orthography. We need a way to represent these 
aspects combined. The current discussion is about the order or precedence 
of elements, AFAICS.

Come to think of it, I agree with everything Andrea Vine wrote in that
posting (apart from the RFC 3066 demur, where I agree with Michael 
Everson).


Having a new system with de-1901 and its subtags, in combination with the 
existing de-DE is certainly not an elegant solution. But IMHO it's the best
and cleanest we will get.


[...]
> My issue is one of interoperability: when I see data tagged in
> a certain way, how am I supposed to know what I can assume about what the
> author intended by that tagging? The *only* reason for adding tags to this
> registry is for purposes of interoperation. If we create a registry of tags
> but don't ensure that every party can know what the purpose of a tag is and
> know the intent of the author who used that tag, then what's the point of
> having this registry at all.

Since orthography and regional peculiarities are IMHO orthogonal, we can't
get a clean tree structure. But we can get a tree structure with dead 
branches, or a cyclic graph with links. 

de          German language (country and orthography unspecified)
de-DE       German language, German vocabulary (orthography unspecified)
de-AT       German language, Austrian vocabulary (orthography unspecified)
de-CH       German language, Swiss vocabulary (orthography unspecified)
de-1901     German language, 1901 orthography (country unspecified)
de-1901-DE  German language, 1901 orthography, German vocabulary
de-1901-AT  German language, 1901 orthography, Austrian vocabulary
de-1901-CH  German language, 1901 orthography, Swiss vocabulary
de-1996     German language, 1996 orthography (country unspecified)
de-1996-DE  German language, 1996 orthography, German vocabulary
de-1996-AT  German language, 1996 orthography, Austrian vocabulary
de-1996-CH  German language, 1996 orthography, Swiss vocabulary

There is certainly a more formal and precise notation for this, but I hope 
my intention is obvious. 
When interpreting these tags, you get three values; when choosing these
tags, you can omit parts which you don't know or are uncertain about, and
still produce a valid and unambiguous tag.
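
One way to read the table above is as a small parser that returns the three
values. The following Python sketch is only an illustration under assumptions
that are not part of the proposal: the names `TagValues`, `parse_tag` and
`format_tag` are hypothetical, a four-digit subtag is taken to be the
orthography, a two-letter subtag the regional vocabulary, and either order of
the two is accepted since the order is still under discussion.

```python
from typing import NamedTuple, Optional


class TagValues(NamedTuple):
    """The three values carried by a tag; None means 'unspecified'."""
    language: str
    orthography: Optional[str]  # e.g. "1901" or "1996"
    region: Optional[str]       # e.g. "DE", "AT", "CH"


def parse_tag(tag: str) -> TagValues:
    """Split a tag such as 'de-1996-AT' into language, orthography, region."""
    parts = tag.split("-")
    language = parts[0].lower()
    orthography = None
    region = None
    for part in parts[1:]:
        # Assumption: four digits denote an orthography, two letters a region.
        if len(part) == 4 and part.isdigit() and orthography is None:
            orthography = part
        elif (len(part) == 2 and part.isascii() and part.isalpha()
              and region is None):
            region = part.upper()
        else:
            raise ValueError(f"unexpected subtag {part!r} in {tag!r}")
    return TagValues(language, orthography, region)


def format_tag(values: TagValues) -> str:
    """Build a tag in the table's order, leaving out unspecified parts."""
    parts = [values.language, values.orthography, values.region]
    return "-".join(p for p in parts if p is not None)
```

Because unspecified parts are simply `None`, leaving out what you don't know
still gives a tag that parses back to the same three values.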

> But if every distinct tag *does* have a
> distinct denotation and distinct purpose from every other, then let's make
> it clear what that is so that they *can* provide interoperability. That's
> all I'm asking.

If I misunderstood you and my posting was pointless, please excuse me for
wasting your time with it.
 

[...]
 
> >Second, there may be texts with mixed orthographies
> >(the new one has received mixed acceptance).
> 
> You're disagreeing, then, with the proposition that any given author writes
> any given doc with intent to follow only one convention?

At least I do - many journalists in Germany face the problem of having to
use *three* different orthographies: the 1901 one, the 1996 one, and often
a mixture of both which their publication made up to remedy some of the
ugliest and most hilarious shortcomings of 1996 while still resembling
1996 somewhat.
In this situation, there is a need for precise language tagging in the
editorial workflow.
But you will also get documents where the author does not care about the
orthography (s)he uses, e.g. while travelling or in interviews, knowing the
text will be corrected and arranged later.

None of these problems existed before the 1996 orthography regulation.
But since the decree of the Kultusministerkonferenz became effective, a 
practical solution for this problem is needed.

[...]

> Requirements for cataloguing can differ
> from those of retrieval, but there is a connection between them: retrieval
> cannot be done in terms of finer distinctions than those that were used in
> cataloguing, but it should generally be possible to retrieve using queries
> that make less fine distinctions than were used in cataloguing.

I agree; does my table above work for this? If needed, I can provide the
matching mechanism, but I think it should be obvious.
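
The matching mechanism could look roughly like the Python sketch below. It
assumes (this is an interpretation, not something settled in the thread) that
a query matches a document tag when every part the query specifies is equal
in the document tag, that parts the query omits match anything, and that a
document with an unspecified part does not match a query that specifies it,
in line with the point above that defaults are unidirectional. The names
`split_tag` and `matches` are hypothetical.

```python
from typing import Optional, Tuple

Parts = Tuple[str, Optional[str], Optional[str]]


def split_tag(tag: str) -> Parts:
    """Return (language, orthography, region); missing parts are None."""
    language, *rest = tag.split("-")
    # Assumption: a numeric subtag is the orthography, an alphabetic one
    # the regional vocabulary.
    orthography = next((p for p in rest if p.isdigit()), None)
    region = next((p.upper() for p in rest if p.isalpha()), None)
    return language.lower(), orthography, region


def matches(query: str, document: str) -> bool:
    """True if the document tag has every value the query asks for."""
    return all(
        wanted is None or wanted == actual
        for wanted, actual in zip(split_tag(query), split_tag(document))
    )


# Coarser queries retrieve finer catalogue entries, not the reverse.
print(matches("de-AT", "de-1996-AT"))    # True
print(matches("de-1996", "de-1996-AT"))  # True
print(matches("de-1996", "de-DE"))       # False: orthography unknown
print(matches("de-1996-AT", "de-1996"))  # False: region unknown
```

With this rule a query can never make a finer distinction than the catalogue
entry it is matched against, which is the property asked for above.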

[...]
 
> But that means, then,
> that orthography and vocabulary distinctions can be independent --
> restricted orthogonality (restricted in the obvious sense that one doesn't
> use Duden 1901 spelling with, say, Latin American Spanish vocabulary).

At least in this case, yes, I think so.


Johannes Wilkes
-- 
metabit * software and networks * heterogenous,distributed,generative  
Fon:(+49)228/242488-0 * Fax: (+49)228/242488-7
address: Kurfürsten-11 * D-53115 Bonn * Germany