Request: Language Code "de-DE-1996"
Wed, 24 Apr 2002 18:43:33 -0100
On 24 Apr 2002 at 10:20, Peter_Constable@sil.org wrote:
> From what I've been led to understand
> at this point, it seems that de-DE and de-1901-DE (or de-DE-1901) are
> probably synonymous, and more generally that if tags with -1901 and -1996
> are registered, then all of the de-XX tags will not be making any
> distinctions not otherwise made by other tags.
Defaults are not synonyms. They may act as a link to some existing value,
but this relationship is unidirectional. (Finally something that I feel
competent about! ;^))
de-DE means: German language, as spoken in Germany, orthography unknown.
de-1901-DE means the same, but with the orthography specified as 1901.
I agree with what Andrea Vine wrote:
orthography and regional vocabulary are pretty orthogonal.
But in the current system, we have only *one* value to contain language,
regional vocabulary, and orthography. We need a way to represent these
aspects in combination. The current discussion is about the order or precedence
of elements, AFAICS.
Come to think of it, I agree with everything Andrea Vine wrote in that
posting (apart from the RFC 3066 demurral, where I agree with Michael).
Having a new system with de-1901 and its subtags, in combination with the
existing de-DE, is certainly not an elegant solution. But IMHO it's the best
and cleanest we will get.
> My issue is one of interoperability: when I see data tagged in
> a certain way, how am I supposed to know what I can assume about what the
> author intended by that tagging? The *only* reason for adding tags to this
> registry is for purposes of interoperation. If we create a registry of tags
> but don't ensure that every party can know what the purpose of a tag is and
> know the intent of the author who used that tag, then what's the point of
> having this registry at all.
Since orthography and regional peculiarities are IMHO orthogonal, we can't
get a clean tree structure. But we can get a tree structure with dead
branches, or a cyclic graph with links.
de German language (country and orthography unspecified)
de-DE German language, German vocabulary (orthography unspecified)
de-AT German language, Austrian vocabulary (orthography unspecified)
de-CH German language, Swiss vocabulary (orthography unspecified)
de-1901 German language, 1901 orthography (country unspecified)
de-1901-DE German language, 1901 orthography, German vocabulary
de-1901-AT German language, 1901 orthography, Austrian vocabulary
de-1901-CH German language, 1901 orthography, Swiss vocabulary
de-1996 German language, 1996 orthography (country unspecified)
de-1996-DE German language, 1996 orthography, German vocabulary
de-1996-AT German language, 1996 orthography, Austrian vocabulary
de-1996-CH German language, 1996 orthography, Swiss vocabulary
There is certainly a more formal and precise notation for this, but I hope
my intention is obvious.
When interpreting these tags, you get three values; when choosing these
tags, you can omit the parts you don't know or are uncertain about, and
still produce a valid and unambiguous tag.
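To make the "three values" interpretation concrete, here is a minimal Python
sketch, assuming the subtag sets are exactly those in my table (a real
registry would supply them). The names parse_tag, ORTHOGRAPHIES and REGIONS
are made up for illustration. Accepting both de-1901-DE and de-DE-1901 is
also my own assumption, since the subtag order is exactly what is still
under discussion.

```python
from typing import NamedTuple, Optional

# Hypothetical subtag sets, copied from the table above; not a real registry.
ORTHOGRAPHIES = {"1901", "1996"}
REGIONS = {"DE", "AT", "CH"}


class GermanTag(NamedTuple):
    language: str
    orthography: Optional[str]  # None = orthography unspecified
    region: Optional[str]       # None = regional vocabulary unspecified


def parse_tag(tag: str) -> GermanTag:
    """Split a tag such as 'de-1901-CH' into its three values.

    Omitted parts become None. Both 'de-1901-DE' and 'de-DE-1901' are
    accepted; that is an assumption of this sketch, not a registered rule.
    """
    parts = tag.split("-")
    language, rest = parts[0].lower(), parts[1:]
    orthography: Optional[str] = None
    region: Optional[str] = None
    for sub in rest:
        if sub in ORTHOGRAPHIES and orthography is None:
            orthography = sub
        elif sub.upper() in REGIONS and region is None:
            region = sub.upper()
        else:
            # Unknown subtag, or a second orthography/region subtag.
            raise ValueError(f"unexpected or repeated subtag {sub!r} in {tag!r}")
    return GermanTag(language, orthography, region)
```

For example, parse_tag("de-DE") yields ("de", None, "DE"): German vocabulary,
orthography unknown, which is not the same thing as de-1996-DE.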
> But if every distinct tag *does* have a
> distinct denotation and distinct purpose from every other, then let's make
> it clear what that else so that they *can* provide interoperability. That's
> all I'm asking.
If I misunderstood you and my posting was pointless, please excuse me for
wasting your time.
> >Second, there may be texts with mixed orthographies
> >(the new one has received mixed acceptance).
> You're disagreeing, then, with the proposition that any given author writes
> any given doc with intent to follow only one convention?
At least I do: many journalists in Germany face the problem of having to
use *three* different orthographies: the 1901 one, the 1996 one, and often
a mixture of both that their publication made up to remedy some of the
ugliest and most hilarious shortcomings of 1996 while still resembling
1996 somewhat.
In this situation, there is a need for precise language tagging in the
But you will also get documents where the author does not care about the
orthography (s)he uses, e.g. while traveling or conducting interviews,
knowing the text will be corrected and edited later.
None of these problems existed before the 1996 orthography regulation.
But since the decree of the Kultusministerkonferenz took effect, a
practical solution for these problems is needed.
> Requirements for cataloguing can differ
> from those of retrieval, but there is a connection between them: retrieval
> cannot be done in terms of finer distinctions than those that were used in
> cataloguing, but it should generally be possible to retrieve using queries
> that make less fine distinctions than were used in cataloguing.
I agree; does my table above work for this? If needed, I can provide the
matching mechanism, but I think it should be obvious.
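The matching mechanism I have in mind could look roughly like this minimal
Python sketch. The function names are hypothetical, and a tag is assumed to
be already split into its three values (language, orthography, region),
with None meaning "unspecified". A value left open in the query matches
anything, while a value the query does specify must also be present in the
catalogued tag. So a query for de-1901 does not return a document tagged
only de-DE, because that tag says nothing about the orthography.

```python
from typing import Dict, List, Optional, Tuple

# (language, orthography, region); None means "unspecified", as in the table.
Values = Tuple[str, Optional[str], Optional[str]]


def matches(query: Values, catalogued: Values) -> bool:
    """True if a document catalogued with `catalogued` satisfies `query`.

    An unspecified query field accepts any value. A specified query field
    must equal the catalogued field; a catalogued None never satisfies it,
    since retrieval cannot be finer than cataloguing.
    """
    return all(q is None or q == c for q, c in zip(query, catalogued))


def retrieve(query: Values, catalogue: Dict[str, Values]) -> List[str]:
    """Return the sorted names of all catalogued documents matching `query`."""
    return sorted(doc for doc, values in catalogue.items() if matches(query, values))
```

With a catalogue of a.txt (de-1901-CH), b.txt (de-DE) and c.txt (de-1996-DE),
a query for de-DE returns b.txt and c.txt, and a query for de-1901 returns
only a.txt.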
> But that means, then,
> that orthography and vocabulary distinctions can be independent --
> restricted orthogonality (restricted in the obvious sense that one doesn't
> use Duden 1901 spelling with, say, Latin American Spanish vocabulary).
At least in this case, yes, I think so.
metabit * software and networks * heterogenous,distributed,generative
Fon:(+49)228/242488-0 * Fax: (+49)228/242488-7
address: Kurfürsten-11 * D-53115 Bonn * Germany