Request: Language Code "de-DE-1996"
Wed, 24 Apr 2002 10:20:46 -0500
On 04/23/2002 06:20:50 PM Martin Duerst wrote:
>I understand that it's important to define clearly what a tag
>can stand for, and what not, but I don't think your comments
>help too much.
I was interested to know what your response in particular would be. On
previous occasions, you have told me how very important you feel it is
that we not have synonymous tags. From what I've been led to understand
at this point, it seems that de-DE and de-1901-DE (or de-DE-1901) are
probably synonymous, and more generally that if tags with -1901 and -1996
are registered, then none of the de-XX tags will make any distinction
not already made by other tags.
>The main point is that people want to tag
>data for many different purposes, and they only tag data
>if they see a purpose, and mostly only tag the data for
That's fine. My issue is one of interoperability: when I see data tagged in
a certain way, how am I supposed to know what I can assume about what the
author intended by that tagging? The *only* reason for adding tags to this
registry is interoperation. If we create a registry of tags but don't
ensure that every party can know the purpose of a tag and the intent of
the author who used it, then what's the point of having this registry at
all? But if every distinct tag *does* have a denotation and purpose
distinct from every other, then let's make clear what those are, so that
the tags *can* provide interoperability. That's all I'm asking.
>>My assumption: I gather from what you're saying, then, that people
>>wouldn't want to tag specifically orthography distinctions that are based
>>on country (i.e. orthography but not vocabulary), but they would want to
>>distinguish spellings according to the 1901 and 1996 conventions, and
>>would want to distinguish data sets that use country-specific vocabulary
>>(and which will also follow either the 1901 or 1996 conventions).
>People will want to do what they want. That may be very different
>things, depending on the application, and so on.
It seems to me that what you're saying is that interoperability isn't possible.
Then why are we bothering with a registry? But maybe I'm reading more into
your statement than you intend. I'm all for people making the distinctions
that they want (and not having to make distinctions that they don't want),
provided it can be done in a way that makes clear to everyone which
distinctions they do and don't intend, so that interoperation is possible.
>For a web application (in particular for public administration,...),
>they would very much want to distinguish the vocabulary, because in
>the legal/public administration sector (another example would be food),
>there are huge differences between the three countries. Nobody would
>worry about the 1901/1996 difference, because they would just read
>over it. (Many people don't notice the differences when reading.)
If you're telling me that someone who authors web content specific to
(say) Switzerland is going to be indifferent to the spellings they use,
then I doubt that. I can see, though, that someone *looking* for content
may be more concerned about the vocabulary than the spelling.
>This is another proposal. In some cases, it may be a bit better than
>the one from Torsten, in others it's less appropriate. The difference
>is probably not big enough to spend too much time on discussing which
>one is better. And we definitely should avoid having more than two
I certainly wasn't intending to suggest another set in addition to those
proposed by Torsten. I was just suggesting a different morphology.
>>I suggest that we don't need to distinguish vocabulary without reference
>>spelling: any given set of data that has country-specific vocabulary is
>>going to follow one orthographic convention or the other.
>This is wrong. First, in many cases, there are no differences between
I'd suggest that an author always (except in pathological cases where
someone is creating an artificial counter-argument) creates a text with
the intent to follow some particular orthographic conventions, and that if
a text happens to contain content that conforms to both the 1901 and 1996
conventions, then that is coincidental.
>Second, there may be texts with mixed orthographies
>(the new one has received mixed acceptance).
You're disagreeing, then, with the proposition that any given author writes
any given doc with intent to follow only one convention?
>Third, there are cases
>where the users don't see a point to care which one it is. And we
>can't and shouldn't force them to care more than they are ready to do.
Yes, you're right in that regard. Requirements for cataloguing can differ
from those of retrieval, but there is a connection between them: retrieval
cannot be done in terms of finer distinctions than those that were used in
cataloguing, but it should generally be possible to retrieve using queries
that make less fine distinctions than were used in cataloguing.
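The cataloguing/retrieval relationship just described can be illustrated with
prefix matching of language ranges, roughly as RFC 3066 (section 2.5) describes
it: a range matches a tag if it equals the tag or equals a prefix of it ending
at a "-". What follows is only a minimal Python sketch under that assumption;
the function names and the catalogue entries are hypothetical, and tags such
as de-DE-1996 are proposals under discussion, not registered tags.

```python
def range_matches(language_range: str, tag: str) -> bool:
    # Tags are case-insensitive, so compare lowercased forms.
    r, t = language_range.lower(), tag.lower()
    # Match on equality, or on a prefix that ends at a subtag boundary ("-").
    return t == r or t.startswith(r + "-")


def retrieve(language_range: str, catalogue: list[str]) -> list[str]:
    # Return every catalogued tag that the query range matches.
    return [tag for tag in catalogue if range_matches(language_range, tag)]


# Hypothetical catalogue entries, for illustration only.
catalogue = ["de-DE-1901", "de-DE-1996", "de-CH-1996", "de-AT"]

print(retrieve("de", catalogue))          # coarser query: all four tags
print(retrieve("de-DE", catalogue))       # ['de-DE-1901', 'de-DE-1996']
print(retrieve("de-AT-1996", catalogue))  # []: finer than how de-AT data was tagged
```

A coarser query retrieves data catalogued with finer distinctions, but a finer
query cannot recover a distinction the cataloguer never made. Under this rule
the subtag order also matters: with country before orthography, a query like
"de-1996" does not match "de-DE-1996", so "all 1996-convention German,
regardless of country" is not expressible as a prefix query.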
>We have had the discussion before. It has been proposed that de-DE,...
>should refer to the current one. Now you propose it should be the
I had forgotten about the previous discussion. It just seemed to me that,
in terms of backward compatibility, it would have to be the old one. But I
guess your suggestion...
>I guess the best thing is to say that it refers to no
>particular one. That's how the registrations are currently worded.
could also be viable, since you've suggested why it can be useful to
specify vocabulary without specifying orthography. But that means, then,
that orthography and vocabulary distinctions can be independent --
restricted orthogonality (restricted in the obvious sense that one doesn't
use Duden 1901 spelling with, say, Latin American Spanish vocabulary).
>And what's even more likely
>is that people, for some application, are just interested in only one
>or the other aspect. That's their choice, not ours. I think this last
>point is the most important. We can only help people tag their data,
>we cannot force it.
Fair enough. I'm just interested in seeing that tags are interoperable:
that is, when John or Sally Doe tags their data in a certain way, it
will be clear to the rest of the world what that tag is intended to reflect
and what can or cannot be assumed from it. No more, no less.
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485