Request: Language Code "de-DE-1996"

Thu, 25 Apr 2002 14:43:50 -0500

On 04/24/2002 06:28:40 PM "J.Wilkes" wrote:

>> The choices may be orthogonal, but when an author cares about the choice
>> of vocabulary, I'm saying that (s)he generally will also care about the
>> choice of orthography (though not vice versa in general). E.g. if you're
>> localising software for some particular region and have to make some
>> vocabulary choices, aren't you forced to also make orthographic choices?
>> I don't see how it can be otherwise.
>
>There are significant differences between de-1901 and de-1996; yet, there
>are texts that fit both orthographies. If the sentences are simple, and a
>(long) list of words in which the orthographies differ is avoided, a text
>can indeed be correct in both orthographies. (Simple button/menu texts
>in programs, for example - "öffnen", "schließen", "speichern"; yet
>regional words like "Kassa" etc. might be needed.)

I certainly understand that possibility and don't disagree.

>In general I agree with you, though. In most cases people will make a
>choice which orthography they use.

That's what seems likely to me. I would think that whenever you plan to
make a product fit within certain vocabulary constraints, you will also
actively plan to make it fit within certain orthography constraints. Those
orthography constraints may be specifically 1901, or specifically 1996, or
you may specifically choose the vocabulary so that it's all 1901/1996
neutral. I'm saying I think these are active choices that are always made
when specific localised vocabulary choices are made. What I don't think
people do when creating localised data sets is go to the effort of
constraining vocabulary to make it fit a particular domain, but then not
care whatsoever about spelling. Maybe I'm wrong about what actually goes on
in the L10n industry.

Note that this is a different matter from what they may want to *say* about
that data for cataloguing & retrieval purposes. I can see someone choosing
to use 1901 spellings but then wanting to leave that unspecified in metadata
tags so that queries for retrieval will not be sensitive to alternate
spellings. In general, though, it seems to me that ideally data should be
catalogued using the most specific criterion possible and that retrieval
mechanisms should always allow less-specifically-constrained queries to
include that data. So, if I specifically choose 1901 spellings in data
tailored for a Swiss audience, I should be able to tag it with "de-1901-CH"
(or "de-CH-1901" -- whichever), and it should be retrievable by anyone
requesting "de", "de-1901", "de-CH" or "de-1901-CH". Of course, if I
specifically choose vocabulary that is neutral with regard to spelling,
then I would tag it as "de-CH".
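
As a rough illustration of the retrieval behaviour described above, here is
a minimal Python sketch. The function names (subtags, matches) are
hypothetical, and the rule it uses is an assumption made for illustration:
the primary language subtag must agree, and every other subtag in the query
must appear somewhere in the data's tag, in any order. It is not taken from
any existing standard or library.

```python
def subtags(tag):
    """Split a tag such as "de-1901-CH" into lowercase subtags."""
    return [part.lower() for part in tag.split("-") if part]


def matches(query, tag):
    """Return True if data tagged `tag` should be retrieved for `query`.

    Assumed rule (illustrative only): the first subtag (the language) must
    be identical, and the remaining query subtags must all occur in the tag,
    regardless of order. A less specific query therefore still finds more
    specifically tagged data.
    """
    q, t = subtags(query), subtags(tag)
    if not q or not t or q[0] != t[0]:
        return False
    return set(q[1:]) <= set(t[1:])


# Hypothetical catalogue: item name -> tag chosen by the author.
catalogue = {
    "swiss-ui-old-spelling": "de-1901-CH",
    "swiss-ui-neutral": "de-CH",
    "german-ui-new-spelling": "de-1996-DE",
}


def retrieve(query):
    """List the catalogue items whose tags satisfy the query."""
    return sorted(name for name, tag in catalogue.items() if matches(query, tag))
```

Under this assumed rule, retrieve("de-CH") returns both Swiss items, while
retrieve("de-1901") returns only the one explicitly tagged with 1901
spellings; the spelling-neutral "de-CH" item is found by "de" and "de-CH"
queries only.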

>I am uncertain whether my above example contradicts you here. I agree on
>that the writing system is a derivative notion of individual language. But
>from my point of view, both orthography and regional peculiarities are, if
>not exactly orthogonal, at least not really hierarchical.

I'm seeing reasons why "regional peculiarities" might not fit
hierarchically. Orthography < writing system < individual language is
certainly hierarchical, though.

The thing that made me want to add a notion of "domain-specific data sets"
that is part of the hierarchy and derivative of orthography was precisely
the reasoning above: in the L10n industry, if you constrain the data by
choosing expressions to fit users in a specific domain, then it seems to me
you will probably also be making orthography at least consistent with some
convention if not a convention particular to that domain. That just seems
like an obvious QA issue: if a software application had controls to set
colour in font, paragraph, and table-cell formatting dialogs, but labelled
the controls "colour" in some cases and "color" in other cases (and was
similarly ambivalent in help files), that would seem sloppy to a lot of
users and reflect poorly on the product. It was that implication (if you
constrain A you will also be constraining B) that made me inclined to make
that part hierarchical (A is a derivative of B).

I also discuss sub-language varieties in another section of my paper, and
suggest that we might try to handle them in terms of the "domain-specific
data sets" simply with a view to keeping a simpler model. At that point, I
wasn't anticipating a need to refer to distinctions like vocabulary without
regard to spelling distinctions, but I can see that that may be important
for retrieval.

>No, if your perspective is the practical use for example, since all three
>countries share many similarities. Many texts written in German language
>are not specific to one country, since they don't contain country-specific
>words or phrases. In that case, the orthography may be more important;
>e.g. for trade or legal documents (of which the EU produces a lot).

If I understand you, it seems that de-1901/de-1996 is overall the more
important distinction than de-DE/de-CH/etc. -- is that what you're intending
to convey? If so, that perhaps suggests de-1901-DE/de-1901-CH/etc. over
de-DE-1901/etc.
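
To make concrete why the order of subtags could matter, here is a small
Python sketch. It assumes retrieval by simple prefix matching (a query
matches a tag if it equals the tag or equals a leading run of its subtags);
that matching rule, the function names and the sample tags are assumptions
for illustration only.

```python
def prefix_matches(language_range, tag):
    """True if `language_range` equals `tag` or is a subtag-aligned prefix of it."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def select(language_range, tags):
    """Keep the tags that the given range matches by prefix."""
    return [t for t in tags if prefix_matches(language_range, t)]


# The same four data sets, tagged under the two candidate orderings.
year_first = ["de-1901-DE", "de-1901-CH", "de-1996-DE", "de-1996-CH"]
region_first = ["de-DE-1901", "de-CH-1901", "de-DE-1996", "de-CH-1996"]

# Orthography-first tags group naturally by spelling convention...
old_spelling = select("de-1901", year_first)
# ...whereas region-first tags group naturally by country.
swiss = select("de-CH", region_first)
```

With prefix matching, select("de-1901", region_first) and
select("de-CH", year_first) both come back empty, so whichever distinction
is placed first is the one that less specific queries can pick out.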

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>