Request: Language Code "de-DE-1996"
Thu, 25 Apr 2002 14:43:50 -0500
On 04/24/2002 06:28:40 PM "J.Wilkes" wrote:
>> The choices may be orthogonal, but when an author cares about the choice of
>> vocabulary, I'm saying that (s)he generally will also care about the choice
>> of orthography (though not vice versa in general). E.g. if you're
>> localising software for some particular region and have to make some
>> vocabulary choices, aren't you forced to also make orthographic choices? I
>> don't see how it can be otherwise.
>There are significant differences between de-1901 and de-1996; yet, there
>are texts that fit both orthographies. If the sentences are simple, and
>(long) lists of words in which the orthographies differ are avoided, a text
>can indeed be correct in both orthographies. (Simple button/menu text
>in programs, for example: "öffnen", "schließen", "speichern"; yet
>regional words like "Kassa" etc. might be needed.)
I certainly understand that possibility and don't disagree.
>In general I agree with you, though. In most cases people will make a
>choice as to which orthography they use.
That's what seems likely to me. I would think that whenever you plan to
make a product fit within certain vocabulary constraints, you will also
actively plan to make it fit within certain orthography constraints. Those
orthography constraints may be specifically 1901, or specifically 1996, or
you may specifically choose the vocabulary so that it's all 1901/1996
neutral. I'm saying I think these are active choices that are always made
when specific localised vocabulary choices are made. What I don't think
people do when creating localised data sets is go to the effort of
constraining vocabulary to make it fit a particular domain, but then not
care whatsoever about spelling. Maybe I'm wrong about what actually goes on
in the L10n industry.

Note that this is a different matter from what they may want to *say* about
that data for cataloguing & retrieval purposes. I can see someone choosing
to use 1901 spellings but then wanting to leave that unspecified in metadata
tags so that queries for retrieval will not be sensitive to alternate
spellings. In general, though, it seems to me that ideally data should be
catalogued using the most specific criterion possible, and that retrieval
mechanisms should always allow less-specifically-constrained queries to
include that data. So, if I specifically choose 1901 spellings in data
tailored for a Swiss audience, I should be able to tag it with "de-1901-CH"
(or "de-CH-1901", whichever), and it should be retrievable by anyone
requesting "de", "de-1901", "de-CH" or "de-1901-CH". Of course, if I
specifically choose vocabulary that is neutral with regard to spelling,
then I would tag it as "de-CH".
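The retrieval behaviour described above (data tagged "de-CH-1901" found by queries for "de", "de-1901", "de-CH" or "de-1901-CH") can be sketched in Python. This is a hypothetical illustration, not an existing tagging standard or library: it assumes later BCP 47-style subtag shapes to tell regions apart from orthography variants, and treats a query as matching when all of its subtags occur in the data's tag, in any order. A plain prefix match would not connect "de-1901" to "de-CH-1901", which is why the sketch compares subtag sets instead. The catalogue entries are made up for the example.

```python
from typing import FrozenSet, Tuple

Subtag = Tuple[str, str]  # (kind, normalised value)


def classify(tag: str) -> FrozenSet[Subtag]:
    """Hypothetical parser: label each subtag of e.g. "de-CH-1901" by kind.

    Assumes BCP 47-style shapes: the first subtag is the language, two letters
    or three digits is a region, four letters is a script, and anything else
    (e.g. "1901", "1996") is treated as a variant such as an orthography.
    """
    parts = [p for p in tag.split("-") if p]
    if not parts:
        raise ValueError("empty language tag")
    labelled = {("language", parts[0].lower())}
    for part in parts[1:]:
        if (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            labelled.add(("region", part.upper()))
        elif len(part) == 4 and part.isalpha():
            labelled.add(("script", part.title()))
        else:
            labelled.add(("variant", part.lower()))
    return frozenset(labelled)


def matches(data_tag: str, query: str) -> bool:
    """A less specific query matches if all its subtags appear in the data's tag."""
    return classify(query) <= classify(data_tag)


catalogue = {  # hypothetical documents and their tags
    "swiss-manual": "de-CH-1901",
    "neutral-ui": "de-CH",
    "german-manual": "de-DE-1996",
}
print([name for name, tag in catalogue.items() if matches(tag, "de-1901")])  # ['swiss-manual']
print([name for name, tag in catalogue.items() if matches(tag, "de-CH")])  # ['swiss-manual', 'neutral-ui']
```

Note that under this sketch a query more specific than the tag does not match: spelling-neutral data tagged "de-CH" is not returned for "de-1901".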
>I am uncertain whether my above example contradicts you here. I agree
>that the writing system is a derivative notion of individual language.
>But from my point of view, orthography and regional peculiarities are
>not exactly orthogonal, but not really hierarchical either.
I'm seeing reasons why "regional peculiarities" might not fit
hierarchically. Orthography < writing system < individual language is
certainly hierarchical, though.

The thing that made me want to add a notion of "domain-specific data sets"
that is part of the hierarchy and derivative of orthography was precisely
the reasoning above: in the L10n industry, if you constrain the data by
choosing expressions to fit users in a specific domain, then it seems that
you will probably also be making orthography at least consistent with some
convention, if not a convention particular to that domain. That just seems
like an obvious QA issue: if a software application had controls to set
colour in font, paragraph, and table-cell formatting dialogs, but labelled
the controls "colour" in some cases and "color" in other cases (and was
similarly ambivalent in help files), that would seem sloppy to a lot of
users and reflect poorly on the product. It was that implication (if you
constrain A you will also be constraining B) that made me inclined to make
that part hierarchical (A is a derivative of B).

I also discuss sub-language varieties in another section of my paper, and
suggest that we might try to handle them in terms of the "domain-specific
data sets" simply with a view to keeping a simpler model. At that point, I
wasn't anticipating a need to refer to distinctions like vocabulary without
regard to spelling distinctions, but I can see that that may be important.
>No, if your perspective is practical use, for example, since all these
>countries share many similarities. Many texts written in the German language
>are not specific to one country, since they don't contain country-specific
>words or phrases. In that case, the orthography may be more important,
>e.g. for trade or legal documents (of which the EU produces a lot).
If I understand you, it seems that de-1901/de-1996 is overall the more
important distinction than de-DE/de-CH/etc. Is that what you're intending
to convey? If so, that perhaps suggests de-1901-DE/de-1901-CH/etc. over
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485