Request: Language Code "de-DE-1996"

John Cowan jcowan@reutershealth.com
Tue, 23 Apr 2002 18:07:12 -0400 (EDT)


Peter_Constable@sil.org scripsit:

> Then the question I have is just what kind of object is this referencing,
> and what use is this ID? If you're saying that it's identifying a
> particular orthography, and that there are no orthographic differences
> between the different countries, then that makes me ask what kind of object
> de-DE-xx is intended to denote? It can't be making an orthography
> distinction if there are no orthographic differences between the various
> countries. And if there *are* orthographic differences, then of what use is
> de-1901? 

This set of questions triggered an "aha!" event in my cortex.  RFC 3066
language tags are *not* fundamentally identifiers for the abstract
objects called "languages", "writing systems", "orthographies", or
what have you.  Instead, they are used to attribute language, writing
system, orthography, or what have you to concrete documents, whether
instantiated in stone, pulp, CD, or rotating magnetic media.

For example, consider the mini-document "Ich will nicht".  This might
be validly tagged de, de-de, de-1901, de-de-1901, de-1996, de-de-1996,
or indeed with the corresponding Austrian or Swiss tags.  There is no
knowing a priori which one applies.

But if one knows (based on context) that it is German and in the 1901
orthography, then the tag de-1901 makes sense.
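
To make the attribution idea concrete, here is a minimal sketch in
Python.  The names Document and choose_tag are hypothetical, invented
for illustration; nothing here comes from RFC 3066 itself, and the
sketch does not check tags against the IANA registry.  It only shows
that the tag is a property attached to a document, built from whatever
is known about that document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """A concrete document; the tag is an attribute of it, not an identity."""
    text: str
    language_tag: Optional[str] = None  # None: nothing attributed yet


def choose_tag(language: str,
               region: Optional[str] = None,
               variant: Optional[str] = None) -> str:
    """Build the most specific tag that the known facts support.

    Hypothetical helper: it joins the known subtags with hyphens and
    leaves out anything that is not known.
    """
    parts = [language]
    if region:
        parts.append(region)
    if variant:
        parts.append(variant)
    return "-".join(parts)


doc = Document("Ich will nicht")

# Context tells us only "German, 1901 orthography"; the country is unknown.
doc.language_tag = choose_tag("de", variant="1901")
print(doc.language_tag)
```

If the country later becomes known, the same text can simply be
re-attributed, e.g. choose_tag("de", region="de", variant="1901").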

> But how is such a notion really useful? You can't
> use it to pick out a spelling checker. I suppose it could be used in
> retrieving data if someone was looking for data in any of these related but
> distinct orthographies, but that seems like too much of an edge case.

It is appropriate if that's all you know.  There are plenty of
not-so-mini documents that are clearly in traditional orthography, but
don't happen to hit any of the (primarily) lexical differences between
de-at and de-de.  (De-ch differs more because it does not use the
eszett.)
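
On the retrieval case: RFC 3066 (section 2.5) describes matching a
language-range against a tag by prefix, where the range matches if it
equals the tag or equals a prefix of the tag that is followed by "-".
The sketch below, in Python, applies that rule to the tags mentioned
earlier.  The function name matches and the sample list are assumptions
made for illustration, and the special range "*" is left out.

```python
def matches(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 3066 section 2.5.

    The range matches if it equals the tag, or equals a prefix of the
    tag such that the next character is "-".  Comparison ignores case.
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# Tags that might have been attributed to a collection of documents.
tags = ["de", "de-de", "de-1901", "de-de-1901", "de-1996", "de-de-1996"]

german_anywhere = [t for t in tags if matches("de", t)]
germany_only = [t for t in tags if matches("de-de", t)]
just_1901 = [t for t in tags if matches("de-1901", t)]

print(german_anywhere)
print(germany_only)
print(just_1901)
```

Note that under plain prefix matching the range de-1901 picks up only
documents tagged exactly de-1901 (or something longer that starts with
it), not de-de-1901.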

> This highlights one of the reasons for the model I propose: we have had a
> practice of suggesting tags without making clear what *kind* of object the
> tag is intended to identify,

Tags do not identify objects; they are used to attribute properties
to objects (i.e. documents).

-- 
John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_