Proposal: Language code "de-DE-trad"

Håvard Hjulstad havard@hjulstad.com
Mon, 11 Feb 2002 11:00:48 +0100


I find this discussion most interesting. Language tags are proposed to
capture the German orthographic revision. I can obviously see the need. The
question is how it should be solved.

(I am the convener of ISO/TC37/SC2/WG1 and the Project Editor of ISO 639-1
(FDIS to be circulated this month, I have been told), and I am very much
interested in finding good ways to represent language variation. We are
going to propose a project to extend the 639 series of standards to cover
such issues.)

A different example: The two written Norwegian languages have undergone a
number of reforms during the last 150 years. Most times the changes are
small, but they still have an impact on dictionaries and spell-checkers. A
few times there have been fairly substantial changes. There is probably no
"nb-trad" or "nn-trad" ("nb" and "nn" being the approved identifiers for
Norwegian Bokmål and Norwegian Nynorsk respectively). There are, however, at
least an "nb-1917", an "nb-1937", an "nb-1958", and an "nb-1981". BUT, and
this is important, users of modern Norwegian expect that spell-checkers
marked "nb" adhere to the LATEST orthography. In this case "nb-1981" should
be the unmarked "nb", while previous orthographies need to be somehow
marked. But then, when we get our next changes, the current "nb" needs to be
re-marked as "nb-1981", and a new "nb-2004" (or whatever) will be the
unmarked "nb". (In all fairness, the situation as described here seems
rather messy. It isn't quite that complicated. However, high-quality
spelling and grammar implementations need to take this into consideration.)

We need to look carefully at what is default and what is "marked". It would
be tempting to refer to "de-DE-1996" (or whatever the correct year should
be) as the default German for current language technology implementation.
Deviations from the default need to be indicated.

In the final analysis this means that as defaults change, the actual value
of an unmarked language identifier (like "de", "en", "nb") changes over
time. If we don't accept that, unmarked language identifiers cannot be used
at all.

I should think that the following approach would probably work in most cases
for the development of languages through time: The latest approved form of
a language is considered the default and is unmarked. Each previous form
should be marked with the time when it was first approved or taken into use
(actually: when that form became the default, replacing a previous form).
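
To make the rule concrete, here is a minimal sketch in Python of how an
implementation might resolve tags under this approach. It is only an
illustration: the registry layout and the names ORTHOGRAPHIES, resolve and
add_reform are hypothetical, the "nb" years are the ones mentioned above,
and 2004 stands in for "whatever" the next reform year turns out to be.

```python
# Hypothetical registry: for each base tag, the years in which a new
# orthography became the default. Not part of any standard.
ORTHOGRAPHIES = {
    "nb": [1917, 1937, 1958, 1981],
}


def resolve(tag, registry=ORTHOGRAPHIES):
    """Map a tag such as "nb" or "nb-1937" to (base tag, orthography year).

    An unmarked tag resolves to the latest registered year; a tag marked
    with a year resolves to exactly that year.
    """
    base, sep, last = tag.rpartition("-")
    if sep and last.isdigit():
        year = int(last)
    else:
        base, year = tag, None  # no year suffix: the tag is unmarked

    years = registry[base]
    if year is None:
        return base, max(years)
    if year not in years:
        raise ValueError(f"no orthography registered for {tag!r}")
    return base, year


def add_reform(base, year, registry=ORTHOGRAPHIES):
    """Register a new default; the previous default stays reachable by year."""
    registry.setdefault(base, []).append(year)


print(resolve("nb"))       # unmarked: latest orthography
print(resolve("nb-1937"))  # marked: an earlier orthography
add_reform("nb", 2004)     # assumed year of a future reform
print(resolve("nb"))       # the unmarked tag now means the new default
print(resolve("nb-1981"))  # the former default is now addressed by its year
```

Note that nothing is ever renamed in the registry: the former default simply
stops being what the unmarked tag resolves to, which is the change of value
over time discussed above.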

Håvard

-------------------------
Håvard Hjulstad    mailto:havard@hjulstad.com
  Solfallsveien 31
  NO-1430  Ås, Norway
  tel: +47-64944233  &  +47-64963684
  mob: +47-90145563
  http://www.hjulstad.com/havard/
-------------------------