New item in ISO 639-2 - Zaza

Thu Aug 24 05:25:19 CEST 2006

What appeals to me most is #1 - and addressing the stability problem 
of macrolanguage inclusion by means of an online database of 
macrolanguage inclusions for both current and obsoleted codes.  
Applications would include an internal copy of the database which they 
would update from the Internet at intervals.

That is essentially what I was advocating in the old message below from 
2003.  At the time I was thinking of a fixed width code covering all 
extant and extinct languages and macrolanguages, but the fixed width 
aspect seems no longer feasible.

There was a little bit of debate at the time and it was pointed out
to me that hierarchies are not so great or generally applicable as I 
was imagining.  But overall I still like the idea of atomic codes and 
applications updating their inclusion tables from an online database.

Caoimhín

---------- Forwarded message ----------
Date: Wed, 9 Apr 2003 16:07:53 +0100 (BST)
From: Caoimhin O Donnaile <caoimhin at smo.uhi.ac.uk>
To: ietf-languages at alvestrand.no
Subject: Re: Script codes in RFC 3066

Mark said:

> Are you objecting to fact that language_code has structure or that
> language_subtags have structure? 3066 already has structure, and the fact
> that it does have structure is extremely important for compatibility.

I appreciate that.  My feeling, though, without being any kind of
expert on the subject, is that the long term aim should be for
a system of atomic (unstructured) codes together with an online
database giving:

  - hierarchic information for all extant codes

  - obsoleted codes together with any information on equivalent
    extant codes, and such hierarchic information as is still valid

The database would be built into browsers and other software and
an up-to-date version could be pulled in from the Internet in
standard format as often as desired.
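
A minimal sketch of what such a database might hold, assuming a simple
in-memory layout. The class name, field names and the resolve() helper
are illustrative assumptions rather than part of any standard, and the
sample entries are only examples ("iw" is the withdrawn ISO 639-1 code
for Hebrew, now "he").

```python
from dataclasses import dataclass, field


@dataclass
class CodeDatabase:
    # Extant code -> set of group codes that include it (hierarchic information).
    parents: dict = field(default_factory=dict)
    # Obsoleted code -> list of equivalent extant codes (may be empty).
    replaced_by: dict = field(default_factory=dict)

    def resolve(self, code):
        """Map a code to its extant equivalents.

        Codes not listed as obsoleted (including unknown ones) are
        returned unchanged.
        """
        seen = set()
        pending = [code]
        result = []
        while pending:
            current = pending.pop(0)
            if current in seen:
                continue  # guard against cycles in the replacement table
            seen.add(current)
            if current in self.replaced_by:
                pending.extend(self.replaced_by[current])
            else:
                result.append(current)
        return result


# Illustrative sample data only.
db = CodeDatabase(
    parents={"gle": {"cel"}, "gla": {"cel"}, "he": {"sem"}},
    replaced_by={"iw": ["he"]},
)

print(db.resolve("iw"))   # obsoleted code mapped to its extant equivalent
print(db.resolve("gle"))  # extant code maps to itself
```

Because the codes themselves carry no structure, a reclassification
would only change the tables, never the codes already stored in
documents.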

The database would thus enable searches for "all pages/records/books
to do with Celtic languages, surviving or extinct", even though the
classifications and codes for extinct Celtic languages such as
Gaulish and Lepontic may change as scholarship progresses.
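
As a rough sketch of how such a search could use the inclusion table,
assuming the table is a mapping from each code to the groups that
include it: the function name codes_under, the sample table and the
sample records below are all hypothetical, and the codes are chosen
only for illustration.

```python
def codes_under(group, parents):
    """Return the group code plus every code included in it, directly or indirectly."""
    # Invert the child -> groups table into group -> children.
    children = {}
    for child, groups in parents.items():
        for g in groups:
            children.setdefault(g, set()).add(child)

    # Walk downwards from the group, collecting everything reachable.
    found = {group}
    stack = [group]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Hypothetical inclusion table: code -> groups that include it.
PARENTS = {
    "gle": {"cel"},  # Irish
    "gla": {"cel"},  # Scottish Gaelic
    "glv": {"cel"},  # Manx
    "xtg": {"cel"},  # Gaulish (extinct)
    "xlp": {"cel"},  # Lepontic (extinct)
    "eng": {"gem"},  # English, under a Germanic group
}

# Hypothetical catalogue records tagged with atomic codes.
records = [("Leabhar A", "gle"), ("Inscription B", "xlp"), ("Book C", "eng")]

celtic = codes_under("cel", PARENTS)
hits = [title for title, code in records if code in celtic]
print(hits)
```

If scholarship later moved an extinct language to a different group,
only the entry in the table would change; the records themselves would
stay as they are.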

It sounds as if 3-character codes for languages would probably
suffice, especially if numbers were allowed ("en1" for Old English,
"en3" for Middle English, etc.?), but the lack of room in the codespace
would mean that in many cases they would not be very mnemonic.

It seems to me that it would be a bad idea to have separate codespaces
for living and extinct languages.  Otherwise you get into arguments
about whether languages such as Cornish, Manx, Classical Latin, and
Medieval Latin are extinct or alive.

Caoimhín