New item in ISO 639-2 - Zaza
Caoimhin O Donnaile
caoimhin at smo.uhi.ac.uk
Thu Aug 24 05:25:19 CEST 2006
What appeals to me most is #1 - and addressing the stability problem
of macrolanguage inclusion by means of an online database of
macrolanguage inclusions for both current and obsoleted codes.
Applications would include an internal copy of the database which they
would update from the Internet at intervals.
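To make that concrete, here is a rough sketch in Python of an application holding an internal copy of the inclusion table and refreshing it at intervals. The class name, the fetch callback and the one-day interval are my own assumptions for illustration, and the bundled entry is sample data only, not an authoritative table.

```python
import time
from typing import Callable, Dict, Optional, Set

Table = Dict[str, Set[str]]

# Sample data only: macrolanguage code -> codes it includes.
BUNDLED: Table = {"zza": {"diq", "kiu"}}


class InclusionTable:
    """Internal copy of the inclusion table, refreshed at intervals."""

    def __init__(
        self,
        fetch: Callable[[], Table],      # hypothetical downloader, supplied by the application
        max_age_seconds: float = 86400.0,  # assumed refresh interval: one day
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._table: Table = {k: set(v) for k, v in BUNDLED.items()}
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._max_age:
            return  # the internal copy is still fresh enough
        try:
            self._table = self._fetch()
            self._fetched_at = now
        except OSError:
            pass  # offline: keep using the copy we already have

    def members(self, macro: str) -> Set[str]:
        """Return the codes currently included in the given macrolanguage."""
        self._refresh_if_stale()
        return set(self._table.get(macro, ()))
```

An application that is offline simply carries on with the copy it shipped with, which is the point of keeping the codes themselves atomic and the inclusions in data.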
That is essentially what I was advocating in the old message below from
2003. At the time I was thinking of a fixed width code covering all
extant and extinct languages and macrolanguages, but the fixed width
aspect seems no longer feasible.
There was a little bit of debate at the time and it was pointed out
to me that hierarchies are not so great or generally applicable as I
was imagining. But overall I still like the idea of atomic codes and
applications updating their inclusion tables from an online database.
Caoimhín
---------- Forwarded message ----------
Date: Wed, 9 Apr 2003 16:07:53 +0100 (BST)
From: Caoimhin O Donnaile <caoimhin at smo.uhi.ac.uk>
To: ietf-languages at alvestrand.no
Subject: Re: Script codes in RFC 3066
Mark said:
> Are you objecting to fact that language_code has structure or that
> language_subtags have structure? 3066 already has structure, and the fact
> that it does have structure is extremely important for compatibility.
I appreciate that. My feeling, though, without being any kind of
expert on the subject, is that the long term aim should be for
a system of atomic (unstructured) codes together with an online
database giving:
- hierarchic information for all extant codes
- obsoleted codes together with any information on equivalent
extant codes, and such hierarchic information as is still valid
The database would be built into browsers and other software and
an up-to-date version could be pulled in from the Internet in
standard format as often as desired.
The database would thus enable searches for "all pages/records/books
to do with Celtic languages, surviving or extinct", even though the
classifications and codes for extinct Celtic languages such as
Gaulish and Lepontic may change as scholarship progresses.
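As a rough sketch in Python of the kind of search I have in mind: the table layout and function names are assumptions of mine, the hierarchy is a toy one, and "gau" is a made-up obsoleted code included only to show how a retired code would be followed to its replacement.

```python
from typing import Dict, List, Optional, Tuple

# Toy hierarchy: atomic code -> parent group (None at the top).
PARENT: Dict[str, Optional[str]] = {
    "cel": None,   # Celtic group
    "gle": "cel",  # Irish
    "gla": "cel",  # Scottish Gaelic
    "cym": "cel",  # Welsh
    "bre": "cel",  # Breton
    "cor": "cel",  # Cornish
    "glv": "cel",  # Manx
    "xtg": "cel",  # Gaulish (illustrative code)
    "xlp": "cel",  # Lepontic (illustrative code)
}

# Obsoleted code -> extant equivalent. "gau" is invented for this example.
REPLACED_BY: Dict[str, str] = {"gau": "xtg"}


def current_code(code: str) -> str:
    """Follow obsoleted codes to their extant equivalent, guarding against loops."""
    seen = set()
    while code in REPLACED_BY and code not in seen:
        seen.add(code)
        code = REPLACED_BY[code]
    return code


def is_under(code: str, group: str) -> bool:
    """True if the code, once brought up to date, sits at or below the group."""
    node: Optional[str] = current_code(code)
    while node is not None:
        if node == group:
            return True
        node = PARENT.get(node)
    return False


def search(records: List[Tuple[str, str]], group: str) -> List[str]:
    """Return the titles of all (title, code) records that fall under the group."""
    return [title for title, code in records if is_under(code, group)]
```

Records tagged with an old code still turn up in the search, and if scholarship reclassifies a language only the table changes, not the tags on the records.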
It sounds as if 3-character codes for languages would probably
suffice, especially if numbers were allowed ("en1" for Old English,
"en3" for Middle English, etc.?), but the lack of room in the codespace
would mean that in many cases they would not be very mnemonic.
It seems to me that it would be a bad idea to have separate codespaces
for living and extinct languages; otherwise you get into arguments
about whether languages such as Cornish, Manx, Classical Latin, and
Medieval Latin are extinct or alive.
Caoimhín