Script codes in RFC 3066

John Cowan jcowan at
Wed Apr 9 14:12:44 CEST 2003

Caoimhin O Donnaile scripsit:

> I appreciate that.  My feeling, though, without being any kind of
> expert on the subject, is that the long term aim should be for
> a system of atomic (unstructured) codes together with an online
> database giving:
>   - hierarchic information for all extant codes

What kind of hierarchic information are you referring to: below the language
level (like en-us vs. en-ie) or above it?  If the latter, I agree; there is
no need to tag that.  But it is useful to see at once that en-us and en-ie have
some degree of interoperability (if we met, I'd probably understand your
English) which is appropriately expressed by a tag hierarchy.
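
As a rough illustration of the hierarchy point above, here is a minimal Python sketch. It assumes the prefix-matching rule RFC 3066 gives for language ranges: a range matches a tag if the two are equal, or if the range is a prefix of the tag and the next character is "-". The function names (matches, share_primary) are made up for this example and do not come from any library.

```python
def matches(language_range: str, tag: str) -> bool:
    """Return True if language_range matches tag by prefix matching.

    Comparison is case-insensitive; "*" matches any tag.
    """
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def primary(tag: str) -> str:
    """Return the primary (first) subtag, lowercased."""
    return tag.lower().split("-", 1)[0]


def share_primary(a: str, b: str) -> bool:
    """Two tags are related in the hierarchy if their primary subtags agree."""
    return primary(a) == primary(b)


if __name__ == "__main__":
    for tag in ("en-us", "en-ie", "cel-gaulish"):
        print(tag, matches("en", tag))
    print(share_primary("en-us", "en-ie"))
```

With structured tags the relationship between en-us and en-ie can be read off the tags themselves; with atomic codes the same question would need a lookup in the external database.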

> The database would thus enable searches for "all pages/records/books
> to do with Celtic languages, surviving or extinct", even though the
> classifications and codes for extinct Celtic languages such as
> Gaulish and Lepontic may change as scholarship progresses.

This sounds like you mean "above the language level".  This is fine with
me: cel-gaulish rather than just gaulish is simply a concession to backward
compatibility, not any kind of principle.

> It sounds as if 3-character codes for languages would probably
> suffice, especially if numbers were allowed ("en1" for Old English;
> "en3" for Middle English, etc?), but the lack of room in the codespace
> would mean that in many cases they would not be very mnemonic.

Agreed.  But there's not much mnemonic about existing Ethnologue codes.
Who'd guess ZPH for Totomachapan Zapoteco, or XML for Malaysian Sign Language?

> It seems to me that it would be a bad idea to have separate codespaces
> for living and extinct languages; otherwise you get into arguments
> about whether languages such as Cornish, Manx, Classical Latin, and
> Medieval Latin are extinct or alive.


