Script codes in RFC 3066

John Cowan jcowan at
Wed Apr 9 17:46:39 CEST 2003

Caoimhin O Donnaile scripsit:

> I was thinking mostly of hierarchic information above the language
> level - e.g. recording the fact that Scottish Gaelic is a Goidelic
> language, that the Goidelic languages are a branch of the Celtic
> languages, and that these in turn are a branch of the Indo-European
> family.

The trouble is that while these specific facts are uncontroversial,
their equivalents elsewhere are often controversial.  Furthermore,
the identity of the level above IE, or even whether there is one, is
extremely controversial.

Nor is it clear what the utility of this database would be
for the bulk of applications.  For example, what useful properties do the
Celtic languages as such have in common?  If a client asks for Scottish
Gaelic, it might be tolerable to get Irish instead, but receiving Welsh
would be no better than receiving Danish.  And if the client wants Greek
and gets Albanian ...
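
To make the point concrete, here is a minimal sketch, not a proposal
for RFC 3066 itself. It contrasts fallback driven by a genealogical
table with fallback driven by an explicit list of tolerable
substitutes. The table contents, the function names, and the use of
two-letter ISO 639-1 codes as keys are all assumptions made for
illustration.

```python
# Hypothetical data: substitutes a reader of each language might tolerate.
# Only Scottish Gaelic (gd) -> Irish (ga) is listed, per the example above.
ACCEPTABLE_SUBSTITUTES = {
    "gd": ["ga"],
}

# Hypothetical, deliberately flat genealogical grouping, for contrast.
FAMILY = {
    "gd": "celtic", "ga": "celtic", "cy": "celtic",   # Gaelic, Irish, Welsh
    "da": "germanic",                                  # Danish
    "el": "indo-european", "sq": "indo-european",      # Greek, Albanian
}


def pick_by_substitutes(requested, available):
    """Return the requested language, a listed substitute, or None."""
    if requested in available:
        return requested
    for candidate in ACCEPTABLE_SUBSTITUTES.get(requested, []):
        if candidate in available:
            return candidate
    return None


def pick_by_family(requested, available):
    """Return the first available language in the same family, or None."""
    if requested in available:
        return requested
    family = FAMILY.get(requested)
    if family is None:
        return None
    for candidate in available:
        if FAMILY.get(candidate) == family:
            return candidate
    return None


# A client asks for Scottish Gaelic; the server has only Welsh and Danish.
print(pick_by_family("gd", ["cy", "da"]))       # hands back Welsh
print(pick_by_substitutes("gd", ["cy", "da"]))  # hands back nothing
```

The family table happily serves Welsh for Scottish Gaelic and Albanian
for Greek. The substitute list serves Irish for Scottish Gaelic and
otherwise nothing, which is the behavior argued for above. Note that
the list has to be built by hand; it cannot be derived from the tree.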

> "Norwegian/Bokmal/Nynorsk", "English/Scots/Ulster-Scots",
> "Serbo-Croat/Serbian/Croatian/Bosnian".

IMAO, FWIW: Bokmal, Nynorsk, English, and Scots are separate; the rest are not.

>    - "is a sign language"
>    - "is an artificial language"
>    - "is believed to be extinct"
>    - "is a South American language"
>    - "is usually written in Cyrillic script"

Again, what can programs do differently if they know or don't know these
facts?  Cyrillic script and sign language (in written form) are a matter
of scripts; the others are just brute facts about languages, like
"is commonly spoken in Mogadishu".

John Cowan  jcowan at
"In the sciences, we are now uniquely privileged to sit side by side
with the giants on whose shoulders we stand."
        --Gerald Holton

More information about the Ietf-languages mailing list