Script codes in RFC 3066

Wed Apr 9 12:06:41 CEST 2003

Caoimhin O Donnaile wrote on 04/09/2003 06:46:29 AM:

> Are 3-letter codes going to be enough to cover all languages, including
> all historical languages from all periods of history, all "new"
> languages which may arise from reclassifications and obsoletion of old
> codes, all dialects which may get promoted to "language" status
> following more detailed study, and all dialects which may be
> conveniently treated as languages for information processing purposes?
> I don't know - I am just asking.

I believe so, given that the estimate of currently-spoken languages is
around 7000 and the total number of alpha-3 combinations is >17,000.
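
For reference, the arithmetic behind that figure can be sketched as follows. This assumes codes are drawn from the 26 basic Latin letters with case ignored; the constant names are only illustrative, and the 7000 is the rough estimate quoted above, not a precise count.

```python
import string

# Number of distinct three-letter codes over the 26 basic Latin letters
# (assumption: case is not significant, so only one case is counted).
ALPHA3_COMBINATIONS = len(string.ascii_lowercase) ** 3

# Rough estimate of currently spoken languages, as quoted in this message.
ESTIMATED_LIVING_LANGUAGES = 7000

# Codes left over for historical languages, reclassifications, and so on.
REMAINING = ALPHA3_COMBINATIONS - ESTIMATED_LIVING_LANGUAGES

print(ALPHA3_COMBINATIONS)  # 17576
print(REMAINING)            # 10576
```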

> I think that language codes should be "atomic", not hierarchic,
> because opinions on hierarchies are likely to change.

That is also my inclination -- e.g. I have been less than completely
enthusiastic about the sgn- registrations -- though I think this issue is
not as critical as some others that need to be worked through.

> The hierarchic information should be handled by a database.

Again, I'm inclined to agree. Part of the reasoning is that the hierarchies
we face do not constitute simple tree structures. This became apparent to
me when we considered de-ch-1996 etc.: both country IDs and dates needed to
be parts of tags in order to deal with spelling and vocabulary differences,
but the vocabulary and spelling distinctions are orthogonal. As a result,
left-substring parsing to determine matches (e.g. de-ch is requested and
the available content is de-ch-1996, so there is a match) does not deal
with all the scenarios we face: if de-1996 is requested but the content is
tagged de-ch-1996, there will not be a match using a left-substring parsing
mechanism.
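
To make the two cases concrete, here is a minimal sketch of left-substring matching on whole subtags. The function names are hypothetical and nothing here is taken from an existing library. The second function is only one possible order-independent comparison, included to show the contrast; it is not something proposed in this thread.

```python
def prefix_match(requested: str, available: str) -> bool:
    """Left-substring matching: the requested tag must equal the leading
    subtags of the available tag (compared case-insensitively)."""
    req = requested.lower().split("-")
    avail = available.lower().split("-")
    return avail[:len(req)] == req


def subset_match(requested: str, available: str) -> bool:
    """Hypothetical alternative: the primary language subtag must agree,
    and every other requested subtag must occur somewhere in the
    available tag, regardless of position."""
    req = requested.lower().split("-")
    avail = available.lower().split("-")
    return req[0] == avail[0] and set(req[1:]) <= set(avail[1:])


# de-ch is a left substring of de-ch-1996, so this matches.
print(prefix_match("de-ch", "de-ch-1996"))    # True
# de-1996 is not a left substring of de-ch-1996, so this does not.
print(prefix_match("de-1996", "de-ch-1996"))  # False
# Treating the subtags as independent attributes finds the match.
print(subset_match("de-1996", "de-ch-1996"))  # True
```

The subset comparison treats the country and the date as independent attributes of the tag, which is closer to the "orthogonal" relationship described above than a strict left-to-right hierarchy.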

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485