Another update to registry

Doug Ewell dewell at adelphia.net
Fri Oct 22 22:13:43 CEST 2004


This week ISO 639-2/RA announced a change to the name of the language
represented by the alpha-2 code "si", from just plain "Sinhalese" to
"Sinhala; Sinhalese".  Accordingly, I've updated the proposed IANA
language subtag registry, replacing "Sinhalese" with "Sinhala".

One of the unfortunate aspects of the registry being specified in RFC
3066bis as a semicolon-delimited text file is that there is no provision
for descriptions that contain a semicolon, and ISO 639-2/RA seems to be
issuing such names more and more often.  Many of the recently added
language names consist of two or more names separated by a semicolon:

Filipino; Pilipino
Classical Newari; Old Newari
Klingon; tlhIngan-Hol
Blin; Bilin
Crimean Tatar; Crimean Turkish
Limburgish; Limburger; Limburgan
Low German; Low Saxon; German, Low; Saxon, Low
Church Slavic; Old Slavonic; Old Church Slavonic; Church Slavonic; Old Bulgarian
etc.

IMHO, the last two border on the ridiculous; it's probably not
necessary to offer every possible permutation of a multi-word name.

Nevertheless, even though we know the names of languages in ISO 639 are
not normative (only the codes are), it would still be nice for the full
ISO 639 name (including multiple parts) to be used in the registry.  But
because of the semicolon-delimited format, only one of the multiple
names can be chosen.  I've chosen the first name in each case rather
than being arbitrary about it.  The semicolons can't be simply replaced
by commas; that would wreak havoc on the "Low German" example above.

The only alternative I can think of would be to use quotation marks to
enclose multi-part names that contain semicolons:

language; si; "Sinhala; Sinhalese"; 2004-07-06; ;

but of course this would require extra processing.
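
To give a rough idea of what that extra processing might look like, here
is a minimal sketch in Python.  The function name split_record and the
rule that quotation marks are simply dropped from the output are my own
assumptions for illustration; nothing in RFC 3066bis defines this
behavior.

```python
def split_record(line, delimiter=";", quote='"'):
    """Split a registry line on semicolons, ignoring any inside quotes.

    Hypothetical helper: the quoting rule is an assumption, not part of
    the registry format as currently specified.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == quote:
            # Toggle quoted state; the quotation mark itself is not kept.
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Whatever follows the final delimiter is the last field.
    fields.append("".join(current).strip())
    return fields


record = 'language; si; "Sinhala; Sinhalese"; 2004-07-06; ;'
fields = split_record(record)
print(len(fields), fields[2])
```

A plain split on semicolons would break the name into two pieces and
yield seven fields for this record; the quote-aware version keeps
"Sinhala; Sinhalese" together as the third of six fields.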

Somewhat related to this, I've also replaced semicolons with commas
whenever they appear within comments.  This prevents lines like:

region; BU; Myanmar; 2004-07-06; MM; # changed 1989-12-05; formerly Burma

from being parsed as seven fields instead of six.
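
The field counts are easy to check with a naive split.  This is only a
sketch: the helper name naive_fields is made up for illustration, and it
assumes a parser that splits on every semicolon with no special handling
of comments.

```python
def naive_fields(line):
    """Split on every semicolon, as a simple parser might (assumption)."""
    return [field.strip() for field in line.split(";")]


# The comment as originally written, with a semicolon inside it.
with_semicolon = ("region; BU; Myanmar; 2004-07-06; MM; "
                  "# changed 1989-12-05; formerly Burma")
# The same record after replacing the comment's semicolon with a comma.
with_comma = ("region; BU; Myanmar; 2004-07-06; MM; "
              "# changed 1989-12-05, formerly Burma")

print(len(naive_fields(with_semicolon)))
print(len(naive_fields(with_comma)))
```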

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/



