A proposed solution for descriptions (was: Re: ISO 639 - New item approved - N'Ko)

Sun Jun 11 06:13:27 CEST 2006

Mark Crispin <mrc at CAC dot Washington dot EDU> wrote:

> The problem is that you guys are trying to resolve conflicting desires 
> into a single name.  Long experience tells me that this doesn't work, 
> and ultimately forces the registry into wretched compromises that 
> displease everybody.

Richard Ishida <ishida at w3 dot org> wrote:

> In the case of the actual registry, there currently is no N'Ko ASCII 
> text, and one would have to type N&#x2019;Ko to get a match, knowing 
> the right code point to use, and how to represent that as an NCR. You 
> cannot google that by typing in N'Ko. I don't think that situation is 
> very helpful to the average user.

Originally I was opposed to adding new Description values to solve this 
problem, but Mark's and Richard's arguments have thoroughly convinced me 
that adding them is necessary, and that it isn't a slippery slope that 
would lead to dozens of Description strings for every subtag.  I stand 
corrected, and no, I don't mind being called a flip-flopper.

I hereby propose some changes to the Description fields of 28 existing 
records, based on the following issues that presented themselves more or 
less in this order.

1.  With the addition of N'Ko the language, the Registry now has 14 
subtag records with Description fields that include a non-ASCII 
character (and therefore a hex NCR).  I propose that for each of these, 
a corresponding ASCII-only Description be added.  Example: "N&#x2019;Ko" 
will be joined by "N'Ko".  This applies not only to apostrophes, but to 
all non-ASCII characters such as accented letters: "Volapük" will be 
joined by "Volapuk".  This solves most of the problem described by 
Richard.

2.  Conversely, those subtags that have a Description with an ASCII 
apostrophe should have a corresponding Description added with the 
appropriate non-ASCII directional apostrophe or modifier letter. 
Example: "Mi'kmaq" will be joined by "Mi&#x2BC;kmaq".  This should 
answer the concerns of Michael and others that a Description in "the 
correct characters" be available for all subtags.

3.  A few names (Gwich'in, Ge'ez) currently have the *wrong* non-ASCII 
apostrophe.  I propose that these be changed to a more appropriate 
character, as well as adding the pure-ASCII equivalent.  Example: 
"Gwich´in" will be deleted and two new Description fields, 
"Gwich&#x2BC;in" and "Gwich'in", will be added.  This also answers a 
concern raised by Michael.

4.  Some subtags were found to have a Description with a second name in 
parentheses, which is really an alternate name rather than a qualifier 
of the first name.  In the case of script subtag "Hano", the Description 
"Hanunoo (Hanun&#xF3;o)" already does what we are trying to achieve: it 
provides ASCII and non-ASCII equivalents for the same name, but in a 
single field.  It should be replaced by two separate Description fields, 
"Hanunoo" and "Hanun&#xF3;o".

5.  Likewise for a Description like "Lepcha (R&#xF3;ng)", it doesn't 
make sense to repeat the "Lepcha" part simply to provide an ASCII and 
non-ASCII version of "Róng".  What would make sense would be to split 
this into three Descriptions: "Lepcha", "R&#xF3;ng", and "Rong".

6.  For that matter, any Description fields with an alternate name in 
parentheses (not a qualifier) should really be split into multiple 
Descriptions, regardless of whether non-ASCII characters are present. 
Example: "Falkland Islands (Malvinas)" should be split into "Falkland 
Islands" and "Malvinas".  This is what we did with language subtags, 
whose alternate names are separated by semicolons in ISO 639: we 
converted them to multiple Description fields.  What I propose is that 
we do this consistently with scripts and regions as well.

Note that items 4 through 6 have no effect on Description fields where 
the parenthesized portion acts as a qualifier to the unparenthesized 
portion.  For example, "Cyrillic (Old Church Slavonic variant)" would 
NOT be split into "Cyrillic" and "Old Church Slavonic variant" since 
this would make no sense, and would give "Cyrl" and "Cyrs" the same 
Description.
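
As an illustration only, the split described in items 4 through 6, 
together with the qualifier exception just noted, could be sketched in 
Python as follows.  The function name split_description and the 
QUALIFIERS set are hypothetical, invented for this sketch; deciding 
whether a parenthesized string is a qualifier or an alternate name is a 
judgment made per record, which the sketch models as a hand-maintained 
list rather than trying to detect it.

```python
import re

# Hypothetical, hand-maintained set of parenthesized strings that act as
# qualifiers (and so must NOT be split off).  Only the example from the
# text is listed here; this is an assumption of the sketch.
QUALIFIERS = {"Old Church Slavonic variant"}

# Matches "Main name (Parenthesized part)" with a single trailing group.
PAREN = re.compile(r"^(?P<main>.+?)\s*\((?P<alt>[^()]+)\)$")


def split_description(description, qualifiers=QUALIFIERS):
    """Return the list of Description values for one original value.

    "Name (Alternate)" becomes ["Name", "Alternate"]; a value with no
    parentheses, or whose parenthesized part is a listed qualifier, is
    returned unchanged as a one-element list.
    """
    match = PAREN.match(description)
    if match is None or match.group("alt") in qualifiers:
        return [description]
    return [match.group("main"), match.group("alt")]


if __name__ == "__main__":
    for value in ("Falkland Islands (Malvinas)",
                  "Hanunoo (Hanun&#xF3;o)",
                  "Cyrillic (Old Church Slavonic variant)"):
        print(value, "->", split_description(value))
```

The sketch only separates the two names.  It does not add the extra 
ASCII form proposed in item 5 ("Rong"), which is a separate step.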

7.  Finally, getting back to the apostrophe issue, it appears that the 
language Amis, represented by the grandfathered tag "i-ami", should not 
have an apostrophe at all.  This was listed as 'Amis in the RFC 1766 
registration form dating back to 1999, and so it was copied that way to 
the initial RFC 3066bis Registry, but apparently this was a typo or 
editing error.  I propose changing this to "Amis".
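
For readers who want to see what the ASCII-only forms in items 1 and 3 
look like mechanically, here is a minimal Python sketch.  It decodes 
hex NCRs, maps the apostrophe-like characters mentioned above to the 
ASCII apostrophe, and strips accents.  The names decode_ncrs, 
ascii_fallback and APOSTROPHES are hypothetical and exist only in this 
sketch; nothing here is part of the Registry or of any registration 
procedure, and accent stripping is just one possible way to arrive at 
an ASCII form.

```python
import re
import unicodedata

# Hex numeric character references as used in the Registry, e.g. &#x2019;
NCR = re.compile(r"&#x([0-9A-Fa-f]+);")

# Apostrophe-like characters discussed in this thread, mapped to the
# ASCII apostrophe.  The choice of entries is an assumption of the sketch.
APOSTROPHES = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC": "'",  # MODIFIER LETTER APOSTROPHE
    "\u00B4": "'",  # ACUTE ACCENT (spacing)
}


def decode_ncrs(text):
    """Replace each hex NCR with the character it denotes."""
    return NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


def ascii_fallback(description):
    """Return an ASCII-only form of a Description, or None if the
    result would still contain non-ASCII characters."""
    decoded = decode_ncrs(description)
    for char, replacement in APOSTROPHES.items():
        decoded = decoded.replace(char, replacement)
    # Canonical decomposition separates base letters from combining
    # marks, which are then dropped.
    decomposed = unicodedata.normalize("NFD", decoded)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped if stripped.isascii() else None


if __name__ == "__main__":
    for value in ("N&#x2019;Ko", "Volap&#xFC;k", "R&#xF3;ng", "Gwich&#xB4;in"):
        print(value, "->", ascii_fallback(value))
```

Returning None rather than guessing keeps the sketch honest about 
characters it has no rule for; those would need a hand-chosen ASCII 
spelling.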

In a separate mail I will present proposed registration forms for all 28 
subtags that are affected in one way or another by these issues.  They 
are severable; each should be considered and discussed by the group on 
its own merits.  We aren't really constrained by time on this, but we 
should keep the discussion moving so that the appropriate changes (as 
agreed by the list) can be made to the Registry.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/