Duplicate Busters: Survey #2

Doug Ewell doug at ewellic.org
Sat Aug 2 03:43:21 CEST 2008


Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> [set 1]
>> The goal is to pick one and discard the other.
>
> For ASCII vs. non-ASCII "spelling" differences I'd
> doubt that this is a good goal.

You think it's a good thing to have both "Hanunoo" and "Hanunóo", or 
"Ge'ez" and "Geʻez"?

All right, I did ask for opinions, and yours counts as much as mine or 
anyone else's.  In the end the Reviewer will decide what to do with the 
Survey #2 results.  He may decide to change nothing at all.

> [set 2]
>> the description without comment is the ISO 639-1
>> and/or -2 name.
>
> IOW the "relevant" name for all Internet protocols,
> Web standards, etc. using RFC 1766, 3066, or 4646
> tags.  A quite significant number of existing tags.

Do the protocols depend on the exact name, or do they depend on the code 
element and meaning?

Remember that ISO changes these names too.  The vast majority of 639-2 
changes since October 2006 have been name additions and name changes.

>> Type: language
>> Subtag: ms
>> Description: Malay (macrolanguage)
>> Description: Malay
>
> If the 4646bis proponents invent some kind of scope
> field indicating "macrolanguage" the longer name is
> not strictly necessary.

There is a Scope field.  But if we keep "Malay" and "Malay (individual 
language)" for two separate subtags, my guess is that confusion will 
ensue.

> If they'd invent a flag (*) they could even indicate
> that this is not the main entry for Malay IFF there
> will be a new "individual" Malay dupe.

We have lots of special flags already; fortunately, most are not as 
obscure as a bare asterisk.

> I prefer the shorter description, assuming that the
> "macrolanguage" info is preserved elsewhere in the
> hypothetical registry.

Point noted.

>> Type: language
>> Subtag: ain
>> Description: Ainu (Japan)
>> Description: Ainu
>
> Here the longer description is better.  It might be
> good to preserve the currently relevant name somehow,
> how about a Comment?  I skip similar cases.

You mean something like this?

Comments: Listed as "Ainu" in ISO 639-2

>> Type: language
>> Subtag: rup
>> Description: Macedo Romanian
>> Description: Macedo-Romanian
>
> That is stupid.  Pick the currently registered name,
> or convince ISO 639 to toss a coin.  Note what you
> have done manually in 4645bis.

Smile when you say that.  I've been complaining about this mindless 
discrepancy between 639-2 and 639-3 since at least 2007-08-16.  But 
there have been participants on both lists who have insisted that the 
precise ISO 639-2 name AND the precise 639-3 name must be kept intact, 
down to the last space or hyphen or &#x2BB;, or else it will not be 
possible to trace the BCP 47 subtag to the ISO 639 code elements.  This 
survey is basically a referendum on that insistence.
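
A minimal sketch, in Python, of how descriptions that differ only by a
space, hyphen, apostrophe, diacritic, or U+02BB could be flagged as
candidate duplicates.  The function name comparison_key and the folding
rules are assumptions made for illustration; nothing here is taken from
the registry tooling, and dropping every modifier letter is deliberately
crude.

```python
import unicodedata


def comparison_key(description: str) -> str:
    """Fold a description to a loose key for spotting near-duplicates.

    Hypothetical helper: the folding rules are an illustration only.
    """
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", description)
    kept = []
    for ch in decomposed:
        cat = unicodedata.category(ch)
        # Drop combining marks (Mn), modifier letters such as U+02BB (Lm),
        # and all punctuation (P*) and separators (Z*).
        if cat in ("Mn", "Lm") or cat[0] in ("P", "Z"):
            continue
        kept.append(ch)
    return "".join(kept).casefold()


def near_duplicates(descriptions):
    """Group descriptions whose loose keys collide."""
    groups = {}
    for d in descriptions:
        groups.setdefault(comparison_key(d), []).append(d)
    return [g for g in groups.values() if len(g) > 1]
```

Under these rules "Macedo Romanian" and "Macedo-Romanian" collide, as do
"Hangul" and "Hang\u016dl", while "Hangeul" stays separate because it is
a different transcription rather than a punctuation variant.  "Malay"
and "Malay (macrolanguage)" also stay separate, since only the
parentheses are dropped, not the word inside them.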

>> Type: script
>> Subtag: Ethi
>> Description: Ge&#x2BB;ez
>> Description: Ge'ez
>
> Keep both as you found them in the sources.

ISO 15924 lists only Ge&#x2BB;ez, not Ge'ez.  (Spelled with the real 
letter, of course, not the hex NCR.)
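
For readers who want to see the real letter behind a hex NCR, a minimal
sketch in Python follows.  The function name decode_hex_ncr is
hypothetical, and the sketch assumes only the &#xHHHH; form needs
handling; it deliberately leaves any other ampersand sequence alone.

```python
import re

# Matches hex numeric character references of the form &#xHHHH;
_HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_ncr(text: str) -> str:
    """Replace each hex NCR with the character it denotes (illustrative helper)."""
    return _HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Applied to "Ge&#x2BB;ez" this yields the spelling with U+02BB, and
plain ASCII descriptions such as "Ge'ez" pass through unchanged.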

>> Type: script
>> Subtag: Hang
>> Description: Hangul
>> Description: Hang&#x16D;l
>> Description: Hangeul
>
>> Technically I should not be including Hangeul, which is
>> a different transcription of the same Korean word, not
>> a genuinely different name. Make your own judgment.
>
> Keep all as you found them in the sources.

Point noted.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages


