Duplicate Busters: Survey #2

Sun Aug 3 14:57:39 CEST 2008

Doug Ewell wrote:

> You think it's a good thing to have both "Hanunoo"
> and "Hanunóo", or "Ge'ez" and "Geʻez"?

If both exist in the source, yes.  Note that I can't
read the last word in your question.  Apparently you
sent your mail as base64 UTF-8, which already limits
your audience.  My MUA (not the same as a year ago)
doesn't consider this hostile, but my OS offers
only a substitute glyph for "whatever it is".

I'm not going to decode base64 into raw UTF-8, and
then UTF-8 into a hex code point.  It is far better
in the subtag registry, where I get the correct NCR,
and don't need to worry about anybody's fonts.
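
For anybody who does want to do the two decoding steps
by hand, a minimal sketch in Python could look like the
following.  The function names are made up for this
example, and the base64 payload is generated in the
script itself, it is not copied from any real mail.

```python
import base64


def decode_base64_utf8(payload: str) -> str:
    """Step 1: base64 text -> raw bytes -> UTF-8 decoded string."""
    return base64.b64decode(payload).decode("utf-8")


def to_ncr(text: str) -> str:
    """Step 2: replace each non-ASCII character by a hex NCR."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text
    )


# Illustrative payload only: the word is encoded here, then decoded again.
encoded = base64.b64encode("Ge\u02bbez".encode("utf-8")).decode("ascii")
decoded = decode_base64_utf8(encoded)
print(to_ncr(decoded))  # prints Ge&#x2BB;ez
```

The NCR form is plain ASCII, so it survives any font or
MUA, which is the point of using it in the registry.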

>> IOW the "relevant" name for all Internet protocols,
>> Web standards, etc. using RFC 1766, 3066, or 4646
>> tags.  A quite significant number of existing tags.

> Do the protocols depend on the exact name, or do
> they depend on the code element and meaning?

They are based on whatever 639-1/2 said, that is how
most 1766-3066-4646 language subtags are specified.

Any "redefinitions" in ISO 639-3 can be incompatible,
see the "zh" vs. "cmn" part of the zh-Latn debate.

Or the stunt to redefine "fy" some years ago, where
I fortunately never found any fy-DE or similar cases.

Limiting an existing code (Frisian to Western Frisian,
Chinese to Mandarin, Yugoslavia to Serbia, etc.) is
in theory always wrong.  In practice it might work...

> You mean something like this?
> Comments: Listed as "Ainu" in ISO 639-2

Yes, but I'm not sure how important or helpful the info is.
The GG-IM-JE comment is a similar idea.

 [hyphenation of macedo romanian]
> there have been participants on both lists who have
> insisted that the precise ISO 639-2 name AND the
> precise 639-3 name must be kept intact, down to the
> last space or hyphen or &#x2BB;

Maybe that was me, maybe it was Debbie.  The solution
to pick one name in both source standards is fine.

> ISO 15924 lists only Ge&#x2BB;ez, not Ge'ez.

Then this ASCII Ge'ez has to be removed in the next
round of modifications.  How did a name get into the
registry if it is not in the source?  Did we miss the
change?

> (Spelled with the real letter, of course, not the
> hex NCR.)

Looking at http://unicode.org/iso15924/iso15924-codes.html

 <td>Ethi</td> <td>430</td> <td>Ethiopic (Geʻez)</td>
 <td>éthiopien (geʻez, guèze)</td> <td>Ethiopic</td>

It turns out that my MUA is configured to use monospaced
fonts, where I can't read the character.  My browser can
use a proportional font where it works => I 𐇽 Unicode ;-)

 Frank