Duplicate Busters: Survey #2
nobody at xyzzy.claranet.de
Sun Aug 3 14:57:39 CEST 2008
Doug Ewell wrote:
> You think it's a good thing to have both "Hanunoo"
> and "Hanunóo", or "Ge'ez" and "Geʻez"?
If both exist in the source, yes. Note that I can't
read the last word in your question. Apparently you
sent your mail as base64 UTF-8, which already limits
your audience. My MUA (not the same as a year ago)
doesn't consider this as hostile, but my OS offers
only a substitute glyph for "whatever it is".
I'm not going to decode base64 into raw UTF-8, and
then UTF-8 into a hex code point. It is far better
in the subtag registry, where I get the correct NCR,
and don't need to worry about anybody's fonts.
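
For readers who do want to take the manual route described above (base64 to raw UTF-8, then UTF-8 to a hex code point and an NCR), here is a minimal Python sketch. The function name describe() and the sample word are illustrative assumptions, not anything taken from the original mail or from the registry tooling.

```python
import base64


def describe(b64_text: str) -> list[tuple[str, str]]:
    """Decode a base64-encoded UTF-8 body and return (U+XXXX, hex NCR)
    for every non-ASCII character in it."""
    text = base64.b64decode(b64_text).decode("utf-8")
    return [
        (f"U+{ord(ch):04X}", f"&#x{ord(ch):X};")
        for ch in text
        if ord(ch) > 0x7F  # skip plain ASCII
    ]


# Hypothetical sample: "Ge" + U+02BB + "ez", encoded the way a base64
# UTF-8 mail body would carry it.
sample = base64.b64encode("Ge\u02bbez".encode("utf-8")).decode("ascii")
print(describe(sample))
```

The NCR form is what survives independently of the reader's fonts, which is the point made above about the registry.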
>> IOW the "relevant" name for all Internet protocols,
>> Web standards, etc. using RFC 1766, 3066, or 4646
>> tags. A quite significant number of existing tags.
> Do the protocols depend on the exact name, or do
> they depend on the code element and meaning?
They are based on whatever 639-1/2 said, that is how
most 1766-3066-4646 language subtags are specified.
Any "redefinitions" in ISO 639-3 can be incompatible,
see the "zh" vs. "cmn" part of the zh-Latn debate.
Or the stunt to redefine "fy" some years ago, where
I fortunately never found any fy-DE or similar cases.
Limiting an existing code (Frisian to Western Frisian,
Chinese to Mandarin, Yugoslavia to Serbia, etc.) is
in theory always wrong. In practice it might work...
> You mean something like this?
> Comments: Listed as "Ainu" in ISO 639-2
Yes, I'm not sure how important or helpful the info is.
The GG-IM-JE comment is a similar idea.
[hyphenation of macedo romanian]
> there have been participants on both lists who have
> insisted that the precise ISO 639-2 name AND the
> precise 639-3 name must be kept intact, down to the
> last space or hyphen or ʻ
Maybe that was me, maybe it was Debbie. The solution
to pick one name in both source standards is fine.
> ISO 15924 lists only Geʻez, not Ge'ez.
Then this ASCII Ge'ez has to be removed in the next
round of modifications. How did a name get into the
registry if it is not in the source, did we miss the
> (Spelled with the real letter, of course, not the
> hex NCR.)
Looking at http://unicode.org/iso15924/iso15924-codes.html
<td>Ethi</td> <td>430</td> <td>Ethiopic (Geʻez)</td>
<td>éthiopien (geʻez, guèze)</td> <td>Ethiopic</td>
It turns out that my MUA is configured to use monospaced
fonts, where I can't read the character. My browser can
use a proportional font where it works => I 𐇽 Unicode ;-)
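
As a font-independent way to see what separates the two spellings discussed in this thread, the following Python sketch lists the positions where they differ, with code points and Unicode character names. The helper differing_chars() is a hypothetical name, and the sketch assumes both strings have the same length.

```python
import unicodedata


def differing_chars(a: str, b: str) -> list[tuple[str, str]]:
    """Return the pairs of characters at which two equal-length strings
    differ, each shown as 'U+XXXX NAME'."""
    def label(ch: str) -> str:
        return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

    return [(label(x), label(y)) for x, y in zip(a, b) if x != y]


# ASCII apostrophe spelling vs. the spelling with U+02BB.
for ascii_side, other_side in differing_chars("Ge'ez", "Ge\u02bbez"):
    print(ascii_side, "vs.", other_side)
```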