Duplicate Busters: Survey #2
Doug Ewell
doug at ewellic.org
Sun Aug 3 21:32:52 CEST 2008
Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:
>> You think it's a good thing to have both "Hanunoo"
>> and "Hanunóo", or "Ge'ez" and "Geʻez"?
>
> If both exist in the source, yes. Note that I can't
> read the last word in your question. Apparently you
> sent your mail as base64 UTF-8, this already limits
> your audience. My MUA (not the same as a year ago)
> doesn't consider this as hostile, but my OS offers
> only a substitute glyph for "whatever it is".
<ot>
First, I've told Outlook Express 6.0 not to base64-encode my text (Tools
| Options | Send | Plain Text Settings | Encode text using: None). The
header on my sent message says "Content-Transfer-Encoding: 8bit". So I
tend to believe OE did what I told it to do, and some gateway along the
way applied the base64 layer.
Second, it is 2008 and mail agents are supposed to be able to deal with
base64-encoded UTF-8 by now.
</ot>
> I'm not going to decode base64 into raw UTF-8, and
> then UTF-8 into a hex. code point. It is far better
> in the subtag registry, where I get the correct NCR,
> and don't need to worry about anybody's fonts.
1. The Registry is moving to UTF-8. This has been decided.
2. You don't ever have to worry about anybody else's fonts, only your
own.
3. The debate over "use the correct spelling" versus "make it legible on
the oldest, most limited system" is at the heart of the section of
Survey #2 that deals with Geʻez and such. I don't believe the
solution is "encode everything twice." Others do. That's why your
4645bis Editor has initiated surveys rather than make these changes
unilaterally, as some have suggested I should do.
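For anyone who does want to see the two decoding steps mentioned above (base64 to raw UTF-8, then UTF-8 to a hex code point), here is a minimal Python sketch. The function name describe_non_ascii is made up for illustration, and the hexadecimal "&#x...;" NCR form it prints is an assumption about which notation is wanted, not a statement about the Registry's file format.

```python
import base64


def describe_non_ascii(b64_text: str) -> list[tuple[str, str, str]]:
    """Decode base64 -> UTF-8 text, then report each non-ASCII character.

    Hypothetical helper for illustration only. Each entry is
    (character, "U+XXXX" code point, hexadecimal NCR).
    """
    raw_bytes = base64.b64decode(b64_text)   # step 1: undo the base64 layer
    text = raw_bytes.decode("utf-8")         # step 2: UTF-8 bytes -> characters
    return [
        (ch, f"U+{ord(ch):04X}", f"&#x{ord(ch):X};")
        for ch in text
        if ord(ch) > 0x7F                    # skip plain ASCII
    ]


# Build a base64 body the way a gateway might, then decode it again.
encoded = base64.b64encode("Ge\u02bbez".encode("utf-8")).decode("ascii")
for char, code_point, ncr in describe_non_ascii(encoded):
    print(char, code_point, ncr)
```

With the sample word this reports a single non-ASCII character, U+02BB, so the NCR and the raw UTF-8 text carry the same information; only the legibility on a given system differs.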
>> Do the protocols depend on the exact name, or do
>> they depend on the code element and meaning?
>
> They are based on whatever 639-1/2 said, that is how
> most 1766-3066-4646 language subtags are specified.
Let me try again: Do they depend on the EXACT name, to the extent that
any change in hyphenation or apostrophe usage will cause compatibility
problems? Please provide examples.
> Any "redefinitions" in ISO 639-3 can be incompatible,
> see the "zh" vs. "cmn" part of the zh-Latn debate.
There has been no redefinition of 'zh'/'zho' in any part of 639. 639-3
introduces the concept of macrolanguage and defines 'zho' as "any
language that is sometimes called Chinese," the same meaning it has
under 639-1 and -2, and introduces additional code elements for specific
"Chinese" languages. ietf-languages members don't agree on whether
'pinyin' should refer to 'zh' or 'zh-cmn'/'cmn', but that is not a
matter of 639-3 redefining anything.
> Or the stunt to redefine "fy" some years ago, where
> I fortunately never found any fy-DE or similar cases.
>
> Limiting an existing code (Frisian to Western Frisian,
> Chinese to Mandarin, Yugoslavia to Serbia, etc.) is
> in theory always wrong. In practice it might work...
But the fact is that ISO does change these supposedly sacred names from
time to time. While we are worried about preserving exact hyphens and
apostrophes and about whether "Borna" can be interpreted the same as
"Borna (Ethiopia)", ISO can and does make much larger-scale changes.
>> You mean something like this?
>> Comments: Listed as "Ainu" in ISO 639-2
>
> Yes, I'm not sure how important or helpful the info is.
> The GG-IM-JE comment is a similar idea.
Neither am I. But we can consider adding such comments to any subtags
where the exact ISO name isn't preserved.
> [hyphenation of macedo romanian]
>> there have been participants on both lists who have
>> insisted that the precise ISO 639-2 name AND the
>> precise 639-3 name must be kept intact, down to the
>> last space or hyphen or ʻ
>
> Maybe that was me, maybe it was Debbie. The solution
> to pick one name in both source standards is fine.
Do you mean "one name from each standard," meaning we have to keep
trivially different names, or "one name encompassing both standards,"
meaning we can choose one and discard the other?
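To make "trivially different" concrete, here is a rough Python sketch of how two names could be compared after folding away the kinds of differences discussed in this thread. The folding rules (which apostrophe-like characters and separators to ignore, dropping accents, ignoring case) are my assumptions for illustration; nothing here is agreed ietf-languages policy, and the names fold and trivially_different are hypothetical.

```python
import unicodedata

# Assumed set of apostrophe-like characters: ASCII apostrophe,
# U+02BB (modifier letter turned comma), U+2019 (right single quote).
APOSTROPHES = {"\u0027", "\u02bb", "\u2019"}
# Assumed separators whose presence or absence is treated as trivial.
SEPARATORS = {"-", " "}


def fold(name: str) -> str:
    """Reduce a name to a comparison key (illustrative rules only)."""
    decomposed = unicodedata.normalize("NFD", name)  # split accents off letters
    kept = []
    for ch in decomposed:
        if unicodedata.combining(ch):                # drop combining accents
            continue
        if ch in APOSTROPHES or ch in SEPARATORS:    # drop apostrophes, hyphens, spaces
            continue
        kept.append(ch.casefold())
    return "".join(kept)


def trivially_different(a: str, b: str) -> bool:
    """True if the names differ as written but fold to the same key."""
    return a != b and fold(a) == fold(b)


pairs = [
    ("Ge'ez", "Ge\u02bbez"),
    ("Hanunoo", "Hanun\u00f3o"),
    ("Macedo-Romanian", "Macedo Romanian"),
    ("Borna", "Borna (Ethiopia)"),
]
for first, second in pairs:
    print(first, "|", second, "|", trivially_different(first, second))
```

Under these assumed rules the first three pairs count as trivially different and the "Borna" pair does not, since the parenthetical qualifier is real added content rather than punctuation.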
>> ISO 15924 lists only Geʻez, not Ge'ez.
>
> Then this ASCII Ge'ez has to be removed in the next
> round of modifications. How did a name get into the
> registry if it is not in the source, did we miss the
> change ?
RFC 4646 doesn't prohibit this list from adding Description fields
beyond those in the source standards. They must not conflict with the
existing description(s). ietf-languages agreed to add the
ASCII-apostrophe version in June 2006 after a lengthy debate.
In fact, there was a change in ISO 15924: it originally used ’, and
that change is what sparked the lengthy debate.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages