Additional prefixes (was: Re: Final Sigma (was: RE: Esszett, Final Sigma, ZWJ and ZWNJ))

John C Klensin klensin at jck.com
Thu Feb 26 22:49:57 CET 2009



--On Thursday, February 26, 2009 21:51 +0100 JFC Morfin
<jefsey at jefsey.com> wrote:

>...
> Other algorithms, derived or not from punycode, should only be
> used to address the exceptions such as those discussed for
> French and Greek.
>...

The problem is that there are exceptions for the use of scripts
and characters for many languages (I'm tempted to say "most"
but, while my experience suggests that, my knowledge isn't quite
wide enough to claim it).  Examples:

(1) Should "ö" (lower-case o with diaeresis) match either or
both of "o" and "-o"?  For the latter, note "coöperate" (that
is, "co-operate") in English.

(2) Should it match "oe"? (maybe for German, no for Swedish)

(3) Should Arabic or Hebrew characters with vowel-markings match
the characters without them? (probably yes if those markings are
permitted at all, but two characters with different markings
don't match, and the answer for Yiddish is almost certainly
different from the answer for Hebrew)

(4) Should Devanagari half-characters match their full character
forms? (Usually probably not, but note that IDNA2003 causes them
to compare equal as a side-effect of those decisions).

(5) Should a Japanese string in Kanji match its phonetic
spelling in Kana? (the usual computer science answer is
somewhere near "you must be kidding"; the usual end-user answer
is something close to "of course, every ten-year-old knows that")
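
To make the first few questions concrete, here is a minimal
Python sketch, not part of any IDNA specification.  The helper
names (strip_marks, same_exact, same_unmarked) are hypothetical
and exist only for illustration; nameprep is the IDNA2003
Nameprep step as implemented in Python's standard library.  It
shows that whether two strings "match" depends entirely on which
equivalence one picks, and that no single rule gives the answer
a particular language expects.

```python
import unicodedata
from encodings.idna import nameprep  # stdlib IDNA2003 Nameprep


def strip_marks(s: str) -> str:
    """Decompose, then drop nonspacing marks (diaeresis, Hebrew points)."""
    decomposed = unicodedata.normalize("NFD", s)
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", kept)


def same_exact(a: str, b: str) -> bool:
    """Match only up to canonical (NFC) equivalence."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def same_unmarked(a: str, b: str) -> bool:
    """Hypothetical looser rule: match after removing all marks."""
    return strip_marks(a) == strip_marks(b)


O_DIAERESIS = "\u00f6"  # o with diaeresis, precomposed
# Hebrew "shalom" with and without vowel points
POINTED = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
UNPOINTED = "\u05e9\u05dc\u05d5\u05dd"
# Devanagari KA + VIRAMA (+ ZWJ) + SSA: half-form request vs. conjunct
WITH_ZWJ = "\u0915\u094d\u200d\u0937"
WITHOUT_ZWJ = "\u0915\u094d\u0937"

print(same_exact(O_DIAERESIS, "o"))        # False
print(same_unmarked(O_DIAERESIS, "o"))     # True  (example 1)
print(same_unmarked(O_DIAERESIS, "oe"))    # False (example 2 needs a
                                           # language-specific rule)
print(same_unmarked(POINTED, UNPOINTED))   # True  (example 3)
print(nameprep(WITH_ZWJ) == nameprep(WITHOUT_ZWJ))  # True (example 4)
```

The last line relies on Nameprep mapping ZWJ to nothing, which
is why the two Devanagari spellings compare equal under
IDNA2003.  Nothing comparable exists for example (5): no
character-level mapping relates Kanji to a Kana reading.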

And, of course, if one is going to carry things that far,
shouldn't "color" and "colour" match?  After all, they represent
the same concept and the spelling difference is no greater than
some of the examples above (and the ones you have used in your
notes).

     john


