<HTML>

<HEAD>

<TITLE>Re: "This case isn't the important one" (was Re: Visually confusable characters (8))</TITLE>

</HEAD>

<BODY>

<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Not sure I should get into this hot air... But...<BR>

<BR>

Well, concentrating on just "with hamza above", and trivially just "grepping" for that (and skipping<BR>

those in the Fxxx range), we find:<BR>

<BR>

1) those with canonical decomposition<BR>

0623;ARABIC LETTER ALEF WITH HAMZA ABOVE;Lo;0;AL;0627 0654;;;;N;ARABIC LETTER HAMZAH ON ALEF;;;;<BR>

0624;ARABIC LETTER WAW WITH HAMZA ABOVE;Lo;0;AL;0648 0654;;;;N;ARABIC LETTER HAMZAH ON WAW;;;;<BR>

0626;ARABIC LETTER YEH WITH HAMZA ABOVE;Lo;0;AL;064A 0654;;;;N;ARABIC LETTER HAMZAH ON YA;;;;<BR>

06C2;ARABIC LETTER HEH GOAL WITH HAMZA ABOVE;Lo;0;AL;06C1 0654;;;;N;ARABIC LETTER HAMZAH ON HA GOAL;;;;<BR>

06D3;ARABIC LETTER YEH BARREE WITH HAMZA ABOVE;Lo;0;AL;06D2 0654;;;;N;ARABIC LETTER HAMZAH ON YA BARREE;;;;<BR>

<BR>

2) those with compatibility decomposition<BR>

0677;ARABIC LETTER U WITH HAMZA ABOVE;Lo;0;AL;&lt;compat&gt; 06C7 0674;;;;N;ARABIC LETTER HIGH HAMZAH WAW WITH DAMMAH;;;;<BR>

<BR>

3) those without decomposition<BR>

0681;ARABIC LETTER HAH WITH HAMZA ABOVE;Lo;0;AL;;;;;N;ARABIC LETTER HAMZAH ON HAA;;;;<BR>

076C;ARABIC LETTER REH WITH HAMZA ABOVE;Lo;0;AL;;;;;N;;;;;<BR>

08A1;ARABIC LETTER BEH WITH HAMZA ABOVE;Lo;0;AL;;;;;N;;;;;<BR>

<BR>
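
The three groups above can be reproduced with a small script. This is only a sketch: it assumes Python's standard unicodedata module, hard-codes the nine code points listed above rather than searching the whole database, and the function names are made up for illustration.<BR>

```python
import unicodedata

# The nine "WITH HAMZA ABOVE" code points listed above (Fxxx range skipped).
HAMZA_ABOVE = [0x0623, 0x0624, 0x0626, 0x06C2, 0x06D3,
               0x0677, 0x0681, 0x076C, 0x08A1]


def decomposition_kind(cp):
    """Classify a code point by its decomposition field in UnicodeData.txt."""
    decomp = unicodedata.decomposition(chr(cp))
    if not decomp:
        return "none"
    # Compatibility decompositions carry a tag in angle brackets.
    if decomp.startswith("<"):
        return "compatibility"
    return "canonical"


def group_by_kind(code_points):
    """Group code points (as four-digit hex strings) by decomposition kind."""
    groups = {"canonical": [], "compatibility": [], "none": []}
    for cp in code_points:
        groups[decomposition_kind(cp)].append("%04X" % cp)
    return groups


if __name__ == "__main__":
    for kind, members in group_by_kind(HAMZA_ABOVE).items():
        print(kind, members)
```

<BR>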

Naïvely, this looks a bit hit and miss, and I don't know the reasons behind it (see Ken's messages for a<BR>

partial explanation).<BR>

<BR>

However, singling out one of these to "DISALLOW" in IDNA2008 (or is it IDNA2010) seems to be even more<BR>

of a miss.<BR>

<BR>

As Roozbeh, Asmus, Mark and Ken W. have pointed out, handling one (possible) "confusables" case (which<BR>

is not compatibility equivalent) in a very different manner from other "confusables" (that are not<BR>

compatibility equivalent) seems to be a very bad idea, for various reasons.<BR>

<BR>

Now, should BEH WITH HAMZA ABOVE have been encoded? Maybe not, but that is irrelevant now. Should<BR>

REH WITH HAMZA ABOVE and HAH WITH HAMZA ABOVE have been given canonical (or compatibility) decompositions<BR>

(or not been encoded)? Maybe. Or maybe there are valid reasons for having things as they are, just that<BR>

the names are too confusing... I'm not sure it is worthwhile diving deep into the history of just these<BR>

few characters (though Ken is doing that, and is most welcome to) to find out (unless you are deeply<BR>

interested in the Arabic script, of course); there are many other cases of non-equivalent confusables.<BR>

<BR>

And as Ken and Roozbeh pointed out with examples (and Mark without giving examples in the emails),<BR>

there are many other cases that are less "obvious" (from reading the names only) of very similar but<BR>

not (compatibility) equivalent letters. Are you planning on DISALLOWing them too? Big can of worms...<BR>

Not that the cases should not be dealt with, of course they should. See <a href="http://www.unicode.org/reports/tr39/">http://www.unicode.org/reports/tr39/</a>.<BR>
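
To illustrate "not (compatibility) equivalent": a rough sketch, again assuming Python's unicodedata module, with a hypothetical helper name. It only compares normalized forms; it does not consult the confusables data of TR39.<BR>

```python
import unicodedata


def equivalent(a, b, form="NFKC"):
    """True if a and b are identical after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    # ALEF + HAMZA ABOVE vs. precomposed 0623: canonically equivalent.
    print(equivalent("\u0627\u0654", "\u0623", "NFC"))
    # 0677 vs. 06C7 + 0674: equivalent only under compatibility normalization.
    print(equivalent("\u0677", "\u06C7\u0674", "NFC"))
    print(equivalent("\u0677", "\u06C7\u0674", "NFKC"))
    # BEH + HAMZA ABOVE vs. 08A1: not equivalent under either form,
    # so normalization alone will not catch this pair.
    print(equivalent("\u0628\u0654", "\u08A1", "NFKC"))
```

<BR>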

<BR>

----------<BR>

<BR>

On a slightly different point, in the Latin script: Andrew wrote<BR>

"in the case of (e.g.) ö (in Swedish) and o-umlaut (in German).<BR>

They're clearly different letters linguistically too."<BR>

<BR>

How? They are pronounced the "same" in Swedish and German (except for differences<BR>

only a dialect expert or linguist might notice). AFAIK, they also have the same history:<BR>

"oe" turning into œ, turning into oͤ (o with e above), turning into either ö or ø. Maybe you<BR>

intended to contrast with French or Dutch, where "two dots above" is used for something<BR>

else (as a mark for separate pronunciation as opposed to a diphthong, French using œ for<BR>

what is written ö in Swedish and German). Despite THAT difference, I would still say<BR>

that the "two" ö (French/Dutch/... vs. Swedish/German/...) are still the same *character*,<BR>

just different orthographic uses. But Swedish and German do collate ö differently...<BR>

<BR>

And at some level, œ, oͤ, ö, ø, (and even o with ogonek) are the same letter (for Danish/Norwegian/<BR>

Swedish/German/...), even though they do not look all that much alike.<BR>

<BR>

Nit: "Faeltroem" is a major typo in German as well, even though that *fallback* seems to be<BR>

more common in German than for Swedish (where it has been used, huh, back in "pure ASCII"<BR>

times, or when some people use a keyboard without the "local" letters).<BR>

<BR>

/Kent K<BR>

<BR>

</SPAN></FONT>

</BODY>

</HTML>