IDNA online tool

Fri Apr 10 13:01:47 CEST 2009

Hello Mark,

You seem to be right; I forwarded a bug report to Opera.

On 2009/04/10 12:48, Mark Davis wrote:
> The M-Label string is \uFECB\uFEAE\uFE91\uFEF2
> The U-Label string is \u0639\u0631\u0628\u064A

I guess U+ notation would be better; nobody needs to know you used Java :-).

> The M-Label does map to the U-Label (in IDNA, and in IDNAbis unless we
> restrict the mapping to exclude these characters).

Others and I have proposed restricting the mapping for these
characters, unless the Arabic script community tells us otherwise.

> Every conformant browser, where fonts are available, should give essentially
> the same rendering for both strings (which is why they are mapped together
> by NFKC).

Well, no. On the one hand, NFKC equivalents often render clearly
differently (think of circled numbers, ...). On the other hand,
compatibility Arabic renders the same as the normalized version only if
the connectivity variants are carefully selected to follow the
connectivity rules of the Arabic script. As an example, the NFKC form of
ﺶﺷ is شش, which doesn't look the same.
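
For anyone who wants to check these mappings, here is a minimal sketch
using Python's standard unicodedata module. The helper names u_plus and
nfkc are made up for illustration, and I am assuming the two SHEEN forms
in the example above are U+FEB6 and U+FEB7. It only shows which code
points NFKC produces; it says nothing about how a browser renders them.

```python
import unicodedata


def u_plus(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def nfkc(text):
    """Apply Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# The two labels quoted above.
m_label = "\uFECB\uFEAE\uFE91\uFEF2"
u_label = "\u0639\u0631\u0628\u064A"

print(u_plus(m_label))           # U+FECB U+FEAE U+FE91 U+FEF2
print(u_plus(nfkc(m_label)))     # U+0639 U+0631 U+0628 U+064A
print(nfkc(m_label) == u_label)  # True

# Assumption: the SHEEN example is U+FEB6 (final form) and U+FEB7
# (initial form). Both fold to the plain letter U+0634, so the
# information about which joining form was chosen is lost.
sheen_forms = "\uFEB6\uFEB7"
print(u_plus(nfkc(sheen_forms)))  # U+0634 U+0634

# A circled number, where the NFKC result clearly looks different.
print(nfkc("\u2460"))             # 1
```

After normalization, the shape of each letter is picked by the
renderer's joining logic rather than by the code points, which is why
the two strings need not look alike.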

Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp