Tables and contextual rule for Katakana middle dot

Eric Brunner-Williams ebw at abenaki.wabanaki.net
Wed Apr 8 00:50:43 CEST 2009


Mark Davis wrote:
> ...
>
> There are other dot-like characters that are far more visually similar 
> to dot, like Arabic zero.
>
> عربي٠عربي.com <http://xn--ngbazb1bc2jd8q.com>
> vs
> عربي.عربي.com <http://xn--ngbrx4e.xn--ngbrx4e.com>
>
> But more importantly, there is a real lack of data presented for these 
> kinds of positions. When excluding characters that are in common use 
> on the basis of visual confusability, such as Katakana middle dot, 
> let's see some real data on what a difference this would make in 
> overall visual confusability of characters. Of all of the visually 
> confusable characters in PVALID, what would be the percentage 
> difference by adding or removing Katakana middle dot? And why do 
> people think this can't be handled by exactly the same mechanisms that 
> programs have to handle the visually confusable characters that *are* 
> PVALID?

I applied the no-punctuation principle when looking at U+166E (CANADIAN 
SYLLABICS FULL STOP). However, it (a very small, baseline-aligned "x") 
really doesn't look like a label separator, and there is no real harm in 
a Cree full stop appearing within a Cree character string, creating labels 
of the form "whatever<dot>cree-sentence-1<mini-x>cree-sentence-2<dot>else".
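The no-punctuation principle corresponds roughly to the first step of the RFC 5892 derivation: only code points whose General_Category is in the LetterDigits set {Ll, Lu, Lo, Nd, Lm, Mn, Mc} are candidates for PVALID, so punctuation (Po) and symbols (So) fall out. A minimal sketch of that single step follows, using only Python's standard `unicodedata` module. `simplified_idna_status` is a hypothetical helper, and it deliberately ignores every other rule in the derivation, notably the Exceptions table, where U+30FB KATAKANA MIDDLE DOT is listed as CONTEXTO rather than DISALLOWED.

```python
import unicodedata

# General categories forming RFC 5892's "LetterDigits" set.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def simplified_idna_status(ch: str) -> str:
    """Rough IDNA2008-style status from General_Category alone.

    Assumption/simplification: ignores Exceptions (e.g. U+30FB is
    CONTEXTO in RFC 5892), BackwardCompatible, Unassigned, LDH,
    JoinControl, Unstable, IgnorableProperties, and the other rules.
    """
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    return "PVALID" if unicodedata.category(ch) in LETTER_DIGITS else "DISALLOWED"


# Code points mentioned in this thread, plus ASCII FULL STOP for contrast.
for cp in (0x166E, 0x166D, 0x30FB, 0x0660, 0x002E):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} "
          f"({unicodedata.category(ch)}): {simplified_idna_status(ch)}")
```

Under this category test alone, both Canadian Syllabics characters and the Katakana middle dot come out DISALLOWED, while ARABIC-INDIC DIGIT ZERO (Nd) passes. That category test is exactly why the visual-confusability question cannot be settled by General_Category alone.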

So, I have some second thoughts about DISALLOWED for U+166E. The case 
for PVALID is ... well ... inventive, not compelling.

For U+166D (CANADIAN SYLLABICS CHI SIGN), a symbol, I'm willing to keep 
it DISALLOWED, for several reasons.

I wrote this because I think you're correctly asking what's really 
confusable, and the question is larger than just Katakana middle dot.

Eric



More information about the Idna-update mailing list