Mapping and Variants

Tue Mar 10 05:52:40 CET 2009

Here's basically what I said:

There are many, many cases of visually confusable characters; IPA is neither
the only case nor the worst. Moreover, many IPA characters *are* used in
legitimate alphabets, especially in non-European languages.

For example, there is a draft character picker on my home site,
http://www.macchiato.com/. Even among the common characters, you will see
confusables, like

ɓlogspot.com

where the ɓ is U+0253 LATIN SMALL LETTER B WITH HOOK:
http://unicode.org/cldr/utility/character.jsp?a=0253

(That is, picking Latin from the left menu and Common from the center menu.
At address-bar sizes, this can easily be mistaken for blogspot.com.)
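To make the difference visible, here is a small Python sketch that lists the
code points of the two labels using only the standard unicodedata module. The
helper name describe is just for illustration. The labels differ in exactly
one position:

```python
import unicodedata

def describe(label: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in the label."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in label]

spoof = "\u0253logspot.com"  # the confusable label, starting with ɓ
real = "blogspot.com"

# Indices where the two (equal-length) labels differ
diffs = [i for i, (a, b) in enumerate(zip(spoof, real)) if a != b]

print(diffs)               # [0]
print(describe(spoof)[0])  # U+0253 LATIN SMALL LETTER B WITH HOOK
print(describe(real)[0])   # U+0062 LATIN SMALL LETTER B
```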

And for that matter, if you go to Latin > IPA, you'll see that ASCII a-z are
also IPA, as are many other characters from languages that you'd
recognize.

The working group also rejected sifting out historic characters, but if you
look at those, you'll find others, like ƅ (U+0185 LATIN SMALL LETTER TONE SIX):
http://unicode.org/cldr/utility/character.jsp?a=0185

The problem simply cannot be solved in the protocol: there are too many
cases where legitimate and illegitimate labels can't be distinguished
without context. And even trying to distinguish them would take years. Note
that the use of NFKC + case folding dramatically reduces the opportunities
for confusion; without those, we'd be much worse off. And yet two edge cases
resulting from them (eszett and final sigma) have absorbed a huge amount of
time. *And those are just in Latin and Greek; there are far trickier issues
in many other scripts, or if multiple scripts are allowed*.
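As a rough illustration of how folding removes distinctions, here is a Python
sketch. The fold helper is hypothetical and only approximates the NFKC + case
folding step using str.casefold() (IDNA2003 Nameprep uses its own stringprep
tables). It shows fullwidth letters collapsing to ASCII, and the eszett and
final sigma distinctions disappearing:

```python
import unicodedata

def fold(label: str) -> str:
    """Approximate NFKC + case folding (not the exact Nameprep tables)."""
    # Normalize, fold case, then normalize again, since case folding can
    # produce sequences that are no longer in NFKC.
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", label).casefold())

print(fold("\uff22\uff2c\uff2f\uff27"))        # blog     (fullwidth letters)
print(fold("Stra\u00dfe"))                     # strasse  (ß folds to ss)
print(fold("\u03c3\u03bf\u03c6\u03bf\u03c2"))  # σοφοσ    (final ς folds to σ)
```

Once folded, Straße and strasse are the same label, and a word ending in
final sigma is the same as one ending in ordinary sigma; that loss of
distinction is what made these two cases so contentious.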

The issue of visual confusion is much, much bigger than can be handled in
the protocol; it really takes involvement by the user agents (browsers,
etc.) and registries, because they have far more information available
about context and environment.
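Here is a sketch of the kind of heuristic a user agent might layer on top of
the protocol. It is hypothetical: it guesses each letter's script from the
first word of its Unicode character name, a crude stand-in for the real Script
property (which Python's standard library does not expose). It flags a
Cyrillic/Latin mix, but passes ɓlogspot.com, because ɓ is itself Latin:

```python
import unicodedata

def letter_scripts(label: str) -> set[str]:
    """Crude script guess: first word of each letter's name (hypothetical heuristic)."""
    scripts = set()
    for ch in label:
        # Only letters count; digits, hyphens and dots are treated as neutral.
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed(label: str) -> bool:
    return len(letter_scripts(label)) > 1

print(looks_mixed("\u0440\u0430ypal.com"))  # True: Cyrillic р and а mixed with Latin
print(looks_mixed("\u0253logspot.com"))     # False: ɓ is Latin, so this check misses it
```

Even a proper implementation using the Script property would not catch the
single-script ɓ case, and a naive "more than one script" rule would wrongly
flag ordinary Japanese, which mixes Han, Hiragana and Katakana. That is why
the decision needs the context a browser or registry has.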

That's why we have put together guidance in Unicode Security Considerations:

http://www.unicode.org/reports/tr36/

and data in Unicode Security Mechanisms:

http://www.unicode.org/reports/tr39/
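The core idea behind the TR39 confusable data is a "skeleton": two strings
are treated as confusable if they map to the same skeleton. Below is a
simplified Python sketch of that idea (NFD, replace each character with its
prototype, NFD again). The PROTOTYPES table is hypothetical and hand-picked
for illustration; the real mappings come from the confusables.txt data file,
which is much larger and may map these characters differently:

```python
import unicodedata

# Hypothetical prototype table, for illustration only (not confusables.txt).
PROTOTYPES = {
    "\u0253": "b",  # ɓ LATIN SMALL LETTER B WITH HOOK
    "\u0185": "b",  # ƅ LATIN SMALL LETTER TONE SIX
    "\u0430": "a",  # а CYRILLIC SMALL LETTER A
    "\u0440": "p",  # р CYRILLIC SMALL LETTER ER
}

def skeleton(s: str) -> str:
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    nfd = unicodedata.normalize("NFD", s)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)

def confusable(a: str, b: str) -> bool:
    return skeleton(a) == skeleton(b)

print(confusable("\u0253logspot.com", "blogspot.com"))  # True
print(confusable("\u0440\u0430ypal.com", "paypal.com"))  # True
print(confusable("blogspot.com", "bloggers.com"))        # False
```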

Mark