Here's basically what I said:<br><br><div style="margin-left: 40px;">There are many, many cases of visual confusables - IPA is not the only or the worst case. Moreover, many IPA characters <i>are</i> used in legitimate alphabets, especially in non-European languages.<br>
<br>For example, there is a draft character picker on my home site, <a href="http://www.macchiato.com/" target="_blank">http://www.macchiato.com/</a>. Even among the common characters, you will see confusables, like <br><br>
<span style="font-family: tahoma,sans-serif;">ɓ<a href="http://logspot.com">logspot.com</a></span><br><br>where the <span style="font-family: tahoma,sans-serif;">ɓ</span> is <a href="http://unicode.org/cldr/utility/character.jsp?a=0253" target="_blank">http://unicode.org/cldr/utility/character.jsp?a=0253</a><br>
<br>(That is, pick Latin from the left menu and Common from the center menu. At address-bar sizes, ɓlogspot.com can easily be confused with blogspot.com.)<br><br>And for that matter, if you go to Latin>IPA, you'll see that ASCII a-z are also IPA, as well as many other characters from languages that you'd recognize.<br><br>The working group also rejected sifting out historic characters, but if you go to those you'll find others, like <a href="http://unicode.org/cldr/utility/character.jsp?a=0185" target="_blank">http://unicode.org/cldr/utility/character.jsp?a=0185</a><br>
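<br>To illustrate what protocol-level mapping with NFKC and case folding does and does not catch, here is a minimal Python sketch. It approximates that mapping with unicodedata.normalize("NFKC", ...) followed by str.casefold(); this pairing is an assumption for illustration, not the actual IDNA2003 nameprep tables. Uppercase and fullwidth look-alikes fold to plain blogspot.com, while ɓ (U+0253) and ƅ (U+0185) pass through unchanged. The same folding rewrites ß to "ss" and final ς to σ, which is where the eszett and sigma edge cases come from.

```python
import unicodedata


def fold(label: str) -> str:
    # Rough stand-in for protocol-level mapping: NFKC, then full case folding.
    # (Assumption: IDNA2003 nameprep used its own stringprep tables, not str.casefold.)
    return unicodedata.normalize("NFKC", label).casefold()


target = "blogspot.com"
samples = {
    "uppercase ASCII": "BLOGSPOT.COM",
    "fullwidth b (U+FF42)": "\uff42logspot.com",
    "b with hook (U+0253)": "\u0253logspot.com",
    "tone six (U+0185)": "\u0185logspot.com",
}

# Only the first two fold to the target; the look-alike letters survive folding.
for desc, s in samples.items():
    print(f"{desc}: folds to {target}? {fold(s) == target}")

# The two edge cases: eszett and final sigma are rewritten, not preserved.
print(fold("stra\u00dfe"))  # strasse
print(fold("\u03c2"))       # σ
```

Folding removes whole classes of spoofs (case and width variants) cheaply, but it has no way to know that ɓ or ƅ is "meant" to be b, since both are legitimate letters in their own right.<br>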
<br>The problem simply cannot be solved in the protocol - there are too many cases where legitimate and illegitimate labels can't be distinguished, at least not without context. And even trying to distinguish them would take years. Note that the use of NFKC+CaseFolding dramatically reduces the opportunities for spoofing - without those, we'd be much worse off. And yet two edge cases resulting from those (eszett & sigma) have absorbed a huge amount of time. <i>And that is just for Latin -- there are far trickier issues in many other scripts, or if multiple scripts are allowed</i>.<br><br>The issue of visual confusion is much, much bigger than can be
handled in the protocol - it really takes involvement by the user agents (browsers, etc.) and registries, because they have far more information available about context and environment.<br><br>That's why we have put together guidance in:<br><br><a href="http://www.unicode.org/reports/tr36/" target="_blank">http://www.unicode.org/reports/tr36/</a><br>
<br>and data in:<br><br><a href="http://www.unicode.org/reports/tr39/" target="_blank">http://www.unicode.org/reports/tr39/</a><br></div>
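<br>One way a user agent or registry can use confusables data of the kind UTS #39 provides is a skeleton comparison: normalize to NFD, map each character to a prototype, normalize to NFD again, and treat two strings as confusable when the results match. The Python sketch below assumes a tiny hypothetical PROTOTYPES table hand-picked for illustration; it is not the real confusables.txt, and the helper names skeleton and confusable are likewise only illustrative.

```python
import unicodedata

# Hypothetical, hand-picked subset for illustration only. A real check
# would load the mappings from confusables.txt in the UTS #39 data files.
PROTOTYPES = {
    "\u0253": "b",  # LATIN SMALL LETTER B WITH HOOK
    "\u0185": "b",  # LATIN SMALL LETTER TONE SIX
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(s: str) -> str:
    # Skeleton in the UTS #39 style: NFD, map each character to its prototype, NFD again.
    nfd = unicodedata.normalize("NFD", s)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str) -> bool:
    # Two strings are flagged as confusable when their skeletons are identical.
    return skeleton(a) == skeleton(b)


print(confusable("\u0253logspot.com", "blogspot.com"))        # True
print(confusable("bl\u043egsp\u043et.com", "blogspot.com"))   # True
print(confusable("example.com", "blogspot.com"))              # False
```

A match only flags a pair for attention; whether to warn, block, or allow still depends on the context (registered names, user locale, allowed scripts) that the user agent or registry has and the protocol does not.<br>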
<br>Mark<br>