<div dir="ltr">I have been following this discussion with some interest and have come away with a thought that some of you may wish to refine or perhaps debate. Basically, I see the Unicode effort as only partly aligned with the needs of the Internet's Domain Name System, and the effort to use Unicode character parameters/descriptors/properties does not always line up with the properties we want characters used in the DNS to have. It seems useful to recall that domain names are identifiers that are not expected, or even intended, to follow purely linguistic constraints. They are meant to be unique identifiers, and characters that have a high probability of looking the same but are encoded differently work against that goal. Of course, I am fully aware of the confusability of the lowercase letter "L" and the digit "ONE" (and of "OH" and "ZERO"), which is sometimes cited as an example of inconsistent tolerance of confusion in ASCII labels, but I consider that an argument of the form "you allowed one case of confusion, therefore you should tolerate all confusion". <div><br></div><div>I do wonder whether it is worth attempting to create a new set of Unicode character properties specifically for use in the DNS. The IDNA2008 work tried to use character properties developed for purposes other than the DNS, and the fit is not always perfect. </div><div><br></div><div>vint</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 23, 2015 at 4:14 AM, "Martin J. Dürst" <span dir="ltr"><<a href="mailto:duerst@it.aoyama.ac.jp" target="_blank">duerst@it.aoyama.ac.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello Asmus,<br>

<br>

On 2015/01/22 11:58, Asmus Freytag wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I would go further, and claim that the notion that "*all homographs are the same abstract character*" is *misplaced, if not incorrect*.<br>

</blockquote>

<br>

That's fine. Nobody would claim that 8 (U+0038) and ৪ (Bengali 4, U+09EA) are the same abstract character. (How 'homographic' they look will depend on what fonts your mail user agent uses :-)<br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

U+08A1 is not the only character that has a non-decomposable homograph, and because its encoding wasn't an accident but follows a principle applied by the Unicode Technical Committee, it won't, and can't, be the last instance of a non-decomposable homograph.<br>

<br>

The "failure of U+08A1 to have a (non-identity) decomposition", while it perhaps complicates the design of a system of robust mnemonic identifiers (such as IDNs), appears not to be due to a "breakdown" of the encoding process, and it does not constitute a break of any encoding stability promises by the Unicode Consortium.<br>

<br>

Rather, it represents a reasoned and principled judgment of what is or isn't the "same abstract character". That judgment has to be made somewhere in the process, and the bodies responsible for character encoding get to make the determination.<br>

</blockquote>

<br>

While I can agree with this characterization, many judgements on character encoding are by their very nature borderline, and U+08A1 is definitely borderline in many respects. What I hope is that the Unicode Technical Committee, when making similar decisions in the future, will put the borderline a bit more in support of applications such as identifiers, and a bit less in favor of splitting. I also hope that it realizes that when principles lead to more and more homograph encodings, it may very well pay to reexamine some of those principles before going down a slippery slope.<br>

<br>

Regards,   Martin.<div class="HOEnZb"><div class="h5"><br>

_______________________________________________<br>

Idna-update mailing list<br>

<a href="mailto:Idna-update@alvestrand.no" target="_blank">Idna-update@alvestrand.no</a><br>

<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>

</div></div></blockquote></div><br></div>