<html>

<body>

<blockquote type=cite class=cite cite=""><b>At 22:29 07/08/2014, John C

Klensin wrote:</b><br>

And I suspect this list has now had enough of this

discussion.</blockquote><br>

<blockquote type=cite class=cite cite=""><b>At 19:37 07/08/2014, John C

Klensin wrote:</b><br>

Personally, I don't believe an objective standard and categorization is

possible unless one constrains the problem to the point of making it

uninteresting (e.g., by believing that the world can, in practice, be

forced into a single universal type style of type

family).</blockquote><br>

Dear John,<br>

What is uninteresting and cumbersome, and what I have certainly had enough of

discussing, are exceptions, whatever they may be. Unicode is a typographic encoding

we chose to use for standard end-to-end transmissions and related

services (with some restrictions). Either it comes with a built-in common

protocol to support what we consider exceptions, or we do not accept

them.<br><br>

<blockquote type=cite class=cite cite=""><b>At 01:07 08/08/2014, Andrew

Sullivan wrote:</b><br>

Sorry for the iPhoney reply.  I wasn't trying to say the result is

wrong as such.  I think it _may_ be wrong for IDNA and therefore

possibly an indication that our approach in IDNA2008 (and therefore alas

in precis) is inadequate. </blockquote><br>

Dear Andrew,<br>

certainly it is. But this is the best compromise we found, given the way

Unicode is.<br><br>

<blockquote type=cite class=cite cite=""><b>At 00:56 08/08/2014,

Whistler, Ken wrote:</b><br>

All of this discussion seems to be boiling down to IETF second-guessing

of Unicode character encoding decisions and complaints about Unicode

normalization not satisfying expectations based on rather simplistic

notions of which things that look the same should *be* the

same.</blockquote><br>

Dear Ken,<br>

"things that look the same should *be* the same" is very

confusing, as you do not specify to whom they look the same. Unicode deals

with people; we deal with interconnected people and the systems that

interconnect them. This makes three different strata, in which there are

different cognition/processing layers.<br><br>

<blockquote type=cite class=cite cite=""><b>At 01:23 08/08/2014, John C

Klensin wrote:</b><br>

But I'm pretty sure that assertions that this is a different character

despite the same name as the combining sequence and, as far as we can

tell, an identical appearance, do not help move us forward.

</blockquote><br>

I appreciate that you say in detail what I say in principle in that

mail. Your conclusion confirms that we are in the same case as the French

majuscules (once they have been lowercased by IDNA2008, the case being metadata that

IDNA2008 loses). As long as Unicode does not support a metadata protocol

for exception encoding ... <br><br>

<blockquote type=cite class=cite cite=""><b>At 02:27 08/08/2014, Andrew

Sullivan wrote:</b><br>

I don't think that's a fair characterization.  Nobody is

"second-guessing" anything.  It's rather that we -- John,

actually -- discovered that there's a consequence of this case that we

did not previously understand, and it has uncomfortable consequences for

the way we had previously relied on Unicode, because it didn't work the

way we thought.  </blockquote><br>

Dear Andrew,<br>

Maybe it is time to reconsider the idea of an IETF Unicode that includes our

exception management through an additional protocol rather than only through

Patrik's tables? <br><br>

<blockquote type=cite class=cite cite="">Presumably, implementers have a

greater reason to become familiar with the picky exceptional

cases.</blockquote><br>

This is not possible unless picky exceptional cases (which include

French majuscules :-) and many others) are supported by a standardized

protocol. The question is simply where this protocol is to be

located (at Unicode, at RFC 5895, at Layer six).<br><br>

<blockquote type=cite class=cite cite="">This has, note, not just

implications for IDNA2008.  We have a whole working group (PRECIS)

that is busy attempting to use the same strategy in a generalized way for

other protocols.  It hasn't shipped yet, but it's gone to the

IESG.  So we can't just shrug our shoulders.</blockquote><br>

+1<br><br>

<blockquote type=cite class=cite cite="">> There are likely many

similar-looking things that fit in a similar bucket and have escaped

notice.<br>

All the more reason to concern ourselves with it, no?</blockquote><br>

+1<br><br>

<blockquote type=cite class=cite cite=""><b>At 14:10 08/08/2014, John C

Klensin wrote:<br>

</b>Stated simplistically, that understanding has been that normalization

would deal effectively with the issue of equality comparisons between

"characters" within the same script that had the same

appearance.  </blockquote><br>

Your further comment, as well as my suggestion, is that we may have to

refine what is then meant by "comparison" and "same

appearance" ("visually" is not precise enough).<br><br>

<blockquote type=cite class=cite cite="">If we now find that intra-script

normalization is insufficient to give us a consistent identity comparison

among the different ways a character (shape) could be formed within the

same script, then it seems to me that it is not inclusion that is at risk

but simply that assumption of normalization sufficiency.

</blockquote><br>

Bingo!<br><br>

<blockquote type=cite class=cite cite="">While I gather that the idea of

a specialized normalization form would remind some people of very early

discussions (even disagreements) within the Unicode Consortium process,

we might have to contemplate an IETF-specific, or IDN-specific,

normalization form, one built on the strict visual form model that we

understood rather than incorporating per-character language, linguistic,

or phonetic or other usage considerations for some cases.  <br><br>

A decision to move in the direction of a different, non-Unicode-standard,

normalization form would probably take us down the path toward

character-by-character evaluations that<br>

many of us have dreaded (again, since early in the pre-IDNA2003

discussions). </blockquote><br>

This would be too black-and-white a decision. I would rather add a

discrimination algorithm to IDNA than quit Unicode.<br><br>

<blockquote type=cite class=cite cite="">But that brings us back to your

observation about recalibrating risk understanding and deciding whether

the risk --or the mechanisms needed to mitigate it -- are worth the<br>

effort and reward.   But I've seen no evidence, or even strong

hints, that the issues this case have turned up brings the inclusion

model, or even the existing IDNA2008 rule and category sets, into doubt,

only the reliance on NFC to do a job that it appears that, for some

cases, it doesn't actually do and wasn't intended to

do.</blockquote><br>

This calls for a market study (cf. RFC 6852): what is the market demand

for the ten-year-old

<a href="http://unisign.org/" eudora="autourl">http://unisign.org</a>?

Those interested are welcome.<br><br>

<blockquote type=cite class=cite cite=""><b>At 14:49 08/08/2014, Vint

Cerf wrote:</b><br>

John,<br>

I think this is an important insight and it may indeed be the case that

normalization for Domain Name purposes and normalization for other

purposes are not as aligned as we supposed. Most users of the Unicoded

scripts are unaware of most or any of the various mechanisms associated

with Unicode and will likely be guided more by the principle of least

astonishment than anything else. I wonder whether a domain-name-specific

normalization would improve the likelihood of achieving the aim of that

principle? </blockquote><br>

Dear Vint,<br>

the issue is the principle of least astonishment for the human reader and

non-confusability for the computer reader. The response is "sign +

equivalence table" based on linguistic use tags. So far the best

system I found (cf. my initial exchange with John) is a raster geometric

grid based upon a legally accepted uniform script. The question is not

font diversity but the sign/symbol code memorized by the computer. This

standardization should be made compatible (for many uses) with various

ISO TCs. This Wikipedia page can be used as a starting point:

<a href="http://en.wikipedia.org/wiki/List_of_symbols" eudora="autourl">

http://en.wikipedia.org/wiki/List_of_symbols<br><br>

</a><blockquote type=cite class=cite cite=""><b>At 18:01 08/08/2014, John

Levine wrote:</b><br>

If I may stick my semi-informed oar in, it seems to me that for

linguistic purposes, homographs are generally not an issue. 

Remember all those manual typewriters that didn't have digit 1 or 0 keys,

so you used letters l and O instead.<br><br>

In our case, homographs are a big deal.  So can we just say that,

and decide to do whatever minimizes homograph issues even though it's not

the same as what would reflect linguistic usage?</blockquote><br>

IMHO a request for a common effort should go to various trade and

government SDOs in order to secure printed/computerized systems (banks,

police, passports, etc.) and should be associated with RFID-oriented work.

<br><br>

jfc </body>

</html>