Unicode 7.0.0, (combining) Hamza Above, and normalization

Vint Cerf vint at google.com
Fri Aug 8 00:16:09 CEST 2014


Ken,

your argument seems to be based on linguistic grounds and this I can accept
but I continue to be concerned about the consequences of the use of these
two character-forming mechanisms in the domain name system. As I think we
all know, domain names are NOT language. At best, they are collections of
characters drawn from ASCII or from Unicode. The consequence of this
observation is that we have an obligation to think about the ways in which
such characters may be used *in domain names* and the hazards that might be
encountered or avoided, depending on the rules we pick to deal with this
case (among others).

Matching rules, phishing opportunities and other phenomena lie at the
center of the application of the richness of Unicode to the identifier
space of domain names. Surely we want to observe the Principle of Least
Astonishment in these situations?
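
To make the matching concern concrete, here is a minimal Python sketch
using the standard unicodedata module. The code points are assumptions
about which characters are under discussion (U+08A1 for the new
beh-with-hamza, U+0628 + U+0654 for the sequence), and the helper name
nfc_equal is hypothetical. It only illustrates that NFC does not unify
the two forms, while an older precomposed letter such as U+0623 does
unify with its sequence; it says nothing about what the IDNA tables
should do.

```python
import unicodedata

# Assumed code points; the thread names the characters only informally.
BEH = "\u0628"             # ARABIC LETTER BEH
AIN = "\u0639"             # ARABIC LETTER AIN
GHAIN = "\u063A"           # ARABIC LETTER GHAIN
ALEF = "\u0627"            # ARABIC LETTER ALEF
ALEF_HAMZA = "\u0623"      # ARABIC LETTER ALEF WITH HAMZA ABOVE
HAMZA_ABOVE = "\u0654"     # ARABIC HAMZA ABOVE (combining)
DOT_ABOVE = "\u0307"       # COMBINING DOT ABOVE
BEH_WITH_HAMZA = "\u08A1"  # ARABIC LETTER BEH WITH HAMZA ABOVE (new in 7.0.0)


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The new atomic letter has no canonical decomposition, so NFC keeps it
# distinct from BEH followed by the combining hamza.
print(nfc_equal(BEH_WITH_HAMZA, BEH + HAMZA_ABOVE))  # False

# Contrast: ALEF WITH HAMZA ABOVE is canonically equivalent to its sequence.
print(nfc_equal(ALEF_HAMZA, ALEF + HAMZA_ABOVE))     # True

# Ken's GHAIN example: look-alike, but not canonically equivalent.
print(nfc_equal(GHAIN, AIN + DOT_ABOVE))             # False
```

A label-matching rule built on NFC alone would therefore treat the two
beh forms as different labels, even where they render alike.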

v



On Thu, Aug 7, 2014 at 5:47 PM, Whistler, Ken <ken.whistler at sap.com> wrote:

> Paul Hoffman asserted:
>
> > Right. To me, the current processing under NFC is the wrong result. Andrew
> > was a bit polite at the end of his message, but it sounds to me that he
> > thinks the NFC processing for the new character leads to the wrong result
> > when compared to earlier NFC processing.
>
> The issue for the table update comes down to that.
>
> I think it is quite clear, however, that it is not the case that "the
> current processing under NFC is the wrong result".
>
> The premises of this argument all come down to implicit (or
> occasionally explicit) assertions that the beh-with-hamza encoded
> for the Fula implosive b is the *same* character as an existing
> Arabic beh character followed by the combining Hamza mark.
>
> They are *NOT* the same. And *if* they are not the same, all the
> arguments about NFC being wrong, etc., are pointless.
>
> These implicit assertions that the beh-with-hamza and the sequence
> *ARE* the same are as beside the point as heading down the road
> of citing any number of other possible similarities in appearance:
> for example, claiming that U+063A ARABIC LETTER GHAIN is the *SAME*
> character as the sequence U+0639 ARABIC LETTER AIN + U+0307 COMBINING
> DOT ABOVE, because the atomic character and that sequence
> might look the same.
>
> --Ken