Unicode 7.0.0, (combining) Hamza Above, and normalization

Shawn Steele Shawn.Steele at microsoft.com
Fri Aug 8 00:30:08 CEST 2014


> As i think we all know, domain names are NOT language.

Philosophically that’s a noble idea, but in practice people encode linguistic words.  They tend to pick names like PetMountain and theanimalstore, not qdcvj and zmopk.  Sure, there’s a limit to the linguistic behavior, but if we merely wanted an ID we could’ve stuck with 12.34.56.78.

> Matching rules, phishing opportunities and other phenomena lie at the center of the application of the richness of Unicode to the identifier space of domain names. Surely we want to observe the Principle of Least Astonishment in these situations?

This seems to argue against the first point.  If we wanted only globally unique identifiers, then we shouldn’t have used linguistically compatible monikers, and there would’ve been no reason to create IDN.  I, personally, am least astonished when stuff I type isn’t rejected just because it uses some character that the system arbitrarily refuses, especially when that character works in other applications.

I don’t see how matching rules apply.  IDN/DNS is pretty clear about case mapping, which is about the only form of matching that happens, and regardless of how they appear, the code points in question will result in unique code point sequences.
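
The "unique code point sequences" point can be checked with a minimal sketch.  It assumes Python's standard unicodedata module, and it assumes the code points in question are U+08A1 (ARABIC LETTER BEH WITH HAMZA ABOVE) versus the sequence U+0628 U+0654; the constant and helper names are made up for illustration.  A Latin pair that normalization does unify is included for contrast.

```python
import unicodedata

# Assumed code points for this discussion (not spelled out in the message):
BEH_HAMZA_PRECOMPOSED = "\u08A1"     # ARABIC LETTER BEH WITH HAMZA ABOVE
BEH_HAMZA_SEQUENCE = "\u0628\u0654"  # ARABIC LETTER BEH + ARABIC HAMZA ABOVE

NORMALIZATION_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def equal_under_any_normalization(a, b):
    """True if any standard normalization form maps a and b to the same string."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in NORMALIZATION_FORMS
    )


if __name__ == "__main__":
    print(code_points(BEH_HAMZA_PRECOMPOSED))
    print(code_points(BEH_HAMZA_SEQUENCE))
    # The two Arabic spellings stay distinct under every normalization form.
    print(equal_under_any_normalization(BEH_HAMZA_PRECOMPOSED, BEH_HAMZA_SEQUENCE))
    # Contrast: precomposed e-acute and e + combining acute are unified.
    print(equal_under_any_normalization("\u00E9", "e\u0301"))
```

Since no normalization form maps one spelling to the other, the two remain different code point sequences however similar they look when rendered.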

Phishing, if you mean homographs, is a much bigger problem than any single code point, and I don’t see how anything newly encoded could be worse than the existing possibilities.  Certainly this one character isn’t going to add much risk.  There’s also guidance to registrars to avoid registering homographs of other registrations.

Phishing by other means is a completely different problem that is reasonably irrelevant to this discussion.  90% of the humans on this planet can easily be tricked into clicking on a link without resorting to homographs.  I’ve received numerous spam mails trying to phish, and cannot recall a single one that resorted to homographs.  Some of those were “good enough” that I questioned their legitimacy.  And some legitimate URLs I’ve encountered have been far more suspicious than some of the dangerous ones.

-Shawn
