I-D Action: draft-klensin-idna-rfc5891bis-00.txt
Shawn.Steele at microsoft.com
Sat Mar 11 22:52:06 CET 2017
I'm not at all sure where language has anything to do with it. And language independence is critical as the people trying to use the IDN may not be native speakers of the IDN's language.
The fundamental "problem" is that some things seem like other things to some people some of the time.
The size of the set of codepoints makes that inevitable.
There's been tons of discussion about strange quirks of seriously esoteric characters and how those can lead to identifiers that seem disparate, but aren't (or vice versa). Kinda like ratholing on naive vs naïve.
Yet we completely ignore other common problems, like Mueller vs Müller - which are admittedly language dependent.
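To make the naive vs naïve and Mueller vs Müller cases concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name code_points is made up for illustration; nothing here is taken from IDNA itself, it only shows what the code points and the Unicode normalization forms do and don't relate.

```python
import unicodedata

def code_points(label: str) -> list[str]:
    """Return the label as a list of U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in label]

composed = "na\u00EFve"     # ï as one code point (U+00EF)
decomposed = "nai\u0308ve"  # i followed by U+0308 COMBINING DIAERESIS

# Same appearance, different numbers, so a plain comparison fails.
print(code_points(composed))
print(code_points(decomposed))
print(composed == decomposed)

# Normalizing both sides to one form (NFC here) makes them match.
print(unicodedata.normalize("NFC", decomposed) == composed)

# No normalization form relates "Mueller" to "Müller"; that equivalence
# is a spelling convention, not a property of the code points.
forms = ("NFC", "NFD", "NFKC", "NFKD")
related = any(
    unicodedata.normalize(f, "Mueller") == unicodedata.normalize(f, "M\u00FCller")
    for f in forms
)
print(related)
```

The first pair is the kind of thing normalization settles mechanically; the second pair is not, which is why it stays a human problem.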
In my opinion, part of the problem is goals: Human usability vs machine uniqueness. For machine uniqueness I'd think any set of rules would suffice because it boils down to a bunch of numbers. For human usability you end up with confusables being an issue. Those cannot be perfectly resolved because there are lots of minor pixel variations that are perfectly valid yet different.
Another human usability desire is to have the "right" display form, so that my business's advertising "looks right" and yet customers can still find my business online. IMO, that's the most interesting part of mapping, allowing pretty human forms to turn into machine-readable and somewhat consistent forms. An unfortunate problem of the mapping is that the mapped form may not be as pretty.
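A rough sketch of that kind of mapping, assuming a made-up rule (NFKC, then case folding, then NFC) rather than the actual IDNA2008 or UTS #46 mapping tables. The function names to_lookup_form and to_ascii_label are hypothetical, and the ASCII form comes from Python's built-in "idna" codec, which implements the older IDNA 2003 rules.

```python
import unicodedata

def to_lookup_form(display: str) -> str:
    """Hypothetical mapping: compatibility-normalize, case-fold, recompose."""
    folded = unicodedata.normalize("NFKC", display).casefold()
    return unicodedata.normalize("NFC", folded)

def to_ascii_label(lookup: str) -> bytes:
    """ASCII-compatible form via Python's built-in IDNA 2003 codec."""
    return lookup.encode("idna")

# Several "pretty" display forms, each mapped to one machine-facing form.
for display in ("M\u00DCLLER", "M\u00FCller", "Stra\u00DFe"):
    lookup = to_lookup_form(display)
    print(display, "->", lookup, "->", to_ascii_label(lookup))
```

The last input shows the "not as pretty" part: case folding turns ß into ss, so the form the machine matches on is no longer the one the business would put on a sign.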
There's perhaps another perceived unspoken requirement for uniqueness while still round-tripping through humans. IMO, that's an unachievable expectation; there's no way a human can transcribe all of the reasonable names perfectly every time. We can't even get O and 0 right, so dot below vs comma below or other more subtle issues are hopeless when written on a napkin.
I'd prefer that the IETF standard be very lax with respect to permissible characters, and, coming back to your document, encourage registrars to do the right thing for their customers with respect to permissible &/or mapped characters.
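As a sketch of what "permissible characters" could look like on the registrar side, assuming a purely hypothetical registry whose customers need ASCII letters, digits, hyphen, and a few extra letters. The PERMITTED set and the is_registrable function are invented for illustration and are not drawn from any real registry policy or from the draft.

```python
import unicodedata

# Hypothetical policy for one registry: ASCII letters, digits, hyphen,
# plus a handful of extra letters that registry's customers need.
PERMITTED = set("abcdefghijklmnopqrstuvwxyz0123456789-") | set("\u00E4\u00F6\u00FC\u00DF")

def is_registrable(label: str, permitted: set[str] = PERMITTED) -> bool:
    """Accept only non-empty NFC labels drawn entirely from the permitted set."""
    if not label or unicodedata.normalize("NFC", label) != label:
        return False
    return all(ch in permitted for ch in label)

print(is_registrable("m\u00FCller"))   # precomposed ü, inside the repertoire
print(is_registrable("mu\u0308ller"))  # decomposed form, rejected as not NFC
print(is_registrable("p\u0430ypal"))   # contains Cyrillic а (U+0430), rejected
```

This leaves out everything else a real policy would need (hyphen placement, length limits, bidi rules, variant handling); the point is only that the narrowing is a local decision layered on top of whatever the standard permits.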
From: John C Klensin [mailto:klensin at jck.com]
Sent: Saturday, March 11, 2017 1:27 PM
To: Shawn Steele <Shawn.Steele at microsoft.com>; idna-update at alvestrand.no
Subject: RE: I-D Action: draft-klensin-idna-rfc5891bis-00.txt
--On Saturday, March 11, 2017 19:47 +0000 Shawn Steele <Shawn.Steele at microsoft.com> wrote:
> It makes sense to reinforce that registrars need to do their own
> narrowing of code points according to their needs.
That was the position that motivated the document.
> WRT the other issues that are avoided here, IMO the IETF should defer
> to Unicode as they are the ones that add new codepoints and they fully
> understand the security and other issues in the space. Encoding
> characters is, after all, their expertise.
No one has questioned their ability to encode characters. The issues are things that have been issues since IDN work was initiated (at least since the decision to use Unicode). I think Unicode is a great system for encoding running text and an even better one for encoding text that is to be rendered and printed or otherwise displayed. There is no dispute about that.
However, for identifiers and identifier matching, there are differences in philosophy, several of which have been illustrated by issues that have shown up in the last few years.
As one example, the DNS, at least the way IDNs and IDNA were conceived, does not have any "language" context, so Unicode distinctions among code points or ways of composing characters that are based strictly on language distinctions don't work well within IDNA (we could have designed IDNA to incorporate language information, but we didn't, for what seemed like good reasons at the time --and still do to many of us-- but, if anyone has the stomach to reopen that design question and start planning a major incompatible change were the decision to go the other way, go for it).