Final Sigma (was: RE: Esszett, Final Sigma, ZWJ and ZWNJ)

Sat Feb 28 00:05:21 CET 2009

Mark,

I agree completely.   And this is why my old "new Class"
proposal was abandoned, quite independent of the issues about
implementation and deployment speed that caused the IDNA plan to
be adopted.

However, there is one difference if one went to a server-side
matching model (independent of whether the input to that model
was Punycode, UTF-8, or something else).  If the comparison and
equality/equivalence check is done on the server, then we could
go back to separating the question of "what is represented and
encoded" from that of "what matches", just as the ASCII DNS
model does with case-matching.  

From that point of view, I don't know where Eszett or the French
discussions would fall out, but it is clear to me that the
preferred solution to Final Sigma would be to keep it in the
stored domain names (facilitating the desired display) but be
sure that the matching procedure treated upper-case sigma,
lower-case sigma, and final sigma as equivalent for DNS purposes.
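As a rough illustration of that separation, here is a minimal Python
sketch. The names match_key and Zone are hypothetical and exist only
for this example; this is not how any DNS server works today. It
assumes matching is done by lower-casing the label and then folding
final sigma into lower-case sigma, and it takes no position on Eszett
or on any other character.

```python
FINAL_SIGMA = "\u03c2"  # ς
SMALL_SIGMA = "\u03c3"  # σ


def match_key(label: str) -> str:
    """Hypothetical matching key, used for comparison only.

    Lower-casing turns upper-case sigma into σ or ς; the replace step
    then folds ς into σ, so all three sigma forms compare equal.
    """
    return label.lower().replace(FINAL_SIGMA, SMALL_SIGMA)


class Zone:
    """Toy store: labels are kept exactly as registered."""

    def __init__(self) -> None:
        self._by_key = {}

    def register(self, label: str) -> None:
        # The stored value keeps final sigma for display purposes.
        self._by_key[match_key(label)] = label

    def lookup(self, query: str):
        # Matching uses the folded key, never the stored spelling.
        return self._by_key.get(match_key(query))


zone = Zone()
zone.register("οδος")  # registered with final sigma
for query in ("ΟΔΟΣ", "οδοσ", "οδος"):
    print(query, "->", zone.lookup(query))
```

Each query finds the same entry, and what comes back is the label as it
was registered, final sigma included.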

Doing that matching operation on the server would, however,
require modifications to the DNS at least as significant as
those that Andrew described and, like them, could not
realistically be expected to be deployed in less than a decade
and perhaps much longer.   And, as you suggest, we would then
have to wrestle with exactly the same issues about what should
be considered equal (matching) and what would not -- doing
server-side matching would merely separate out the display issues.

    john

--On Friday, February 27, 2009 13:03 -0800 Mark Davis
<mark at macchiato.com> wrote:

> Logically speaking, there is no difference between a UTF-8
> model and a Punycode model. Both can be thought of as
> transfer-encodings of Unicode. Punycode is arcane, but it is
> simply and fully a lossless encoding of Unicode.
> 
> The problems we are dealing with are not because of that, but
> because of the fact that we are mapping and restricting on the
> client side. That is, the encoding is orthogonal to the
> mapping/restricting. If we had UTF-8 in URLs we would be faced
> with the same issue: how do we map/restrict labels, and
> where is it done. For example, the Greeks really want *more*
> mapping than is done by IDNA2003, not less.
> 
> The one difference, which is orthogonal to the encoding issue,
> is that the core DNS server-side lookup could require
> mapping/restricting, instead of burdening the clients and/or
> registries with it. But the mapping/restricting issues don't
> magically disappear... And it is unclear that every DNS server
> should be required to implement all of the mapping/restricting
> rules. And we still wouldn't want that to differ by language
> (for reasons outlined earlier).
> 
> Don't get me wrong; it would be great to have UTF-8 in the DNS
> at some point. It would make a number of things easier. But it
> would have no effect on the issues that are causing so many
> disagreements (and/or head-scratching).
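
The point in the quoted message that Punycode and UTF-8 are both
lossless transfer encodings of Unicode can be checked with a short
Python sketch. It uses Python's standard "utf-8" and "punycode" codecs;
the helper name round_trips is hypothetical. The raw "punycode" codec
does not add the "xn--" prefix and does no mapping or restricting,
which is the part that is orthogonal to the encoding.

```python
def round_trips(label: str) -> bool:
    """Return True if both encodings give back exactly the input."""
    via_utf8 = label.encode("utf-8").decode("utf-8")
    via_puny = label.encode("punycode").decode("punycode")
    return via_utf8 == label and via_puny == label


# Final sigma survives both encodings unchanged; neither one maps it.
for label in ("οδος", "οδοσ", "ΟΔΟΣ"):
    print(label, round_trips(label))
```

All three spellings stay distinct after encoding and decoding, so any
decision to treat them as equal has to be made by a separate mapping or
matching step, whichever encoding is used.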