Eszett (was Implementation questions)

Erik van der Poel erikv at google.com
Fri Dec 26 21:01:58 CET 2008


On Fri, Dec 26, 2008 at 8:13 AM, JFC Morfin <jefsey at jefsey.com> wrote:
> At 21:32 24/12/2008, John C Klensin wrote:
>>First of all, my comment about what I would do as a potential
>>registrant is independent of the advice I would give registries.
>>I have always advised registries to avoid mixed-script labels
>>unless special circumstances arise and to use variant techniques
>>to restrict registrations or separate ownership of
>>easily-confused labels.   Sometimes registries take that type of
>>advice and sometimes they do not and I recognize that there are
>>legitimate reasons for not doing so.
>
> Users are not engineers. They trust the IETF to make the Internet
> work better. If an IETF protocol permits something, why would they as
> simple zone managers forbid it?  If IETF protocols are not fool
> proof what can they do about it?

Hello JFC, many (if not most) of the participants in this WG long ago
realized that the IETF cannot, by itself, make the system fool-proof.
We have accepted that registries and applications will help address
the problem. If you try to re-open this issue, then you have become
one of the individuals that delay this WG. :-)

> 2) at the DNS. An alternative I proposed and several consider in
> their own ways. We wait for IDNA2008 to be completed to give it a
> try and preserve interoperability and interintelligibility.
>
>>From that perspective, whatever leverage the UDRP, local rules
>>and regulations, etc., provide is very much part of the system.
>>Many decisions about what to register are ultimately up to the
>>registry. If the registry perceives risk if they get it wrong,
>>that is not necessarily a bad thing.
>
> The registry evaluations will be diverse. If the protocol does not
> unify them this will be a technical balkanization unless the internet
> becomes a new Compuserve-like network, led by a Search Engine
> Operators' consortium.

Please re-read your previous paragraph ("An alternative I proposed").
You may be contributing to balkanization. Different registries will
have different rules, partly because some characters in one language
look very similar to other characters in other languages. These rules
will not lead to "balkanization". These rules will reflect natural
divisions between languages and cultures that already exist today.
Your newspaper is written in one or a few languages, right? That is
because it is aimed at those groups of people. You could say similar
things about Web sites and domain names.

>> > You may remember that during the joint ITU/UNESCO meeting in
>> > Geneva I questioned the WIPO on such situations (as well as
>> > on babel-names [protected ASCII labels]). The response (after
>> > a few coffees) was that they respected IPR in ASCII and in
>> > Unicode, but were unable to decide when there was conflict
>> > between the two.
>>
>>There is no more reason to believe that there is a conflict
>>between ASCII and Unicode.  There are conflicts between
>>(non-ASCII) scripts within Unicode as well.   To take a handy
>>European example, there is more overlap between Greek and
>>Cyrillic than between either and ASCII.
>
> You play on words. The conflicts are over the non semiotic
> equivalence of punycode in/outputs. ASCII based, Unicode and real
> sortable universal non confusable character set namespaces are not
> equipotent. Outside of multiered externets, IDNA is not strict enough
> to artificially enforce that equipotence. Even if the work the WG
> carries may reduce the cardinal differences, there is no
> architectural enforcement of the IDNA constraints.  This is very
> simple mathematics. And Internet architecture: IDNA is not end to end
> - except when between a Search engine and its own made browser.

John may or may not have dodged your question. He may have played with
words. But I suggest that you are using poorly-understood terminology
(e.g. semiotic). When you participate in a WG, it helps if you use the
same terminology that others are already using. I suspect that you are
referring to tricky domain names like xn--intel or xn--cocacola. How
many people in this WG call them babel-names? Are you the only one?

A more constructive way to participate in this WG is to suggest
solutions to this problem. Many browsers seem to have settled on the
display of Punycode when the Unicode version is somehow determined to
be dangerous (or whatever). I have suggested special UIs that obscure
the Punycode string but still allow the user to click or
copy-and-paste. My suggestion was met with silence (and that is not
unusual). Maybe you have a better suggestion?

> Whatever you want. As long as it is in the code, i.e. documented in
> the RFC being used. Either published by this WG or by another WG
> building-up on this WG IDNA2008, if possible before 20090901-00:00.
> This would help ICANN and the Internet stability.

What is special about 20090901-00:00? What is going to happen at that time?

> There should always be equal coopetition/complementarity
> between DNS and Search Engines. My point is that for too long (hence
> the delay) this WG has discussed points of interest only to Search
> Engines. And delays the DNS calendar.

I disagree. We have not been discussing things that are only
interesting to search engine developers. We have been talking about
HTML a lot, but that is because many HTML implementations have
(mistakenly or correctly) adopted IDNA2003 pre-processing rules for
strings that are sent over the wire (HTML hrefs). This WG is part of
the same IETF that published IDNA2003, so this WG must address
migration issues that arise from (mis)interpretation of previous IETF
specifications. We have not discussed other protocols (e.g. email) as
much because those implementations do not appear to have the same
migration issues.

I have repeatedly stated that only a small number of HTML links rely
on IDNA2003 pre-processing. Yet, no browser developer appears to be
willing to drop the mappings (lower-casing, NFKC and "map to
nothing").
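
To make the three mappings concrete, here is a rough Python sketch of
IDNA2003-style pre-processing for a single label. It uses the standard
stringprep and unicodedata modules; the function name idna2003_map is
my own, and the sketch leaves out the prohibited-character and bidi
checks, so it illustrates the mapping step only and is not a full
nameprep implementation.

```python
import stringprep
import unicodedata


def idna2003_map(label: str) -> str:
    """Apply the IDNA2003-style mappings to one label (hypothetical helper)."""
    # "Map to nothing": drop characters listed in RFC 3454 table B.1,
    # such as U+00AD SOFT HYPHEN.
    kept = (ch for ch in label if not stringprep.in_table_b1(ch))
    # Case folding per RFC 3454 table B.2; this is where eszett becomes "ss".
    folded = "".join(stringprep.map_table_b2(ch) for ch in kept)
    # Compatibility normalization (NFKC), e.g. fullwidth letters to ASCII.
    return unicodedata.normalize("NFKC", folded)


print(idna2003_map("Fa\u00df"))        # eszett is folded to "ss"
print(idna2003_map("exam\u00adple"))   # the soft hyphen is removed
```

A link that only works because of this step is one that "relies on
IDNA2003 pre-processing" in the sense used above.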

The somewhat larger number of HTML links that rely on IDNA2003 mapping
and Punycoding are clearly written by people who don't care whether
those links work in major browsers, because MSIE6 does not work with
them. If these authors do not care, why should the browser developers
care? Why not restrict hrefs to Punycode labels, so that eszett will
work in IDNA2008 A-labels (after IE7 is upgraded) and all Punycode
labels will work in IE6?

Or some compromise, like always mapping eszett to ss, thereby forcing
eszett registrants to use IDNA2008 A-labels.
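
A small sketch of how that compromise might look for a label taken
from an href, assuming the label is already lower-case. The function
names to_a_label and href_label are hypothetical, and a real
implementation would also need the IDNA2008 validity checks, which are
omitted here.

```python
def to_a_label(u_label: str) -> str:
    """Punycode-encode a label as-is, with no mapping (hypothetical helper)."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def href_label(label: str) -> str:
    """Sketch of the compromise: a Unicode eszett in an href is mapped to
    "ss", while an explicit A-label is passed through untouched."""
    if label.startswith("xn--"):
        return label  # the author asked for this exact A-label
    return to_a_label(label.replace("\u00df", "ss"))


print(to_a_label("stra\u00dfe"))    # the IDNA2008 A-label that keeps eszett
print(href_label("stra\u00dfe"))    # eszett mapped to "ss"
print(href_label("xn--strae-oqa"))  # unchanged, so the eszett name stays reachable
```

Under this scheme the only way to reach a name registered with eszett
is to write its A-label in the href.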

Of course, I realize that these suggestions may not fly, because, as I
said, browser developers tend to be (too) lenient.

Erik

