Parsing the issues and finding a middle ground -- another attempt

John C Klensin klensin at jck.com
Fri Feb 27 23:12:56 CET 2009


--On Friday, February 27, 2009 13:08 -0800 Erik van der Poel
<erikv at google.com> wrote:

> Hi Vint,
> 
> Thank you for your patience.
> 
> On Thu, Feb 26, 2009 at 6:32 PM, Vint Cerf <vint at google.com>
> wrote:
>> if we reject Eszett and final sigma as PVALID, then the
>> present situation in which they are mapped means that their
>> use will fail under IDNA2008 - because they only worked as a
>> consequence of mapping under IDNA2003.
> 
> No, they would only fail under IDNA2008 if the pre-processor
> did not map them. (The pre-processor spec is outside the
> current list of IDNA2008 drafts.)

Or if whatever application is doing the lookup does not
implement the hypothetical pre-processor spec.  There is no
way to guarantee that it will be implemented.  So the reality is
that we will have three types of implementations:

	* IDNA2003-conforming (map Eszett to "ss", final sigma
	to lower case sigma, and ZWJ/ZWNJ to nothing)
	
	* IDNA2008-conforming, without preprocessor (reject
	Eszett and Final Sigma, treat ZWJ and ZWNJ as themselves
	with a greater or lesser degree of contextual checking)
	
	* IDNA2008-conforming plus preprocessor (map Eszett and
	final sigma as above, over the objections of the German
	registry; treat ZWJ and ZWNJ as themselves)

That gives us a three-way incompatibility, not just a two-way
one.  It is not clear to me that this is an improvement.
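
The three behaviours listed above can be sketched roughly as
follows.  This is only an illustration under the assumptions of
this thread (Eszett and final sigma not PVALID); the function
names are hypothetical, the mapping tables cover only the four
characters under discussion, and the contextual rules for
ZWJ/ZWNJ are omitted entirely.

```python
ESZETT = "\u00df"       # LATIN SMALL LETTER SHARP S
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
ZWJ = "\u200d"          # ZERO WIDTH JOINER
ZWNJ = "\u200c"         # ZERO WIDTH NON-JOINER


def idna2003_like(label):
    """Map Eszett to "ss", final sigma to sigma, and drop ZWJ/ZWNJ."""
    out = label.replace(ESZETT, "ss").replace(FINAL_SIGMA, SIGMA)
    return out.replace(ZWJ, "").replace(ZWNJ, "")


def idna2008_bare(label):
    """No preprocessor: reject Eszett and final sigma, keep ZWJ/ZWNJ.

    Returns None for a rejected label. Contextual checks are not modelled.
    """
    if ESZETT in label or FINAL_SIGMA in label:
        return None
    return label


def idna2008_preprocessed(label):
    """Hypothetical preprocessor: map first, then apply the bare rules."""
    mapped = label.replace(ESZETT, "ss").replace(FINAL_SIGMA, SIGMA)
    return idna2008_bare(mapped)


def compare(label):
    """Return the result of all three behaviours for one label."""
    return (idna2003_like(label),
            idna2008_bare(label),
            idna2008_preprocessed(label))
```

For a label that contains both an Eszett and a ZWNJ, the three
functions give three different answers, which is the three-way
incompatibility described above.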

>> If we
>> allow them as PVALID and let the registries include both
>> formerly mapped and unmapped forms, at least I think we end
>> up with something that can accommodate both usages except
>> that the occurrence under IDNA2008 would be through direct
>> use of both forms with punycoding of each.
> 
> I believe Vaggelis has been explaining that the .gr registry
> folks are not entirely happy with the DNAME half-solution. If
> we make Final Sigma PVALID (and refrain from mapping it to
> Normal Sigma), the .gr folks will have to add even more DNAMEs.

I'd really like to see a solution to the problems this poses.  I
don't know of one that is at all plausible.  Trying to correct
things back to final sigma on display either requires hyphens
(or some equivalent -- someone could, of course, introduce ZWNJ
into Greek) to mark word boundaries or prohibits final sigma in
the middle of a label.  Trying to do this with a metadata file
doesn't work unless the file is per-domain and identifies
exactly which characters are to be converted, and even that
doesn't solve the non-web question.
 
> Vaggelis has said that a PVALID Final Sigma does have its
> "advantages", and I believe one of them would be the ability to
> display the Final Sigma to users. However, as I have
> explained, you can get the display advantage via
> http://<domain-name>.gr/idndisp.txt without leaving Final
> Sigma unmapped.

Could you be very specific about what you think would be in that
file and how it would work?  I'm having trouble forming a
picture.  If it just says "if you see a sigma at the end,
display final sigma", then it doesn't cover the embedded cases
that occur when words are concatenated to form labels.  And I
still want to hear how this would work for other protocols,
including protocols that don't exist yet, and how caching would
be handled.  Does that file ever expire or get updated and, if
so, what are the timeout conditions?  And how would it work for 
   string-in-greek.somedomain.biz.
which has nothing to do with the .GR TLD?
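
To make the embedded case concrete, here is a small sketch of
the "sigma at the end" rule applied to a label built from two
words.  The rule itself and the function names are assumptions
for illustration; nothing here is taken from an actual
idndisp.txt proposal.

```python
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA


def map_for_lookup(label):
    """IDNA2003-style mapping: every final sigma becomes a normal sigma."""
    return label.replace(FINAL_SIGMA, SIGMA)


def end_only_display(label):
    """Naive display rule: restore final sigma only at the end of the label."""
    if label.endswith(SIGMA):
        return label[:-1] + FINAL_SIGMA
    return label


# A single word survives the round trip ...
one_word = "αθηνα" + FINAL_SIGMA
# ... but two words run together lose the embedded final sigma.
two_words = "οδο" + FINAL_SIGMA + "αθηνα" + FINAL_SIGMA

print(end_only_display(map_for_lookup(one_word)) == one_word)
print(end_only_display(map_for_lookup(two_words)) == two_words)
```

The first comparison holds and the second does not, because the
mapping has discarded the information about where the first
word ended.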

I'd like to see a way to make this work on a "do whatever people
want" basis, but I don't see one short of a DNS redesign --
either along the lines Andrew proposed, going back to some
variation on my long-dead Class proposal, or deciding that this
is one more issue pointing to the desirability of doing a
complete DNS-version-2 design.  But, as Andrew, Jaap, and others
pointed out, any of those solutions would involve a _very_ long
deployment time.

> If the .gr folks decide that IDNA2003 has failed under .gr,
> they may also decide to experiment with the following:
> 
> (1) MSIE plug-in for the URL bar (similar to the old IDNA2003
>     plug-ins)
> (2) Firefox extension or modification for the URL bar
> (3) for the keyboard only, map letters with tonos to letters
>     without tonos
> (4) continue to map final sigma to normal sigma
> (5) after mapping, convert to Punycode, prepend xn--, and
>     perform DNS lookup
> (6) make the MSIE/Firefox additions fetch
>     http://<name>.gr/idndisp.txt
> (7) before display, convert the display form to A-labels to
>     make sure they match the originals (for security reasons)
> (8) if the local experiments show good results, try to get
>     MSIE and Firefox to adopt the .gr mappings in
>     keyboard-related code
> (9) provide mapping tools to the community, for HTML
>     authoring, etc.
> (10) encourage HTML authors to use the xn-- form, so that
>      DNAMEs are unnecessary
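
Steps (3) through (5) of the list above might look roughly like
the following.  This is a sketch only: the function names are
hypothetical, tonos is removed by dropping the combining acute
accent after decomposition (which would also affect accents in
other scripts), and no other IDNA processing or validation is
attempted.

```python
import unicodedata

FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
TONOS = "\u0301"        # combining acute accent; tonos decomposes to this


def strip_tonos(label):
    """Step (3): map letters with tonos to letters without tonos."""
    decomposed = unicodedata.normalize("NFD", label)
    return unicodedata.normalize("NFC", decomposed.replace(TONOS, ""))


def to_a_label(label):
    """Steps (3)-(5): apply the mappings, then Punycode with an xn-- prefix."""
    mapped = strip_tonos(label).replace(FINAL_SIGMA, SIGMA)  # step (4)
    if mapped.isascii():
        return mapped  # plain ASCII labels need no encoding
    return "xn--" + mapped.encode("punycode").decode("ascii")


# Both spellings end up as the same A-label for the DNS lookup.
print(to_a_label("αθήνας") == to_a_label("αθηνασ"))
```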
 
> One of the dangers of this approach is so-called balkanization
> (or fragmentation) of the Internet, especially if many ccTLDs
> and 2LDs start experimenting with and demanding their own
> mappings.

This is always a risk, to be weighed against many things
including the observation that plug-ins have never worked in a
completely predictable and satisfactory fashion.  I also note
that there are language communities who are significantly
offended by Unicode --and therefore the treatment of almost all
characters-- rather than just final sigma plus the tonos
problem.  They may not be rational, or correct in some abstract
sense, but, if they can capture a registry and perhaps a
regulatory authority, there are all sorts of "opportunities".
If, directly or indirectly, we encourage going down a path such
as what I think you intend by http://<name>.gr/idndisp.txt, then 
those communities might well use the mechanism to express their
own ideas about how things should be rendered, perhaps even
specifying character images rather than mapping rules or the
equivalent.

I also note that, unless the cross-checks get very sophisticated
(and therefore time-consuming), a display format file is the
phisher's dream because it could be used to associate
"your-favorite-bank" on display with "evil.com" in the DNS.

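The simplest form of such a cross-check would accept a display
string only if applying the mapping to it yields the label that
was actually looked up.  The sketch below assumes the only
mapping is final sigma to normal sigma, and the function names
are hypothetical.  It catches the crude substitution described
above but does nothing about confusable characters, which is
where the sophistication (and the cost) would come in.

```python
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA


def u_label(a_label):
    """Decode an A-label to its Unicode form; other labels pass through."""
    if a_label.lower().startswith("xn--"):
        return a_label[4:].encode("ascii").decode("punycode")
    return a_label


def display_matches(display_label, a_label):
    """Accept a display form only if mapping it gives the looked-up label."""
    mapped = display_label.replace(FINAL_SIGMA, SIGMA)
    return mapped == u_label(a_label)


looked_up = "xn--" + "αθηνασ".encode("punycode").decode("ascii")
print(display_matches("αθηνας", looked_up))         # legitimate display form
print(display_matches("your-favorite-bank", "evil"))  # unrelated display form
```
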
Finally, ignoring the non-web protocols for a moment, it is not
clear to me what would be accomplished by having a separate file
that would not be equally well accomplished by clever
construction of an "<a>" element in HTML (or its equivalent in
other arrangements).  That would bind display to the content
specification, which is perhaps where it belongs, rather than
the domain, and would avoid inventing _any_ new mechanisms.
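
As an illustration of that alternative, a page generator could
put the A-label form in the href and the preferred display form
in the link text.  The helper name and the example values are
made up; the only point is that display and lookup forms travel
together in the content.

```python
import html


def idn_anchor(a_label_host, display_text):
    """Build an <a> element: A-label host in the href, display form as text."""
    href = "http://" + a_label_host + "/"
    return '<a href="{}">{}</a>'.format(
        html.escape(href, quote=True),
        html.escape(display_text),
    )


print(idn_anchor("example.gr", "Example & Co"))
```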

> However, the xn-- labels will continue to work in other parts
> of the world, so there's no real fragmentation there, other
> than the relatively minor display issue (since tonos-less
> letters look similar to the same letters with tonos).

Of course, that is an argument for generalizing a bit and
carrying only A-labels in URLs (see subsequent note) and using
supplemental information for display.

> Internationalization and localization often start out as local
> programs or modifications that eventually get adopted by
> software in other parts of the world. For example, local
> engineers shoe-horned bidi support into several programs, and
> eventually e.g. MSIE and Firefox built their own bidi support.
 
> It is important to refrain from performing the .gr mappings to
> domain names found in hrefs in HTML. Otherwise, locally
> authored HTML documents will not work in other parts of the
> world (unless there are DNAMEs for those domain names, which
> would defeat the goal of eventually getting rid of DNAMEs).

If I correctly understand it, it is precisely one of the
problems I'm concerned about.  You know why that is a bad idea,
but a page author trying to do something interesting (even with
good intentions) may want to use the mechanism to do something
else entirely.  

>...

    john


