Parsing the issues and finding a middle ground -- another attempt

Erik van der Poel erikv at google.com
Sat Feb 28 02:20:49 CET 2009


On Fri, Feb 27, 2009 at 2:12 PM, John C Klensin <klensin at jck.com> wrote:
> --On Friday, February 27, 2009 13:08 -0800 Erik van der Poel
> <erikv at google.com> wrote:
>> On Thu, Feb 26, 2009 at 6:32 PM, Vint Cerf <vint at google.com>
>> wrote:
>>> if we reject Eszett and final sigma as PVALID, then the
>>> present situation in which they are mapped means that their
>>> use will fail under IDNA2008 - because they only worked as a
>>> consequence of mapping under IDNA2003.
>>
>> No, they would only fail under IDNA2008 if the pre-processor
>> did not map them. (The pre-processor spec is outside the
>> current list of IDNA2008 drafts.)
>
> Or if whatever application is doing the lookup does not
> implement the hypothetical pre-processor spec.  There is no
> way to guarantee that it will be implemented.  So the reality is
> that we will have three types of implementations:
>
>        * IDNA2003-conforming (map Eszett to "ss", final sigma
>        to lower case sigma, and ZWJ/ZWNJ to nothing)
>
>        * IDNA2008-conforming, without preprocessor (reject
>        Eszett and Final Sigma, treat ZWJ and ZWNJ as themselves
>        with a greater or lesser degree of contextual checking)
>
>        * IDNA2008-conforming plus preprocessor (map Eszett and
>        final sigma as above (over the objections of the German
>        registry), treat ZWJ and ZWNJ as themselves)
>
> That gives us a three-way incompatibility, not just a two-way
> one.  Not clear to me that it is an improvement.

My email mentioned Eszett, Final Sigma, ZWJ and ZWNJ, but Vint's email
omitted the last two. I'm not sure if that was intentional. In any
case, we can summarize the three cases as:

* IDNA2003 (with pre-processing)
* IDNA2008 without pre-processing
* IDNA2008 with pre-processing

In the case of HTML implementations, I believe the second is unlikely,
judging from comments made by Shawn and others. That leaves the other
two, which are compatible, so we don't even have a two-way
incompatibility. Maybe I misunderstood your "three-way incompatibility".

>>> If we
>>> allow them as PVALID and let the registries include both
>>> formerly mapped and unmapped forms, at least I think we end
>>> up with something that can accommodate both usages except
>>> that the occurrence under IDNA2008 would be through direct
>>> use of both forms with punycoding of each.
>>
>> I believe Vaggelis has been explaining that the .gr registry
>> folks are not entirely happy with the DNAME half-solution. If
>> we make Final Sigma PVALID (and refrain from mapping it to
>> Normal Sigma), the .gr folks will have to add even more DNAMEs.
>
> I'd really like to see a solution to the problems this poses.  I
> don't know of one that is at all plausible.  Trying to correct
> things back to final sigma on display doesn't work without
> hyphens (or some equivalent -- someone could, of course,
> introduce ZWNJ into Greek) or prohibits them in the middle of a
> label.  Trying to do this with a metadata file doesn't work
> unless the file is per-domain and identifies exactly which
> characters are to be converted and even that doesn't solve the
> non-web question.

idndisp.txt would be per-domain and would list the names allowed in
display. It would work whenever the client is connected to the
Internet. In the absence of idndisp.txt, the client would display
letters without tonos, which is not such a big problem.

>> Vaggelis has said that a PVALID Final Sigma does have its
>> "advantages", and I believe one of them would be the ability to
>> display the Final Sigma to users. However, as I have
>> explained, you can get the display advantage via
>> http://<domain-name>.gr/idndisp.txt without leaving Final
>> Sigma unmapped.
>
> Could you be very specific about what you think would be in that
> file and how it would work?  I'm having trouble forming a
> picture.  If it just says "if you see a sigma at the end,
> display final sigma", then it doesn't cover the embedded cases
> that occur when words are catenated to form labels.

The file might just be a list of name/value pairs:

display: ελληνικός.gr
display: www.ελληνικός.gr

The client would go down the list, converting each name to A-labels,
until it finds one that matches the domain name used to fetch this
file. For each name, it would first try the new .gr mapping spec, then
the IDNA2003 spec.
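
A minimal sketch of the matching described above, under stated
assumptions. The "new .gr mapping" is not specified anywhere, so
gr_map() below is a hypothetical stand-in that lower-cases, strips
tonos, and folds final sigma to normal sigma. The IDNA2003 step uses
Python's built-in "idna" codec, which implements RFC 3490. The function
names and the "display:" parsing rules are illustrative only.

```python
import unicodedata

TONOS = "\u0301"  # tonos letters decompose to base letter + combining acute

def gr_map(label):
    """Hypothetical .gr mapping: lower-case, strip tonos, fold final sigma."""
    decomposed = unicodedata.normalize("NFD", label.lower())
    stripped = "".join(ch for ch in decomposed if ch != TONOS)
    return unicodedata.normalize("NFC", stripped).replace("\u03c2", "\u03c3")

def to_alabel(label):
    """Punycode-encode an already-mapped label and prepend xn-- if needed."""
    if label.isascii():
        return label.lower()
    return "xn--" + label.encode("punycode").decode("ascii")

def candidates(name):
    """Yield A-label forms of name: .gr mapping first, then IDNA2003."""
    yield ".".join(to_alabel(gr_map(part)) for part in name.split("."))
    try:
        yield name.encode("idna").decode("ascii").lower()
    except UnicodeError:
        pass  # name is not valid under IDNA2003; skip that candidate

def parse_idndisp(text):
    """Return the values of all 'display:' lines, in file order."""
    names = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "display":
            names.append(value.strip())
    return names

def display_name_for(fetched_name, idndisp_text):
    """Return the first listed name whose A-label form equals fetched_name."""
    wanted = fetched_name.lower()
    for name in parse_idndisp(idndisp_text):
        if wanted in candidates(name):
            return name
    return None  # no match: fall back to displaying the looked-up form
```

A client that fetched the file for the A-label form of
www.ελληνικοσ.gr would get back "www.ελληνικός.gr" from the two-line
example file; a domain name with no matching entry yields None.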

Initially, the file might contain display names for both the new .gr
mapping and the old IDNA2003. After a while, the site owner might
remove the names that only match under IDNA2003, in order to reach the
desired final state, where all or most of the domain names in e.g.
HTML are in A-label format and no DNAMEs are necessary.

> And I
> still want to hear how this would work for other protocols,
> including protocols that don't exist yet,

No matter what protocol is being used when the domain name is first
encountered, the client would always use HTTP (or possibly HTTPS) to
access the idndisp.txt file. This is why I'm saying that an Internet
connection is needed (to get the owner's desired display).
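
As a rough illustration of the fetch step, assuming the file sits at
the root path /idndisp.txt and is encoded in UTF-8 (the encoding is an
assumption, as are the function names and the timeout):

```python
from urllib.request import urlopen

def idndisp_url(domain, scheme="http"):
    """Build the idndisp.txt URL for a domain, using its A-label form."""
    alabel = domain.encode("idna").decode("ascii")
    return f"{scheme}://{alabel}/idndisp.txt"

def fetch_idndisp(domain, timeout=5.0):
    """Fetch idndisp.txt over HTTP; return its text, or None on failure."""
    try:
        with urlopen(idndisp_url(domain), timeout=timeout) as resp:
            return resp.read().decode("utf-8")  # UTF-8 is assumed here
    except (OSError, UnicodeError):
        return None  # offline or no usable file: caller shows the default form
```

The same call would be made whether the name was first seen in a URL,
an email address, or any other protocol element.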

> and how caching would
> be handled.  Does that file ever expire or get updated and, if
> so, what are the timeout conditions?

I believe we could come up with some recommendations for HTTP cache/expiry.

> And how would it work for
>   string-in-greek.somedomain.biz.
> which has nothing to do with the .GR TLD?

If .biz sees .gr's success and wants to emulate it but already has too
many customers tied to IDNA2003 (e.g. different owners of tonos-less
and tonos-ful names), .biz may decide not to switch. This is purely a
business decision. Yes, this means that clients need a table of TLDs
that use the .gr mapping. Not very difficult, but perhaps unpalatable
to some engineers.
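
A sketch of what such a table might look like in client code. The
table contents, the function names, and the mapping itself are
hypothetical; gr_map() stands in for the unspecified .gr mapping, and
every other TLD falls through to IDNA2003 via Python's "idna" codec.

```python
import unicodedata

GR_MAPPING_TLDS = {"gr"}  # hypothetical table of TLDs that opted in

def gr_map(label):
    """Hypothetical .gr mapping: lower-case, strip tonos, fold final sigma."""
    nfd = unicodedata.normalize("NFD", label.lower())
    nfd = "".join(ch for ch in nfd if ch != "\u0301")
    return unicodedata.normalize("NFC", nfd).replace("\u03c2", "\u03c3")

def to_alabel(label):
    """Punycode-encode a mapped label and prepend xn-- if it is non-ASCII."""
    if label.isascii():
        return label.lower()
    return "xn--" + label.encode("punycode").decode("ascii")

def lookup_name(name):
    """Choose the mapping by TLD and return the name to look up in the DNS."""
    labels = name.split(".")
    if labels[-1].lower() in GR_MAPPING_TLDS:
        return ".".join(to_alabel(gr_map(part)) for part in labels)
    return name.encode("idna").decode("ascii").lower()  # IDNA2003 elsewhere
```

With this table, a keyboard-entered name under .gr loses its tonos
before lookup, while the same label under .biz keeps it.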

> I'd like to see a way to make this work on a "do whatever people
> want" basis, but I don't see one short of a DNS redesign --
> either along the lines Andrew proposed, going back to some
> variation on my long-dead Class proposal, or deciding that this
> is one more issue pointing to the desirability of doing a
> complete DNS-version-2 design.  But, as Andrew, Jaap, and others
> pointed out, any of those solutions would involve a _very_ long
> deployment time.

The .gr folks do not have to wait for that.

>> If the .gr folks decide that IDNA2003 has failed under .gr,
>> they may also decide to experiment with the following:
>>
>> (1) MSIE plug-in for the URL bar (similar to the old IDNA2003
>>     plug-ins)
>> (2) Firefox extension or modification for the URL bar
>> (3) for the keyboard only, map letters with tonos to letters
>>     without tonos
>> (4) continue to map final sigma to normal sigma
>> (5) after mapping, convert to Punycode, prepend xn--, and
>>     perform DNS lookup
>> (6) make the MSIE/Firefox additions fetch
>>     http://<name>.gr/idndisp.txt
>> (7) before display, convert the display form to A-labels to
>>     make sure they match the originals (for security reasons)
>> (8) if the local experiments show good results, try to get
>>     MSIE and Firefox to adopt the .gr mappings in
>>     keyboard-related code
>> (9) provide mapping tools to the community, for HTML
>>     authoring, etc
>> (10) encourage HTML authors to use the xn-- form, so that
>>      DNAMEs are unnecessary
>
>> One of the dangers of this approach is so-called balkanization
>> (or fragmentation) of the Internet, especially if many ccTLDs
>> and 2LDs start experimenting with and demanding their own
>> mappings.
>
> This is always a risk, to be weighed against many things
> including the observation that plug-ins have never worked in a
> completely predictable and satisfactory fashion.  I also note
> that there are language communities who are significantly
> offended by Unicode --and therefore the treatment of almost all
> characters-- rather than just final sigma plus the tonos
> problem.  They may not be rational, or correct in some abstract
> sense, but, if they can capture a registry and perhaps a
> regulatory authority, there are all sorts of "opportunities".
> If, directly or indirectly, we encourage going down a path such
> as what I think you intend by http://<name>.gr/idndisp.txt, then
> those communities might well use the mechanism to express their
> own ideas about how things should be rendered, perhaps even
> specifying character images rather than mapping rules or the
> equivalent.

The idea is to use a simple mapping that is still based on Unicode. If
the registry decides to invent a mapping that does not even use
Unicode, they may have a hard time getting e.g. MSIE and Firefox to
adopt their mapping.

> I also note that, unless the cross-checks get very sophisticated
> (and therefore time-consuming), a display format file is the
> phisher's dream because it could be used to associate
> "your-favorite-bank" on display with "evil.com" in the DNS.

No, the matching is simple and strict: the display form is converted
to A-labels and must match the original domain name exactly.
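
To make the check concrete, here is a small sketch that uses IDNA2003
(Python's "idna" codec) as the only mapping, for brevity. The function
names are illustrative and not part of any proposal.

```python
def to_ascii(name):
    """IDNA2003 ToASCII for a whole name; None if the name is invalid."""
    try:
        return name.encode("idna").decode("ascii").lower()
    except UnicodeError:
        return None

def accept_display(dns_name, display_name):
    """Accept a display form only if it converts back to the DNS name."""
    return to_ascii(display_name) == dns_name.lower()
```

A file served from evil.com that lists "your-favorite-bank.com" fails
this check, because that display form does not convert to evil.com.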

> Finally, ignoring the non-web protocols for a moment, it is not
> clear to me what would be accomplished by having a separate file
> that would not be equally well accomplished by clever
> construction of an "<a>" element in HTML (or its equivalent in
> other arrangements).  That would bind display to the content
> specification, which is perhaps where it belongs, rather than
> the domain, and would avoid inventing _any_ new mechanisms.

I wouldn't want to increase the size of each HTML file just to get the
same effect that can be achieved using a single, small file at
https://<name>.tld/idndisp.txt.

>> However, the xn-- labels will continue to work in other parts
>> of the world, so there's no real fragmentation there, other
>> than the relatively minor display issue (since tonos-less
>> letters look similar to the same letters with tonos).
>
> Of course, that is an argument for generalizing a bit and
> carrying only A-labels in URLs (see subsequent note) and using
> supplemental information for display.

Yes, A-labels in URIs and idndisp.txt for desired display.

>> Internationalization and localization often start out as local
>> programs or modifications that eventually get adopted by
>> software in other parts of the world. For example, local
>> engineers shoe-horned bidi support into several programs, and
>> eventually e.g. MSIE and Firefox built their own bidi support.
>
>> It is important to refrain from performing the .gr mappings to
>> domain names found in hrefs in HTML. Otherwise, locally
>> authored HTML documents will not work in other parts of the
>> world (unless there are DNAMEs for those domain names, which
>> would defeat the goal of eventually getting rid of DNAMEs).
>
> If I correctly understand it, it is precisely one of the
> problems I'm concerned about.  You know why that is a bad idea,
> but a page author trying to do something interesting (even with
> good intentions) may want to use the mechanism to do something
> else entirely.

I'm sorry, but I don't really understand.

Erik


More information about the Idna-update mailing list