non-ASCII dots

Mark Davis mark at macchiato.com
Mon Mar 23 17:38:06 CET 2009


First off, I have *not* been pushing for allowing UNASSIGNED on lookup in
IDNA2008. This is for two reasons:

   1. We have had many Unicode versions since 3.2, so the need is less
   urgent.
   2. Because IDNA2008 updates more regularly, there is less need.


What I *have* been saying is that allowing UNASSIGNED on lookup wouldn't
make a difference, and that's the case even if a character maps to ".".

Let's take a specific example: àbc͸dèf.com, where the middle character,
\u0378, is currently unassigned as far as the client is concerned (because
the client is on an older Unicode version), while the registry is on Unicode
6.0. The XN form is xn--bcdf-zna5c481a.com.

Here's what happens when the client software (browser, email client, etc.)
looks the domain name up, depending on what \u0378 turns into under 6.0.

   1. \u0378 becomes DISALLOWED. No problem. No conformant registry can
   support it, even on Unicode 6.0; the lookup is denied.
   2. \u0378 becomes PVALID. No problem - the lookup works.
   3. \u0378 becomes mapped to X (assuming we allow mapping on lookup)
      1. X is DISALLOWED, say "$".  No problem. No conformant registry can
      support it, even on Unicode 6.0; the lookup is denied.
      2. X is PVALID, say "X". The lookup fails. The remapped domain name
      would work as xn--bcxdf-qqa4d.com, but the original URL would not work
      until the client is updated, unless the user learns to type X instead
      in the meantime.
      3. X is ".". The lookup fails. The remapped domain name would work as
      xn--bc-iia.xn--df-7ia.com, but the original URL would not work until
      the client is updated, unless the user learns to type "." instead in
      the meantime.

Whether the character maps to a dot or not in Unicode 6.0 doesn't make any
difference in the scenario. It just fails the lookup in a different way (3.3
instead of 3.2), but the lookup fails in either case.
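
To make cases 3.2 and 3.3 concrete, here is a minimal Python sketch. It is a
toy model, not a conformant IDNA implementation: the helper names (to_labels,
to_ace) and the mapping dictionaries are hypothetical, and the mappings of
\u0378 to "x" and to "." are assumptions taken from the scenario, not real
Unicode data. It relies only on Python's built-in "punycode" codec.

```python
# Toy model of the lookup scenario; not a conformant IDNA implementation.

def to_labels(name, mapping):
    """Apply a per-character mapping, then split the result on ASCII dots."""
    mapped = "".join(mapping.get(ch, ch) for ch in name)
    return mapped.split(".")

def to_ace(labels):
    """Punycode-encode each non-ASCII label and add the xn-- prefix."""
    out = []
    for label in labels:
        if label.isascii():
            out.append(label)
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)

NAME = "\u00e0bc\u0378d\u00e8f.com"

# Older client: U+0378 is unassigned, so it knows no mapping for it.
old_lookup = to_ace(to_labels(NAME, {}))

# Updated client, case 3.2: assumed mapping of U+0378 to "x".
new_lookup_x = to_ace(to_labels(NAME, {"\u0378": "x"}))

# Updated client, case 3.3: assumed mapping of U+0378 to ".".
new_lookup_dot = to_ace(to_labels(NAME, {"\u0378": "."}))

print(old_lookup)
print(new_lookup_x)
print(new_lookup_dot)
```

The three printed names correspond to the three XN forms quoted above. The
older client asks for a different name than the updated client in both cases;
the dot case only changes how many labels the updated client produces.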

Mark

On Sun, Mar 22, 2009 at 17:00, Erik van der Poel <erikv at google.com> wrote:

> Hi again James, thank you for the email. I am quite aware of the dot
> issues in IDNA. I have first-hand experience with Japanese input
> methods and their modes, and I understand the motivation for the
> addition of non-ASCII dot processing in IDNA2003.
>
> The issue with U+2CFE COPTIC FULL STOP is a bit subtle, so let me
> explain. U+2CFE was added in Unicode 4.1. This means that, from the
> point of view of an IDNA2003 implementation, it is simply an
> unassigned character. Let's say we have a domain name like:
>
> aaa <U+2CFE> bbb . com
>
> Suppose that aaa and bbb are Coptic characters, and the typist
> happened to have a Coptic input method (though I have no idea whether
> such things exist!). Further, let's suppose that the client is using
> IDNA2003 with the flag "allow unassigned" set to true. If aaa and bbb
> are already lower-case, the client will do the right thing with them
> (leaving them as is). However, the client will not know that U+2CFE is
> a new dot-like character, so it will treat the entire sequence
> "aaa<U+2CFE>bbb" as a single label. It will then encode it in Punycode
> (including the dot-like character), and try to resolve that in DNS.
>
> Of course, this will not work because the intention was to resolve
> aaa.bbb.com, not aaa<U+2CFE>bbb.com. In other words, a new client and
> an old client would resolve this name differently.
>
> I don't know how many IDNA2003 clients actually set the "allow
> unassigned" flag to true. It is obviously very dangerous, since the
> client cannot possibly know how to case-fold the new characters,
> including Coptic.
>
> (And this is also why Mark is wrong when he says that if clients are
> allowed to lookup XN-labels with unassigned characters, then they
> should also be allowed to lookup Unicode labels with unassigned
> characters.)
>
> Erik
>
> On Sun, Mar 22, 2009 at 2:33 PM, James Seng <james at seng.sg> wrote:
> > I think you misunderstood the "dot" problem. It is not that these
> > "dots" are allowed in domain names, but that they are treated as
> > separators, like ".".
> >
> > The main reason is that when a user switches to CJK input and
> > presses ".", most IMEs will produce U+3002 instead. If you do not
> > treat U+3002 as a separator, then the user has to enter the CJK IME,
> > switch back to English, enter a ".", switch back to the CJK IME, etc.
> >
> > See http://tools.ietf.org/html/draft-jet-idnabis-cjk-localmapping-00
> >
> > -James Seng
> >
> > On Mon, Mar 23, 2009 at 1:51 AM, Erik van der Poel <erikv at google.com> wrote:
> >> Another question from the summary:
> >>
> >>> A. Multiple characters are allowed as "dots" in domain names under
> >>> IDNA2003 and presumably under IDNAV2. This is a general problem for
> >>> all versions of IDNA but may be exacerbated by the variants for "dots"
> >>> that are permitted under IDNA2003 and IDNAv2. What is the WG view?
> >>
> >> In my view, non-ASCII dots should never have been allowed in IDNA2003.
> >> However, now that many IDNA2003 implementations have been distributed
> >> to users and a few stored domain names use these non-ASCII dots, some
> >> may feel that we have to support them (forever).
> >>
> >> Having said that, I am quite concerned about adding yet another
> >> non-ASCII dot in IDNAv2 (U+2CFE COPTIC FULL STOP) because IDNA2003
> >> includes a flag that allows for the lookup of unassigned (in Unicode
> >> 3.2) characters. Such applications would not only fail to case-fold
> >> post-Unicode-3.2 characters correctly, they would fail to divide the
> >> full domain name into individual labels, and since DNS labels are
> >> "owned" by different owners, this just seems like an invitation to
> >> further problems.
> >>
> >> In my view, the dot is a keyboard and UI issue. Of course, it would be
> >> nice if we could push ALL mappings out to the keyboard and UI, but, to
> >> use one of John's favorite words, this may be "unrealistic". ;-)
> >>
> >> Erik
> >> _______________________________________________
> >> Idna-update mailing list
> >> Idna-update at alvestrand.no
> >> http://www.alvestrand.no/mailman/listinfo/idna-update
> >>
> >