Mappings (was: Re: High-level changes from IDNA2003 in the "current work")

James Seng james at seng.sg
Sat Mar 8 01:42:01 CET 2008


Thanks for the explanation.

Here are my thoughts (which may have jumped the gun a little).

1. The principles (requirements) of consistency and no surprises are
important in domain names. If I enter a label on one machine using
some particular software, it should resolve the same way as if I had
entered the same label on another machine using some other software.

While the IME and OS probably have a better understanding of the
font set/charset they have, there should be no ambiguity in how they
map their own compatibility or local characters (which is discouraged,
but if they do, they do) into the appropriate codepoints.

I concede this part may be local, but it should be no different from
applications picking Arial or Times Roman to display ABC.com. ABC.com
remains ABC.com.
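
As a rough illustration of point 1, here is a minimal sketch assuming
Python's built-in "idna" codec, which implements the IDNA2003 ToASCII
and ToUnicode operations from RFC 3490. The label and the .example
domain are chosen for illustration only, and the helper names
to_ascii and to_unicode are hypothetical wrappers, not part of any
draft.

```python
# Sketch only: Python's built-in "idna" codec follows IDNA2003 (RFC 3490).
# The helper names and the sample label are illustrative assumptions.

def to_ascii(name: str) -> bytes:
    """Apply IDNA2003 ToASCII to each label of a domain name."""
    return name.encode("idna")

def to_unicode(wire: bytes) -> str:
    """Apply IDNA2003 ToUnicode to each label of a domain name."""
    return wire.decode("idna")

# Three ways a user might type the same label on different machines.
typed_forms = [
    "B\u00fccher.example",   # capital B, small u-umlaut
    "b\u00fccher.example",   # all lower case
    "B\u00dcCHER.example",   # all upper case
]

# Every typed form should map to one and the same over-the-wire form.
wire_forms = {to_ascii(name) for name in typed_forms}

# Going back from the wire form yields the mapped form, not what was typed.
round_trip = to_unicode(to_ascii(typed_forms[0]))
```

With this codec all three typed forms collapse to a single wire form,
which is the no-surprise behaviour. The round trip returns the
lower-case label rather than the string that was typed, so the typed
form is one of the strings that cannot come out of
ToUnicode(ToASCII(string)).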

2. Broken implementations are always there. But since developers are
still at various stages of implementation, we can still fix it.
This is unlike the 8-bit clean DNS problem we encountered before, where
broken implementations had been widely used for several decades,
which is why we are using punycode and not UTF-8.
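
To make the 8-bit point concrete, here is a small sketch assuming
Python's standard "utf-8" and "punycode" codecs (the latter implements
RFC 3492). The sample label is an illustrative assumption; the "xn--"
prefix is the ACE prefix defined for IDNA.

```python
# Sketch only: compare the bytes a label would need on the wire as raw
# UTF-8 with its punycode (ACE) form. The sample label is illustrative.

label = "b\u00fccher"

# Raw UTF-8 uses bytes above 0x7F, which not-8-bit-clean software may mangle.
utf8_bytes = label.encode("utf-8")
utf8_needs_8bit = any(byte > 0x7F for byte in utf8_bytes)

# The ACE form is the "xn--" prefix followed by the punycode encoding,
# and stays entirely within 7-bit ASCII.
ace_label = b"xn--" + label.encode("punycode")
ace_is_7bit = all(byte < 0x80 for byte in ace_label)
```

The ACE form passes through software that only handles 7-bit ASCII
labels, at the cost of being unreadable when displayed as-is.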

3. On display of punycode, I agree that it should be fixed in 3490 but
not exactly in the way you are proposing.

3490 was written at a time when we had no IRI. Now that IRI exists,
I think these display & input issues should be handled by IRI, whereas
IDN can focus on the interaction between IRI in
Applications->IDN->over-the-wire resolutions. This may be less
confusing to developers.

-James Seng

On Fri, Mar 7, 2008 at 10:52 PM, John C Klensin <klensin at jck.com> wrote:
>
>
>  --On Friday, 07 March, 2008 07:30 +0800 James Seng
>  <james at seng.sg> wrote:
>
>  > I am curious about (c)....wouldn't that give rise to
>  > inconsistency of results as implementations vary?
>
>  James,
>
>  Partially because the prohibition in 3490 on display of punycode
>  didn't work (some browser vendors now view display of punycode
>  in varying circumstances as a feature) and because various
>  implementations are already doing some local mapping in URIs and
>  IRIs (i.e., mapping that is not specified in IDNA2003), that
>  inconsistency already exists.   It is aggravated by a user-level
>  inconsistency for both end users and registrants: other than
>  computers and a few patient experts, no one actually understands
>  what characters are, and are not, accepted by IDNA and there is
>  continuing confusion about what "registration" means and whether
>  one can register a string that cannot come out of
>  ToUnicode(ToASCII(string)).  Different registries have different
>  policies on the latter.  A few registrars have concluded that
>  IDNs are so confusing and problem-prone that they don't want to
>  touch them.  Those comments are certainly anecdotal rather than
>  definitive and certainly do not represent checks all the way
>  down the DNS tree, but they are, I believe, symptomatic of a
>  broader problem.
>
>  In addition, glyphs in fonts seem to be available more often for
>  base characters than for compatibility ones.   While
>  applications can sometimes tell which scripts are supported
>  locally, particular characters within fonts are much harder to
>  detect.  In general, the application can only assume that the
>  characters are there and try to display them (typically resulting
>  in question marks or little boxes if the characters are not
>  available, but sometimes in mapping to other, similar, characters).
>  Or the application must assume that display is not possible and
>  display the punycode-encoded string instead (see above).
>
>  The lack of full support for compatibility and other mapped-out
>  characters should not be surprising, but display as little boxes
>  or question marks is the worst possible case, since information
>  is actually lost and copy-and-paste is even less likely to work
>  than usual.
>
>  We need to do something to clean that up (or, of course,
>  we could decide we like the warts).  The approach taken in the
>  proposed documents is that we need to minimize variation in
>  domain names in interchange and --whether you look at it as part
>  of the goal or as an effect-- make the equivalents of the
>  ToASCII and ToUnicode operations reversible.  We believe that
>  will significantly reduce confusion and significantly improve
>  interoperability.
>
>  Once the mappings are removed from IDNA, there are several
>  possible approaches:
>
>         (i) Do mapping externally to the IDNA protocol set,
>         possibly using a standardized model that preserves
>         complete compatibility with IDNA2003.   This at least
>         gets us clarity about what can be registered and what
>         goes onto the model.  It also gives us the potential for
>         different rules for different protocol contexts, which
>         might be either an advantage or a disadvantage.
>
>         (ii) Do as little mapping as possible except in contexts
>         where backward-compatibility is more important than
>         cleaning things up.  Given the comments above and
>         earlier discussions on this list and in
>         draft-klensin-idnabis-issues, this might be the best
>         approach going forward, and probably would have been the
>         best approach if we were starting from a clean slate.
>         Whether it is wise or not today depends on what we
>         think about the importance of IDNs that are in active
>         use now using characters that map out versus the much
>         larger number of IDNs that may exist in the future.
>
>         (iii) View mapping among Unicode characters as a
>         completely local matter, just as we have always viewed
>         mapping into Unicode from local character sets and
>         codings.  This requires strong "it better not leak if
>         you expect it to resolve" constraints (which we have
>         today in different form), but is consistent with the
>         knowledge that some local mappings are inevitable as
>         application implementations attempt to compensate for
>         perceived inadequacies, vis-a-vis their script or
>         writing systems, in either IDNA or Unicode.
>
>  There are probably some hybrid possibilities as well, but the
>  proposed documents are, deliberately, completely agnostic on
>  this subject (draft-klensin-idnabis-issues may not be quite
>  clear about that in the current version -- I've learned a bit
>  more about how to explain it in the few weeks since the most
>  recent version was posted).
>
>  best,
>    john
>
>  p.s. While I think Paul's summary is a useful way to get people
>  started on coming up to speed, I hope no one will consider it a
>  substitute for reading the documents, even between now and the
>  BOF.
>
>


More information about the Idna-update mailing list