The Future of IDNA

Erik van der Poel erikv at google.com
Thu Mar 19 19:12:39 CET 2009


Hello All,

We are at a critical point between the history and the future of
IDNA. I believe we have allowed ourselves to be distracted by two
topics:

DNS extensions. We have discussed a number of ideas for DNS responses
that would convey IDN display preferences and support the transition
between IDNA versions.

Keyboard UI vs high-level protocol mapping. We have talked about
compatibility issues in processing URLs-in-HTML, and some have
suggested pushing the mapping all the way out to the UI (keyboard).

Both of these have distracted us from the critical issue, which is
that IF the client does NOT map certain character sequences to others,
THEN the server must bundle a large number of names that differ only
in the way that these character sequences are presented.

Notice that it does not matter whether those mappings are performed
immediately after keyboard input or much later, e.g. during
HTML processing. The point is that IF the client does not map Final
Sigma to Normal Small Sigma and Characters with Tonos to Characters
without Tonos, THEN the server must bundle all of the permutations
(final/normal and with/without tonos).
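To make the asymmetry concrete, here is a minimal Python sketch covering only the two Greek cases mentioned above, assuming the label is already lowercase. The names `client_map` and `server_bundle` are hypothetical illustrations, not part of IDNA2003 or any IDNAbis draft: the first collapses Final Sigma and tonos on the client, the second enumerates every spelling a server would have to bundle if the client did not.

```python
import itertools
import unicodedata

FINAL_SIGMA = "\u03c2"      # GREEK SMALL LETTER FINAL SIGMA
SMALL_SIGMA = "\u03c3"      # GREEK SMALL LETTER SIGMA
COMBINING_ACUTE = "\u0301"  # Greek tonos decomposes to this under NFD


def _is_greek(ch: str) -> bool:
    # unicodedata.name() needs exactly one character; unnamed chars give "".
    return len(ch) == 1 and unicodedata.name(ch, "").startswith("GREEK")


def client_map(label: str) -> str:
    """Hypothetical client-side mapping: final sigma -> sigma, drop Greek tonos.

    Assumes lowercase input; accents on non-Greek letters are left alone.
    """
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", label.replace(FINAL_SIGMA, SMALL_SIGMA)):
        if unicodedata.combining(ch) == 0:
            base = ch  # remember the current base letter
        elif ch == COMBINING_ACUTE and _is_greek(base):
            continue  # strip tonos from a Greek letter
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def server_bundle(canonical: str) -> set:
    """Every spelling the server would have to bundle if clients did NOT map."""
    choices = []
    for ch in canonical:
        alts = [ch]
        if ch == SMALL_SIGMA:
            alts.append(FINAL_SIGMA)
        with_tonos = unicodedata.normalize("NFC", ch + COMBINING_ACUTE)
        if _is_greek(ch) and len(with_tonos) == 1:
            alts.append(with_tonos)  # e.g. alpha -> alpha with tonos
        choices.append(alts)
    return {"".join(p) for p in itertools.product(*choices)}


if __name__ == "__main__":
    print(client_map("ελλάς"))          # ελλασ
    print(len(server_bundle("ελλασ")))  # 8
```

With mapping, all eight spellings of this one name reduce to a single key; without it, the registry must hold all eight, and the count doubles with every additional sigma or tonos-capable vowel.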

So I believe IDNAbis must return to a model similar to IDNA2003's,
where mapping is a MUST.

In particular, certain characters must be disallowed in lookup and
registration (e.g. Final Sigma and Characters with Tonos), while
others must be allowed in lookup and registration (e.g. Eszett, ZWJ
and ZWNJ).

A different set of characters must be allowed in DISPLAY. I believe
that the current IDNA2008 table of characters is perfect, or nearly
so, for display.
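The two-set idea can be sketched in Python with toy repertoires. Everything below is a hypothetical illustration: the real lookup and display tables would come from the IDNAbis documents, and treating the display set as a strict superset of the lookup set is an assumption made here for simplicity.

```python
# Toy repertoires for illustration only; not the real IDNAbis tables.
ASCII_LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")
# Greek small letters alpha..omega, excluding final sigma (U+03C2).
GREEK_BASE = {chr(cp) for cp in range(0x03B1, 0x03CA) if cp != 0x03C2}
# Lowercase Greek vowels with tonos (U+03AC..U+03AF, U+03CC..U+03CE).
GREEK_TONOS = {"\u03ac", "\u03ad", "\u03ae", "\u03af", "\u03cc", "\u03cd", "\u03ce"}
FINAL_SIGMA = "\u03c2"

# Lookup/registration: characters the client maps away are excluded,
# while Eszett, ZWJ and ZWNJ are allowed rather than mapped.
LOOKUP_SET = ASCII_LDH | GREEK_BASE | {"\u00df", "\u200d", "\u200c"}

# Display: assumed superset that also admits what the client maps away.
DISPLAY_SET = LOOKUP_SET | GREEK_TONOS | {FINAL_SIGMA}


def allowed(label: str, repertoire: set) -> bool:
    """True if every character of the label is in the given repertoire."""
    return all(ch in repertoire for ch in label)
```

With these sets, "ελλάς" is rejected for lookup but accepted for display, while "ελλασ" and "straße" are both valid for lookup.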

Note that some language communities may wish to strip accents in
lookup/registration (e.g. tonos in Greek script), while other language
communities may agree to leave accents on the letters (e.g. Latin
script). Yet other communities may agree to add mappings for similar
characters (e.g. East Asian Han characters that are currently bundled
or blocked on the server side). Some small communities may not
achieve their desired inclusions and exclusions, but that is
inevitable: interoperability is more important.

So we must decide to have two sets of characters, one for lookup and
registration, and another for display. Once we have decided that, we
can talk about transition (probably via multiple lookup and bundling)
and display (via mechanisms that have already been discussed
somewhat).

Erik


More information about the Idna-update mailing list