local mappings

John C Klensin klensin at jck.com
Tue Jan 27 02:26:31 CET 2009



--On Monday, January 26, 2009 6:38 AM -0800 Erik van der Poel
<erikv at google.com> wrote:

> On Mon, Jan 26, 2009 at 6:03 AM, John C Klensin
> <klensin at jck.com> wrote:
>> --On Monday, January 26, 2009 8:50 AM -0500 Andrew Sullivan
>> <ajs at shinkuro.com> wrote:
>>> On Fri, Jan 23, 2009 at 05:36:42PM -0500, John C Klensin
>>> wrote:
>>>> the unsophisticated)?    If we do lower-case, but continue
>>>> to ban compatibility characters and the other odd cases that
>>>> surprise those who don't know what is going on, does that
>>>> help us significantly with the compatibility and
>>>> astonishment situations that are really important?
> 
> Astonishment is in the eye of the beholder. For example,
> English speakers may be astonished to discover even the
> existence of the full-width Latin letters in Unicode, while
> Japanese speakers may be astonished if full-width Latin
> letters are not automatically converted to normal-width ones
> (i.e. NFKC).

Erik, 

With the understanding that this is a question --I'm not sure
that even the lower-case mapping is a good idea-- rather than
trying to persuade you of anything...

Reasonable people may disagree, but we've gotten very clear
signals that all of the mappings of IDNA2003 were bad news.
They create, rather than reduce, confusion, especially since
information is lost in the reverse mappings.  That input was
reflected in both RFC 4690 and the charter although, again,
reasonable people might disagree with either one.
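
A minimal sketch of the information loss, assuming Python's standard
unicodedata module and using NFKC plus lower-casing as a rough stand-in
for the IDNA2003 mappings (the real Nameprep tables differ in detail,
and the helper name is hypothetical):

```python
import unicodedata

def idna2003_style_map(label: str) -> str:
    # Hypothetical helper: NFKC followed by lower-casing, as a rough
    # approximation of the IDNA2003 (Nameprep) mappings.
    return unicodedata.normalize("NFKC", label).lower()

# Build the full-width form of "EXAMPLE"
# (U+FF21 is FULLWIDTH LATIN CAPITAL LETTER A).
fullwidth = "".join(chr(ord(c) - ord("A") + 0xFF21) for c in "EXAMPLE")

inputs = ["example", "Example", "EXAMPLE", fullwidth]
mapped = {idna2003_style_map(s) for s in inputs}

# Several distinct inputs collapse to one mapped form, so the original
# spelling cannot be recovered from the mapped label alone.
print(len(set(inputs)), "inputs ->", len(mapped), "mapped form(s)")
```

Because the mapping is many-to-one, no reverse mapping can tell which
of the inputs the user originally typed.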

If we prohibit mappings in the protocol completely, we have to
assume that some people/implementations will do some "obvious"
mappings anyway.  Regardless of how it is stated, that is what
gets us to the "local mapping" text that several people have
(reasonably, IMO) found to be a matter of concern from the
standpoint of predictability of behavior.

I'm trying to find a way to get rid of, or severely limit, the
"local mapping" text.  One possible way to do it is to find a
middle ground on what gets mapped -- one that we can explain in
a clear way to people who are not experts on Unicode (or scripts
in general) -- while minimizing the total to preserve a response
to those who told us that non-reversibility was a big problem
and that they couldn't figure out what was valid in an IDN and
what was not... a problem that is complicated by the
observation that "valid in an IDN" actually means two separate
things: strings that can be successfully processed into an
A-label and strings that can be obtained by decoding A-labels
(thanks to Patrik for identifying that distinction in this
latest round).
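
The two senses of "valid" can be illustrated with a short sketch. It
assumes Python's built-in "idna" codec, which follows IDNA2003 (so it
applies the mappings under discussion); the helper names and the sample
label are illustrative only:

```python
def to_a_label(label: str) -> str:
    # Hypothetical helper: process a label into an A-label using
    # Python's IDNA2003-based codec.
    return label.encode("idna").decode("ascii")

def from_a_label(a_label: str) -> str:
    # Hypothetical helper: decode an A-label back to Unicode.
    return a_label.encode("ascii").decode("idna")

typed = "B\u00fccher"            # "Bucher" with u-umlaut and a capital B
a_label = to_a_label(typed)      # sense 1: can be processed into an A-label
decoded = from_a_label(a_label)  # sense 2: what decoding the A-label yields

print(a_label)
print(decoded == typed)          # False: the decoded form is lower-case
```

Here the typed string is "valid" in the first sense, but it can never
be obtained by decoding an A-label, so it is not "valid" in the second.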

For case-mapping, I know how to define the rule and I know how
to explain it.  The stopping rule is also clear: conversion to
lower case is straightforward, even people who don't deal with
computers understand it, and neither Unicode nor IDNA2003
confuses case operations with compatibility encodings.  And
virtually anyone who has looked at or tried to use the
case-containing scripts (again, with or without computers) has a
basic understanding of the issue.  By contrast, these peculiar
"compatibility" relationships -- the characters that are
different codings for the same thing except when they aren't --
seem like a different kettle of fish... differences that exist
because of design decisions made in Unicode or its predecessors,
rather than differences that are inherent to the writing system.
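
A small sketch of that contrast, assuming Python's str.lower() as the
case mapping and unicodedata NFKC as the compatibility mapping (the
sample characters are chosen only for illustration):

```python
import unicodedata

samples = {
    "capital A": "A",
    "full-width capital A": "\uff21",
    "fi ligature": "\ufb01",
    "superscript two": "\u00b2",
}

results = {}
for name, ch in samples.items():
    lowered = ch.lower()                        # case mapping only
    compat = unicodedata.normalize("NFKC", ch)  # compatibility mapping
    results[name] = (lowered, compat)
    print(f"{name}: lower -> {lowered!a}, NFKC -> {compat!a}")
```

Lower-casing leaves the ligature and the superscript alone and keeps
the full-width letter full-width; NFKC replaces each of them with a
different character or sequence, which is the harder rule to explain.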

Or maybe that distinction doesn't hold up, in which case we
either need to tell the communities who complained about what is
and is not valid in a domain name --and about ambiguities in
"valid"-- to just get used to it or we are back to trying to
define "local mapping" in a way that at least most of us can
live with.

If the stopping rule isn't "lower case and lower case only", you
tell me where it is and how we explain it to someone who doesn't
want to know about Unicode.

> Before we go to IETF Last Call, it seems to me that we should
> at least achieve rough consensus within the WG whether global
> mappings (lower-casing, NFKC, possibly others) will be part of
> IDNA200X or not.

Sure.  But keep in mind that we reached a consensus about that
as part of the charter process and it was "no mappings".  I
think it is worth exploring whether a very narrow set of
exceptions would provide high leverage.  But reversibility is
pretty important, at least in some people's minds, and one
cannot have both extensive global mapping and even a semblance
of reversibility...

     john



