Stop me if I've misunderstood...
Kenneth Whistler
kenw at sybase.com
Fri Jul 10 01:35:35 CEST 2009
> On Thu, Jul 09, 2009 at 03:26:43PM -0700, Paul Hoffman wrote:
> > At 5:27 PM -0400 7/9/09, Andrew Sullivan wrote:
> > >I'm not sure I agree with the above characterization. In DNS, case
> > >differences are preserved but not significant for matching.
> >
> > So, take it to an example where not all characters are ASCII:
> > Éxample.com and éxample.com seems like a good example.
>
> Well, that case also seems to me to be something the zone operator
> might do reasonably. There's no risk of the "xample" part being
> changed, so there are just two variants to register -- not such a big
> deal.
Now scale it from an artificial éxample with one non-ASCII
letter, and consider Vietnamese, where nearly every syllable
has an accented-non-ASCII vowel, and where there are also
common non-ASCII consonants. Suddenly you've imposed a
combinatorial explosion burden on the Vietnamese zone operator.
This is a similarly difficult problem for many languages
written with the Latin script -- most except English, in fact.
Or consider that the case-mapping issue impacts *every* character
in a Cyrillic or Greek IDN.
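
To put rough numbers on that, here is a minimal sketch in Python. The helper name variants_to_register is hypothetical, and it assumes the simplest possible model: ASCII letters are already matched case-insensitively by the DNS, and every non-ASCII letter with distinct upper and lower forms doubles the number of names a zone operator would have to bundle. It ignores special cases such as characters whose uppercase form is more than one character long.

```python
def variants_to_register(label):
    """Count the case variants of `label` that differ outside ASCII.

    Assumption (not from any spec): ASCII case is handled by the DNS,
    so only non-ASCII characters that have distinct upper and lower
    forms multiply the number of names to register.
    """
    cased_non_ascii = sum(
        1 for ch in label
        if ord(ch) > 0x7F and ch.lower() != ch.upper()
    )
    return 2 ** cased_non_ascii


# One accented letter versus an all-Cyrillic label of six letters.
for label in ("\u00e9xample", "пример"):
    print(len(label), variants_to_register(label))
```

Under these assumptions the one-accent label needs 2 registrations, while the six-letter Cyrillic label needs 64, and the count doubles with every additional letter.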
> But note that this gets us into the other sticky issue: what about
> accent-free versions of "the same name" (in this case,
> "example.com"). Should that also be bundled together.
That would be a bona fide example of a potential local
bundling issue for a zone operator.
But casemapping is not at all the same kind of thing. It
is an obligatory *mapping* that will have to be applied
somewhere in this process, or IDNs are utter chaos.
> I think the right answer to that question is, "It depends, and zone
> operators need to come up with policies around this." But not
> everyone is, plainly, happy with that answer.
You bet. I think it is a total nonstarter to think that
casemapping (or more correctly, casefolding) is an issue
to be left up to zone operator policy.
I thought we were coming to consensus a couple months
ago that casemapping and width mapping (for fullwidth
and halfwidth forms in East Asian character sets) should
properly be considered a part of the protocol, but that
other mappings for backwards compatibility with IDNA 2003
were more marginal and could reasonably be considered as
part of a preprocessing mapping recommendation document.
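
As an illustration of what those two mappings do, here is a minimal sketch in Python. The function names width_map and map_label are hypothetical. It relies on Python's built-in str.casefold() and on the <wide> and <narrow> compatibility decompositions exposed by the unicodedata module. It is not the IDNA2003 Nameprep mapping, nor anything specified in the IDNA2008 drafts; it only shows the kind of fixed, well-defined mapping under discussion.

```python
import unicodedata


def width_map(text):
    """Replace fullwidth and halfwidth forms with their ordinary
    counterparts, leaving every other character untouched."""
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<wide>") or decomp.startswith("<narrow>"):
            # The decomposition looks like "<wide> 0065": a tag followed
            # by one or more hexadecimal code points.
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def map_label(label):
    """Width-map, then casefold (an assumed ordering, for illustration)."""
    return width_map(label).casefold()


# Labels that differ only in case or width map to the same string.
print(map_label("\u00c9xample") == map_label("\u00e9xample"))
print(map_label("\uff25\uff58\uff41\uff4d\uff50\uff4c\uff45") == "example")
```

If every lookup and every registration applies the same mapping, the zone operator registers one name per label instead of one per variant.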
But it appears that the group has now made a hard right
turn somewhere, and is trying to throw out the baby
with the bathwater, citing a line in the charter as
a justification for a technical decision about best
protocol design, and settling on a position that
mapping should not be in the protocol at all.
Instead, these mappings are abstracted out into mappings-01.txt,
and then instead of any obligatory and well-defined
requirement for interoperability, the whole thing is
further qualified with oughts and maybes and applications
may do their own local things.
I consider this a major mess at this point. Sorry.
--Ken
>
> A