Punycode & IMA/EAI
fsasaki at w3.org
Thu May 22 14:46:43 CEST 2008
John C Klensin wrote:
> --On Wednesday, 21 May, 2008 13:06 -0700 Shawn Steele
> <Shawn.Steele at microsoft.com> wrote:
>> b) if there is such a requirement to lookup unknown punycode,
>> then there must be a provision to also allow converting
>> Unicode code points to punycode. This isn't a problem if the
>> Unicode is "normalized". It could be a problem if the Unicode
>> is in a case mapped form or other form that might not make the
>> conversion straightforward.
> I'm still not sure I understand what you are getting at here,
> unless it is a return to the discussion of just storing UTF-8 in
> the DNS rather than using punycode encoding or some other ACE.
> But, to the issue above, this is one of the key reasons why the
> design of IDNA2008 advocates moving to as little mapping as
> possible. We are going to need to be sensitive to transition
> arrangements, but, for the reasons you cite (again, if I
> understand them) having a fully reversible U-label <-> A-label
> mapping is actually fairly important to users...
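To make the reversibility point concrete, here is a minimal Python sketch of a U-label <-> A-label conversion that applies no mapping at all. It uses only the bare "punycode" codec from the standard library; the helper names to_a_label and to_u_label are hypothetical, and the IDNA validity checks (length limits, permitted code points, and so on) are deliberately left out.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode one label with Punycode, applying no case or other mapping."""
    if u_label.isascii():
        return u_label  # plain ASCII labels need no ACE form
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode one A-label back to its Unicode form."""
    if not a_label.startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


label = "exåmple"
# With no mapping step, the conversion is an exact round trip.
assert to_u_label(to_a_label(label)) == label
```

Because nothing is mapped on the way in, whatever goes in comes back out unchanged, which is the property being discussed.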
A while ago the W3C i18n core Working Group was approached with a use
case where a fully reversible U-label <-> A-label
mapping seems to be important; see Powder, "IRI/URI Canonicalization".
A Powder WG participant says:
Our basic need is that we must be able to be certain whether a given IRI
does or does not match a small data set. Typically, something like
For a given IRI, we need to be 100% sure whether it does or does not
match these conditions - i.e. that it has a host with the last two
components of exåmple.org and a path that contains foó.
Given that an IRI may, or may not, have been re-encoded in one of
several different ways, how can we canonicalise it before matching?
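One possible way to approach the matching question can be sketched in Python as follows. This is only an illustration under stated assumptions, not a Powder algorithm: the helper names canonical_host, canonical_path and matches are hypothetical, percent-escapes are assumed to carry UTF-8, A-labels are decoded with bare Punycode, and everything is normalized to NFC. IDNA mapping and validity rules are not applied.

```python
import unicodedata
from urllib.parse import urlsplit, unquote


def canonical_host(host: str) -> str:
    """Lower-case the host, decode any A-labels, and normalize to NFC."""
    labels = []
    for label in host.lower().split("."):
        if label.startswith("xn--"):
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(unicodedata.normalize("NFC", label))
    return ".".join(labels)


def canonical_path(path: str) -> str:
    """Undo percent-encoding (assumed UTF-8) and normalize to NFC."""
    return unicodedata.normalize("NFC", unquote(path))


def matches(iri: str, host_suffix: str = "exåmple.org", path_part: str = "foó") -> bool:
    """True if the host ends in host_suffix and the path contains path_part."""
    parts = urlsplit(iri)
    host = canonical_host(parts.hostname or "")
    path = canonical_path(parts.path)
    suffix = canonical_host(host_suffix)
    wanted = unicodedata.normalize("NFC", path_part)
    on_host = host == suffix or host.endswith("." + suffix)
    return on_host and wanted in path
```

Both the IRI and the data set are reduced to the same Unicode form before comparison, so an A-label host, a percent-encoded path and a decomposed spelling would all be compared on equal terms. The comparison is only as reliable as the A-label decoding step, which is where the reversibility question comes in.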
I tried to argue that this is an IRI problem and not an IDN problem, but
it has been argued that the IDN part needs a 100% reversible mapping
in Powder as well, see
Am I right in assuming that *if* Powder needs 100% reversibility of
the U-label <-> A-label mapping, this could be
guaranteed with IDNbis as currently planned, but not with IDNA2003?
> conversely, one of the issues with IDNA2003 is precisely "user
> puts a string in, it gets converted to punycode form, it gets
> converted back, user doesn't recognize result".
> There are obvious and difficult tradeoffs here and IDNA2003
> wasn't "wrong". It just appears that making the tradeoff the
> other way and moving in that direction as quickly as possible
> is, on balance and with experience, a better choice.
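The "user doesn't recognize result" effect quoted above can be reproduced with Python's built-in "idna" codec, which implements the IDNA2003 ToASCII and ToUnicode operations including the nameprep mapping step. The helper name idna2003_round_trip and the sample names are just for illustration.

```python
def idna2003_round_trip(name: str) -> str:
    """Convert a name to its ACE form and back using IDNA2003 rules."""
    return name.encode("idna").decode("idna")


for typed in ("Exåmple.org", "straße.de"):
    shown = idna2003_round_trip(typed)
    # The mapping step (case folding, NFKC) can change what comes back.
    print(f"{typed!r} -> {shown!r} (unchanged: {typed == shown})")
```

In both cases the string that comes back differs from the one that went in, because the mapping happens before the Punycode encoding and cannot be undone afterwards.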