Punycode & IMA/EAI

Felix Sasaki fsasaki at w3.org
Thu May 22 14:46:43 CEST 2008


John C Klensin wrote:
> --On Wednesday, 21 May, 2008 13:06 -0700 Shawn Steele
> <Shawn.Steele at microsoft.com> wrote:
>
>   
>> ...
>> b) if there is such a requirement to lookup unknown punycode,
>> then there must be a provision to also allow converting
>> Unicode code points to punycode.  This isn't a problem if the
>> Unicode is "normalized".  It could be a problem if the Unicode
>> is in a case mapped form or other form that might not make the
>> conversion straightforward.
>>     
>
> Shawn,
>
> I'm still not sure I understand what you are getting at here,
> unless it is a return to the discussion of just storing UTF-8 in
> the DNS rather than using punycode encoding or some other ACE.
> But, to the issue above, this is one of the key reasons why the
> design of IDNA2008 advocates moving to as little mapping as
> possible.   We are going to need to be sensitive to transition
> arrangements, but, for the reasons you cite (again, if I
> understand them) having a fully reversible U-label <-> A-label
> mapping is actually fairly important to users... 

A while ago the W3C i18n core Working Group was approached with a use 
case where a fully reversible U-label <-> A-label mapping seems to be 
important; see the Powder section "IRI/URI Canonicalization":
http://www.w3.org/2007/powder/Group/powder-grouping/20080128.html#canon

At
http://lists.w3.org/Archives/Public/public-i18n-core/2008AprJun/0019.html
a Powder WG participant says:
[
Our basic need is that we must be able to be certain whether a given IRI
does or does not match a small data set. Typically, something like
<iriset>
<includehosts>exåmple.org</includehosts>
<pathcontains>foó</pathcontains>
</iriset>
For a given IRI, we need to be 100% sure whether it does or does not
match these conditions - i.e. that it has a host with the last two
components of exåmple.org and a path that contains foó.
Given that an IRI may, or may not, have been re-encoded in one of
several different ways, how can we canonicalise it before matching?
]
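
To make the matching problem concrete, here is a rough sketch of one way 
to canonicalise before matching. It is only an illustration, not 
Powder's algorithm: the helper names canonical_host and iri_matches are 
made up, the host is canonicalised with Python's built-in "idna" codec 
(which implements IDNA2003, RFC 3490), and I assume the path is 
percent-encoded UTF-8 and that NFC is the right normalization for it.

```python
import unicodedata
from urllib.parse import urlsplit, unquote


def canonical_host(host: str) -> str:
    """Reduce a host name to lowercase A-label form (IDNA2003 rules)."""
    return host.encode("idna").decode("ascii").lower()


def iri_matches(iri: str, include_host: str, path_contains: str) -> bool:
    """Hypothetical check for an <includehosts>/<pathcontains> pair."""
    parts = urlsplit(iri)
    host = canonical_host(parts.hostname or "")
    wanted = canonical_host(include_host)
    # The host matches if it is the wanted name or a subdomain of it.
    host_ok = host == wanted or host.endswith("." + wanted)
    # Undo percent-encoding (assumed UTF-8), then normalize to NFC.
    path = unicodedata.normalize("NFC", unquote(parts.path))
    needle = unicodedata.normalize("NFC", path_contains)
    return host_ok and needle in path


# The same resource, once as an IRI and once re-encoded as a URI.
as_iri = "http://www.ex\u00e5mple.org/a/fo\u00f3/b"
a_label_host = "www.ex\u00e5mple.org".encode("idna").decode("ascii")
as_uri = "http://" + a_label_host + "/a/fo%C3%B3/b"

for candidate in (as_iri, as_uri):
    print(candidate, iri_matches(candidate, "ex\u00e5mple.org", "fo\u00f3"))
```

Comparing in A-label form sidesteps the question of whether the 
U-label can be recovered; that question only arises when the result has 
to be shown to a user or converted back.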

I tried to argue that this is an IRI problem and not an IDN problem, but 
it has been argued that the IDN part needs a 100% reversible mapping 
in Powder as well; see
http://lists.w3.org/Archives/Public/public-i18n-core/2008JanMar/0026.html

Am I right in assuming that, *if* Powder needs 100% reversibility of 
the U-label <-> A-label mapping, this could be guaranteed with IDNAbis 
as currently planned, but not with IDNA2003?

Felix

> and,
> conversely, one of the issues with IDNA2003 is precisely "user
> puts a string in, it gets converted to punycode form, it gets
> converted back, user doesn't recognize result".
>
> There are obvious and difficult tradeoffs here and IDNA2003
> wasn't "wrong".  It just appears that making the tradeoff the
> other way and moving in that direction as quickly as possible
> is, on balance and with experience, a better choice.
>
>      john
>
>   


