Comments on draft-ietf-idnabis-defs-10

Tue Sep 1 20:20:30 CEST 2009

On Wed, Sep 2, 2009 at 1:41 AM, Andrew Sullivan <ajs at shinkuro.com> wrote:

> On Tue, Sep 01, 2009 at 10:36:53AM -0400, Andrew Sullivan wrote:
> >
> > Hrm, so actually it is possible that all ASCII characters in an
> > A-label are upper case, even though the input from the U-label is not
> > allowed to have upper case in them.
>
> Wait, that's not quite right either, I think.  If I understand you
> correctly,
>
>    xn--Bcher-kva
>
> is not an A-label because Bücher is not a U-label.  But
>
>    xn--bcher-KVA
> and
>    xn--bcher-kva
>
> both are U-labels, and are both the valid output of allowed Punycode
> implementations, resulting from encoding the valid U-label bücher?  Is
> that right?  I'm not sure I'm convinced.
>
>
I did a simple experiment and modified the Python punycode implementation to
use uppercase output characters. It did confirm what the RFC says and James'
interpretation.

I uppercased the value of the "digits" variable on line 79 of
encodings/punycode.py (source code here:
<http://svn.python.org/view/python/tags/r251/Lib/encodings/punycode.py?revision=54864&view=markup>).

Before the modification:

>>> from encodings import punycode
>>> s = u"bücher"
>>> s
u'b\xfccher'
>>> s.encode('utf-8')
'b\xc3\xbccher'
>>> punycode.punycode_encode(s)
'bcher-kva'

After the modification:

>>> reload(punycode)
<module 'encodings.punycode' from
'/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/punycode.py'>
>>> punycode.punycode_encode(s)
'bcher-KVA'

So this confirms that Punycode (at least this implementation, as I
understood and modified it) copies the ASCII code points through unchanged:
"bcher" stays lowercase even when the encoder is told to use uppercase
output characters. Only the digits that encode the non-ASCII code points
("KVA") change case. This is consistent with what the RFC says.
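
The same effect can be reproduced without patching the standard library. What
follows is only a minimal sketch, assuming Python 3 and its built-in
"punycode" codec. The helper name uppercase_encoded_digits is made up for
illustration and is not part of any library. It uppercases only the part
after the last delimiter, then checks that the decoder still returns the
original string.

```python
def uppercase_encoded_digits(label: str) -> str:
    """Hypothetical helper: Punycode-encode `label`, then uppercase only
    the digits that follow the last "-" delimiter."""
    encoded = label.encode("punycode").decode("ascii")
    # Everything before the last "-" is the literal ASCII part of the label.
    head, sep, tail = encoded.rpartition("-")
    return head + sep + tail.upper()


lower = "bücher".encode("punycode").decode("ascii")
upper = uppercase_encoded_digits("bücher")

print(lower)
print(upper)
# Decode the uppercase variant and compare it with the original label.
print(upper.encode("ascii").decode("punycode") == "bücher")
```

If both forms decode to the same string, the two spellings differ only in the
case of the encoded digits, which is the situation Andrew asked about.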

That said, I don't think it's a problem, since most implementations probably
use lowercase because that's how the examples are shown. If memory serves
me right, all of the IDNA2003 libraries that I've used (ICU4C, ICU4J,
libidn, Python, idnkit) output lowercase. I doubt there is a single
library or application out there that behaves otherwise.

Nevertheless, the possibility, however minute, does exist. Do WG members
think a clarification is needed in idnabis-protocol (or somewhere else) to
point this out?

=wil