<div class="gmail_quote">On Wed, Sep 2, 2009 at 1:41 AM, Andrew Sullivan <span dir="ltr"><<a href="mailto:ajs@shinkuro.com">ajs@shinkuro.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Tue, Sep 01, 2009 at 10:36:53AM -0400, Andrew Sullivan wrote:<br>
><br>
> Hrm, so actually it is possible that all ASCII characters in an<br>
> A-label are upper case, even though the input from the U-label is not<br>
> allowed to have upper case in them.<br>
<br>
</div>Wait, that's not quite right either, I think. If I understand you<br>
correctly,<br>
<br>
xn--Bcher-kva<br>
<br>
is not an A-label because Bücher is not a U-label. But<br>
<br>
xn--bcher-KVA<br>
and<br>
xn--bcher-kva<br>
<br>
both are A-labels, and are both the valid output of allowed Punycode<br>
implementations, resulting from encoding the valid U-label bücher? Is<br>
that right? I'm not sure I'm convinced.<br>
<br></blockquote><div><br></div><div>I did a simple experiment and modified the Python punycode implementation to use uppercase output characters. It confirmed both what the RFC says and James' interpretation.</div><div>
<br></div><div>I uppercased the value of the "digits" variable on line 79 of encodings/punycode.py (source code here: <<a href="http://svn.python.org/view/python/tags/r251/Lib/encodings/punycode.py?revision=54864&view=markup">http://svn.python.org/view/python/tags/r251/Lib/encodings/punycode.py?revision=54864&view=markup</a>> )</div>
<div><br></div><div>Before the modification:</div><div><br></div><div><div>>>> from encodings import punycode</div><div>>>> s = u"bücher"</div><div>>>> s</div><div>u'b\xfccher'</div>
<div>>>> s.encode('utf-8')</div><div>'b\xc3\xbccher'</div><div>>>> punycode.punycode_encode(s)</div><div>'bcher-kva'</div><div><br></div><div><br></div><div>After the modification:</div>
<div><br></div><div>>>> reload(punycode)</div><div><module 'encodings.punycode' from '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/punycode.py'></div><div>
>>> punycode.punycode_encode(s)</div><div>'bcher-KVA'</div><div><br></div><div>So, this confirms that punycode (at least this implementation, and the way I understood and modified the code) will output uppercase characters to represent the encoded non-ASCII code points if you tell it to use uppercase as output characters, while the ASCII characters copied from the input keep their original case. This is consistent with what the RFC says.</div>
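<div><br></div><div>For anyone who wants to reproduce the effect without patching the standard library, here is a rough sketch. It assumes Python 3 and its built-in "punycode" codec rather than the 2.5 module used above, and uppercase_extended is a hypothetical helper that only imitates what an encoder with an uppercase digit table would emit; it is not part of any library.</div>
<pre>
```python
# Python 3 sketch; relies only on the standard-library "punycode" codec.

def uppercase_extended(encoded):
    """Hypothetical helper: uppercase only the digits after the last hyphen.

    This imitates an encoder whose digit table is uppercase. The basic
    (ASCII) code points before the delimiter are left untouched.
    """
    base, sep, extended = encoded.rpartition(b"-")
    return base + sep + extended.upper()


label = "b\u00fccher"                # the label used in the experiment above
lower = label.encode("punycode")     # output of the stock encoder
upper = uppercase_extended(lower)    # what an uppercase-digit encoder would emit

print(lower)
print(upper)
# Both spellings should decode back to the same label.
print(lower.decode("punycode") == label)
print(upper.decode("punycode") == label)
```
</pre>
<div>If the stock codec behaves as I expect, this prints b'bcher-kva', then b'bcher-KVA', then True twice: the decoder does not care about the case of the encoded digits, so both spellings map back to the same label.</div>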
<div><br></div><div>That said, I don't think it's a problem, since most implementations probably use lowercase because that's how the examples are shown. If memory serves me right, all of the IDNA2003 libraries that I've used (ICU4C, ICU4J, libidn, Python, idnkit) output in lowercase. I doubt there is a single library or application out there that behaves otherwise.</div>
<div><br></div><div>Nevertheless, the possibility, however minute, does exist. Do WG members think a clarification is needed in idnabis-protocol (or somewhere else) to point this out?</div><div><br></div><div>=wil</div>
</div></div>