I don&#39;t think the original Punycode mechanism would work, since I think it would change the result incompatibly compared to strings encoded under IDNA2003 (especially since it allows only 1 bit per character, as you say).<br>

<br>This is all blue-skying at this point, but the more I think about it, the more promising a bit-vector approach looks for handling the handful of peculiar cases (eszett, sigma, i) in a compatible way.<br><br>
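To make the bit-vector idea concrete, here is a rough Python sketch of the simplest variant quoted below: walk the ASCII letters of an already-encoded label and set each letter&#39;s case from the next bit. The function names embed_bits and extract_bits are made up for illustration, and this is not the mixed-case annotation defined in RFC 3492, just a sketch under those assumptions.<br>

```python
def embed_bits(encoded, bits):
    """Carry `bits` in the case of successive ASCII letters of `encoded`.

    Hypothetical helper: a 1 bit makes the letter uppercase, a 0 bit
    leaves it lowercase. Digits and hyphens carry no information.
    """
    remaining = list(bits)
    out = []
    for ch in encoded:
        if ch.isascii() and ch.isalpha():
            # Letters past the end of the vector are treated as 0 bits.
            bit = remaining.pop(0) if remaining else 0
            out.append(ch.upper() if bit else ch.lower())
        else:
            out.append(ch)
    if remaining:
        raise ValueError("not enough cased letters to carry all bits")
    return "".join(out)


def extract_bits(encoded):
    """Read the bit vector back from the case of the ASCII letters."""
    return [ch.isupper() for ch in encoded if ch.isascii() and ch.isalpha()]


# "bcher-kva" is the Punycode form of "bücher" without the "xn--" prefix.
marked = embed_bits("bcher-kva", [1, 0, 1])
print(marked)
print(extract_bits(marked))
```

Because existing implementations compare the encoded form case-insensitively, a label marked this way should still match its all-lowercase form, which is the compatibility property I care about here. The obvious limit is capacity: a label carries only as many bits as it has letters.<br><br>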

Mark<br><br><div class="gmail_quote">On Mon, Mar 31, 2008 at 2:09 AM, Markus Scherer &lt;<a href="mailto:mscherer@google.com">mscherer@google.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="Ih2E3d">On Sat, Mar 29, 2008 at 7:49 PM, Mark Davis &lt;<a href="mailto:mark.davis@icu-project.org">mark.davis@icu-project.org</a>&gt; wrote:<br>

&gt; The simplest mechanism would be to then take that set of bits and walk<br>

&gt; through the Punycode, and for each bit in the vector changing each cased<br>

&gt; letter to uppercase to represent a 1 bit, and leaving it lowercase to represent<br>

&gt; a 0 bit.<br>

<br>

</div>I recommend against inventing a new mechanism here. Punycode already<br>

provides an &quot;originally-uppercase&quot; bit per source character. Within<br>

IDNA, the uppercase information could be extracted before or during<br>

folding, and then passed into the Punycode-encoding function.<br>

<br>

Unfortunately, there is only one bit per character, which as you point<br>

out is insufficient in some cases for precise representation of the<br>

original character. I am not sure if there is room to reliably extend<br>

the mechanism to 2 bits per character while maintaining compatibility<br>

and not confusing existing implementations that use the predefined<br>

mechanism.<br>

<br>

markus<br>

<font color="#888888">--<br>

Google Internationalization<br>

</font></blockquote></div><br><br clear="all"><br>-- <br>Mark