IDNA decode?

Simon Josefsson simon at josefsson.org
Mon May 16 14:00:59 CEST 2011


Yoshiro YONEYA <yoshiro.yoneya at jprs.co.jp> writes:

> On Mon, 16 May 2011 09:20:27 +0200 Simon Josefsson <simon at josefsson.org> wrote:
>
>> I had a feature request [1] regarding converting from IDN form to
>> Unicode form.  I couldn't find a description of how this is done in the
>> IDNA2008 document set, but I must be missing it.  Could anyone point me
>> in the right direction?
>
> Section 5.3 "A-label Input" of RFC5891 describes how to convert an A-label
> into a U-label.

I read that as being part of the Domain Name Lookup protocol?
Converting from IDN form to Unicode form is a display operation, and
does not necessarily have anything to do with lookup.

Also, the section only covers labels, not entire domains.

For reference the section is:

http://tools.ietf.org/html/rfc5891#section-5.3

It is not clear from that section, but what could be done is something
like this:

 - Check that the domain name and labels follow RFC 1034 (with updates)
   and split it into labels, noting whether it ends with a '.' or not.
 - For each label L do
   - If the label does not begin with 'xn--' do nothing
   - If the label begins with 'xn--' then do
     - convert label to lowercase
     - XXX are the tests in sections 5.4 and 5.5 needed for display?
       the last paragraph of 5.3 suggests it is a SHOULD
     - remove the 'xn--' part
     - perform punycode decode on the remaining part
       XXX what should be done if this step fails?
     - replace the original label with the decoded label
 - Assemble the domain name from the labels, adding the final '.' if
   present in the input.
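
As a rough illustration, the steps above might be sketched in Python as
follows.  This is only a sketch under several assumptions, not something
taken from the IDNA2008 documents: the function name idn_to_unicode is
made up, the 'xn--' prefix is matched case-insensitively, the RFC 1034
syntax check and the tests of sections 5.4 and 5.5 are left out, and a
label that fails punycode decoding is simply kept as it was (one possible
answer to the XXX above, not a recommendation).  The decoding itself uses
the "punycode" codec from the Python standard library.

```python
def idn_to_unicode(domain: str) -> str:
    """Convert 'xn--' labels of a domain name to Unicode for display.

    Hypothetical helper: no RFC 1034 syntax check and none of the
    RFC 5891 section 5.4/5.5 tests are performed here.
    """
    # Note whether the name ends with a '.', then split it into labels.
    trailing_dot = domain.endswith(".")
    name = domain[:-1] if trailing_dot else domain
    labels = name.split(".")

    result = []
    for label in labels:
        lowered = label.lower()
        if not lowered.startswith("xn--"):
            # Not an A-label candidate: leave it untouched.
            result.append(label)
            continue
        try:
            # Remove the 'xn--' prefix and punycode-decode the rest.
            decoded = lowered[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            # Assumption: on failure, keep the original label unchanged.
            decoded = label
        result.append(decoded)

    # Reassemble, restoring the final '.' if it was present in the input.
    return ".".join(result) + ("." if trailing_dot else "")


print(idn_to_unicode("www.xn--bcher-kva.example."))  # www.bücher.example.
```

Keeping the undecodable label as-is means the caller cannot tell that a
label was malformed; raising an error instead would be an equally
plausible choice.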

However maybe some other part of the documents already explains this
process.  Or is display of IDNs intentionally left outside the scope of
IDNA2008?

/Simon

