Codepoints removed between IDNA200X and IDNA2003

Simon Josefsson simon at
Wed Mar 19 15:37:27 CET 2008

Patrik Fältström <patrik at> writes:

> On 19 mar 2008, at 12.24, Patrik Fältström wrote:
>> On 19 mar 2008, at 12.03, Simon Josefsson wrote:
>>> For the control characters, it looks to me like the UseSTD3ASCIIRules
>>> flag was not set when generating the list.
>> True. I will re-run with the flag set.
> This is the run with the flag set.
> cp = ToUnicode(ToASCII(cp, UseSTD3ASCIIRules), UseSTD3ASCIIRules)
>    Patrik

Dot (U+002E) is not allowed by IDNA2003-ToASCII either, so I believe
that line is incorrect.

Did you run the multiple-label ToASCII algorithm or the single-label
ToASCII algorithm?  The latter should not permit U+002E.
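
To illustrate why the single-label algorithm cannot accept U+002E when UseSTD3ASCIIRules is set: step 3 of ToASCII in RFC 3490 rejects non-LDH ASCII code points, and U+002E falls in one of the forbidden ranges. The sketch below models only that one step. The helper name is hypothetical, and this is not taken from any implementation.

```python
# Sketch of ToASCII step 3 (RFC 3490) with UseSTD3ASCIIRules set.
# Only this step is modelled; Nameprep, Punycode and the other steps are omitted.

# Non-LDH ASCII ranges named in RFC 3490: 0..2C, 2E..2F, 3A..40, 5B..60, 7B..7F
NON_LDH_RANGES = [(0x00, 0x2C), (0x2E, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7F)]


def passes_std3_step(label: str) -> bool:
    """Hypothetical helper: True if the label survives the STD3 ASCII checks."""
    for ch in label:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in NON_LDH_RANGES):
            return False
    # Second half of step 3: no leading or trailing hyphen-minus (U+002D).
    return not (label.startswith("-") or label.endswith("-"))


print(passes_std3_step("example"))  # True
print(passes_std3_step("."))        # False: U+002E is in the 2E..2F range
```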

The multiple-label ToASCII algorithm in RFC 3490 is under-specified with
regard to zero-length labels.  If you follow the algorithm strictly as
written in RFC 3490, you end up rejecting every string that ends with
'.', including the string that contains only '.'.
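
A minimal sketch of that strict reading, under stated assumptions: the name is split at each of the four characters RFC 3490 requires to be recognized as dots, and per-label ToASCII is modelled only by the lower bound of its step 8 length check (at least one code point). The helper name is hypothetical.

```python
import re

# Label separators that RFC 3490 requires to be recognized as dots.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def strict_labels_ok(name: str) -> bool:
    """Hypothetical helper: split at every dot, reject any zero-length label."""
    labels = DOTS.split(name)
    # ToASCII step 8 requires 1 to 63 code points; only the lower bound is
    # checked here, since it is the one that matters for empty labels.
    return all(len(label) >= 1 for label in labels)


print(strict_labels_ok("www.example.com"))   # True
print(strict_labels_ok("www.example.com."))  # False: trailing empty label
print(strict_labels_ok("."))                 # False: two empty labels
```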

After some discussion with Marcos Sanz back in 2003, we decided that
libidn should permit trailing zero-length labels, based on the following
remark in the terminology section of RFC 3490:

   A label is an individual part of a domain name.  Labels are usually
   shown separated by dots; for example, the domain name
   "www.example.com" is composed of three labels: "www", "example", and
   "com".  (The zero-length root label described in [STD13], which can
   be explicit as in "www.example.com." or implicit as in
   "www.example.com", is not considered a label in this specification.)
   IDNA extends the set of usable characters in labels that are text.
   For the rest of this document, the term "label" is shorthand for
   "text label", and "every label" means "every text label".

Thus, the string "." is considered to contain one zero-length label and
a delimiter.  I don't know how other implementations handle this.
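
The lenient reading can be sketched the same way. This is an illustration of the behaviour described above, not libidn's actual code: the helper name is hypothetical, a single trailing zero-length label is dropped before checking, and the string consisting of one dot is accepted as the explicit root.

```python
import re

# Label separators that RFC 3490 requires to be recognized as dots.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def lenient_labels_ok(name: str) -> bool:
    """Hypothetical helper: like the strict split, but allow a trailing root label."""
    # A lone dot is treated as the explicit zero-length root label.
    if DOTS.fullmatch(name):
        return True
    labels = DOTS.split(name)
    # Permit exactly one trailing zero-length label (the explicit root).
    if len(labels) > 1 and labels[-1] == "":
        labels.pop()
    # Every remaining label must be non-empty.
    return all(len(label) >= 1 for label in labels)


print(lenient_labels_ok("www.example.com."))  # True
print(lenient_labels_ok("."))                 # True
print(lenient_labels_ok("www..com"))          # False: empty label in the middle
```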

