NFKC and dots

Shawn Steele Shawn.Steele at microsoft.com
Mon Jan 7 19:49:51 CET 2008


I haven't been paying much attention to this alias, so I may have missed some background information.

Erik said:
> Speaking of U+2024 and where in the protocol stack to handle things, I
> just discovered that MSIE 7 and Firefox 2 both perform NFKC on this
> character, to yield U+002E (.). After that, they divide the host name
> into labels *again*, so the new U+002E becomes a new label separator.

Actually, that's not quite right.  The Windows APIs, IE7, and .NET use a special "normalization" that does the NFKC, stringprep, etc. all at once on the entire name, not label by label.  If the ASCII function is called, we do the ACE encoding and check the labels at that time.
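To make the order of operations concrete, here is a minimal Python sketch. It is not the Windows implementation: it only uses the standard unicodedata module to apply NFKC to the whole name, and the function name normalize_then_split is hypothetical. Real stringprep/nameprep does more than NFKC (case mapping, prohibited characters, and so on), which this sketch leaves out.

```python
import unicodedata
from typing import List


def normalize_then_split(name: str) -> List[str]:
    """Apply NFKC to the entire name, then split it into labels.

    Hypothetical helper for illustration only; it covers just the
    NFKC step, not the full stringprep/nameprep profile.
    """
    # NFKC maps U+2024 (ONE DOT LEADER) to U+002E (FULL STOP).
    normalized = unicodedata.normalize("NFKC", name)
    # Labels are determined only after normalization, so a dot
    # produced by NFKC acts as a label separator here.
    return normalized.split(".")


if __name__ == "__main__":
    name = "example\u2024com"
    # Splitting before normalization sees a single label.
    print(name.split("."))
    # Normalizing the whole name first yields two labels.
    print(normalize_then_split(name))
```

Because the normalization runs over the full string, nothing in this sketch knows that the dot came from U+2024; it just looks like any other U+002E by the time the labels are examined.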

Since I haven't been following the whole discussion, it's unclear to me what is expected of U+2024.  Clearly U+002E can't be part of a label, so our behavior (and Firefox's, I guess) seems sensible.

Since our API returns a full string, the caller still has to figure out what the labels are if they care.  In most cases I think we just assume that the whole string will be passed to the DNS system, which sorts out the labels itself.

- Shawn



